Demystifying Regular Expressions in Ruby (1/2)
While being recognized as powerful, compact, and expressive, Regular Expressions (or RegExps) also have a reputation for being notoriously hard for humans to parse. In fact, a great developer once said this about them:
shaving yaks in a rabbit hole
sum it up: "for your sanity, don't do regexp" :D
In this series of posts, sanity is preserved as we review by example. Across the series we'll see how RegExps are particularly effective at finding patterns in text, along with some lesser-known tricks that can improve readability.
This first post focuses on finding matches in text, along with some details on how it all works in Ruby.
Kinds of characters
There are various kinds of characters used in RegExp. Some common ones include:
- Literals: match the character itself in the target string.
- Escaping: \ escapes a meta-character so that it is matched as a literal.
- Wildcard: . means match any character.
- Character classes: a specific set of characters to match (in any order).
- Repetition: + means match one or more times.
A few examples of character classes are:
[aeiou] # Any vowel.
\w # Any word character.
[[:blank:]] # Space or tab.
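To make the kinds of characters above a little more concrete, here is a minimal sketch that tries each one against a made-up string. The SAMPLE text is an assumption chosen purely for illustration; it does not come from the examples later in this post.

```ruby
# Hypothetical sample text, used only to illustrate each kind of character.
SAMPLE = "cat.png and c.t"

SAMPLE =~ /cat/       # => 0      literals match themselves
SAMPLE =~ /\./        # => 3      an escaped dot matches a literal "."
SAMPLE[/c.t/]         # => "cat"  the wildcard stands in for any one character
SAMPLE[/[aeiou]/]     # => "a"    character class: the first vowel found
SAMPLE[/\w+/]         # => "cat"  repetition: one or more word characters

# By default the wildcard does not match a newline:
"a\nb" =~ /a.b/       # => nil
```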
Searching for whether a pattern exists
Given some text to match on, say, some Markdown:
markdown = <<-END
I have a graph showing incredible statistics.
You will be amazed by the clarity it brings!
Behold: 
And yet another:

END
We can use the =~, #match, or === methods to detect if a png image is present:
if /\w*\.png/ =~ markdown
puts 'Found it using a squiggly.'
end
if /\w*\.png/.match(markdown)
puts 'Found a MATCH!'
end
if /\(\w*\.png/ === markdown
puts 'Found it with a looong equals sign!'
end
The above methods all achieve the same thing here: they return a truthy value when a match is found and a falsy one otherwise (nil is treated as false in Ruby conditionals).
Further, they are all defined on the Regexp class, and the / character delimits a literal RegExp. It's just as valid to do:
match = Regexp.new('\w*\.png').match(markdown)
# Which returns...
# => #<MatchData "the_graph.png">
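Although all three calls behave the same inside an if, they hand back different values. The sketch below shows this on a short made-up line; CAPTION and PNG are names introduced here for illustration only.

```ruby
# Hypothetical sample line and pattern, for illustration only.
CAPTION = "See the_graph.png for details"
PNG     = /\w*\.png/

PNG =~ CAPTION          # => 4     index where the match starts
PNG.match(CAPTION)      # => #<MatchData "the_graph.png">
PNG === CAPTION         # => true

# When nothing matches, =~ and #match return nil, while === returns false.
PNG =~ "no images here"       # => nil
PNG.match("no images here")   # => nil
PNG === "no images here"      # => false
```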
String methods
The above examples all used methods defined on Regexp. However, it's very interesting to note that =~ and #match are defined on String too (String#=== is the exception: it compares for equality rather than matching a pattern). This means you can switch around the order:
if markdown =~ /\w*\.png/
puts "Yup, it's a png alright."
end
if markdown.match(/\w*\.png/)
puts 'Yup, works that way too!'
end
In fact, as we'll see, many String methods accept a RegExp.
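One caveat is worth a quick sketch: swapping the order works for =~ and #match, but not for ===, because on a String that operator means plain equality. SENTENCE is a made-up sample used only for illustration.

```ruby
# Hypothetical sample text, for illustration only.
SENTENCE = "See the_graph.png for details"

SENTENCE =~ /\w*\.png/        # => 4
SENTENCE.match(/\w*\.png/)    # => #<MatchData "the_graph.png">

# String#=== compares for equality, so it never "matches" a pattern:
SENTENCE === /\w*\.png/       # => false
/\w*\.png/ === SENTENCE       # => true
```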
So which do I use?
As a matter of style, I like to use =~ when searching for text, since it looks more like an operator. You just need to remember the equal sign goes first, then the tilde.
The #match method is useful when you want more information, as it returns what was matched, including captured substrings.
Finally, the triple equals is known as "case equality" since it is what Ruby calls in case expressions, so it's useful when you have several matches:
what_i_found = case markdown
when /\w*\.png/ then 'A png.'
when /jpg/ then 'A jpg.'
else "Don't know."
end
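Wrapping the case expression in a small method makes it easy to try against different inputs. This is only a sketch: image_kind is a hypothetical helper name, and the jpg pattern is widened slightly here to also accept a .jpeg extension.

```ruby
# Hypothetical helper: report which kind of image a piece of text mentions.
def image_kind(text)
  case text
  when /\w*\.png/   then 'A png.'
  when /\w*\.jpe?g/ then 'A jpg.'
  else "Don't know."
  end
end

image_kind('Behold: the_graph.png')   # => "A png."
image_kind('photo.jpeg')              # => "A jpg."
image_kind('notes.txt')               # => "Don't know."

# Each `when` is tested with ===, so the first branch is roughly:
/\w*\.png/ === 'Behold: the_graph.png'   # => true
```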
It's worth mentioning that the Ruby Style Guide recommends using [] for simple matches. It's an alias of #slice and returns the matching string, or nil when nothing matches.
I think of it like a window looking into a part of the string.
matched_string = markdown[/\w*\.png/]
if matched_string # nil (falsy) when there is no match
puts "Found #{matched_string} with square brackets."
end
# Returns:
# Found the_graph.png with square brackets.
# => nil
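Here is a short sketch of how [] behaves with and without a match, plus the two-argument form that picks out a capture group. The image_name helper and the sample strings are assumptions made for illustration.

```ruby
# Hypothetical helper: pull the first png file name out of some text.
def image_name(text)
  text[/\w*\.png/]
end

image_name("Behold: the_graph.png")   # => "the_graph.png"
image_name("No images here")          # => nil

# A second argument selects a capture group rather than the whole match.
"Behold: the_graph.png"[/(\w*)\.png/, 1]   # => "the_graph"
```

Because the no-match case returns nil, it is safer to test the result for truthiness than to call a String method such as #empty? on it.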
Indexing
We used =~ above to detect pattern matches in a string, but to be honest, it really returns the index within the string. It works like a boolean above since it returns nil when there's no match, and the positional index within the string otherwise.
I like to think of it as "equals-squiggle" since the RegExp can indeed be a squiggly looking mess.
As mentioned above, we'll get the index returned, so actually:
pos = markdown =~ /\w*\.png/
# Returns:
# => 120
...however, if we actually want the index, then it would be more intention-revealing to use String#index:
pos = markdown.index(/\w*\.png/)
# Also returns:
# => 120
We can reverse the process to see what's there using [] with a range starting at pos 120:
markdown[120..132]
# Returns:
# => 'the_graph.png'
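Rather than hard-coding the end of the range, one option is to ask the MatchData object where the match begins and ends. The sketch below assumes a short made-up string (BEHOLD), so its indexes differ from the 120 seen above.

```ruby
# Hypothetical sample text, for illustration only.
BEHOLD = "Behold: the_graph.png"

BEHOLD.index(/\w*\.png/)    # => 8

found = BEHOLD.match(/\w*\.png/)
found.begin(0)              # => 8   index of the first matched character
found.end(0)                # => 21  index just past the last matched character

# An exclusive range (three dots) pairs naturally with MatchData#end.
BEHOLD[found.begin(0)...found.end(0)]   # => "the_graph.png"
```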
Search and replace
As well as checking for whether a pattern exists, we can easily run a search-and-replace type operation using #sub or #gsub; they stand for substitute and global-substitute respectively.
markdown.sub(/\.png/, '.jpg')
=> "I have a graph showing incredible statistics.
You will be amazed by the clarity it brings!
Behold: 
And yet another:
"
If you look carefully, you'll see in the above that only the first .png got replaced. This is where #gsub is more useful:
markdown.gsub(/\.png/, '.jpg')
=> "I have a graph showing incredible statistics.
You will be amazed by the clarity it brings!
Behold: 
And yet another:
#"
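The difference is easier to see on a compact input. In this sketch TWO_IMAGES is a made-up string, and the file name other_graph.png is hypothetical.

```ruby
# Hypothetical sample text containing two png references.
TWO_IMAGES = "Behold: the_graph.png\nAnd yet another: other_graph.png\n"

TWO_IMAGES.sub(/\.png/, '.jpg')
# => "Behold: the_graph.jpg\nAnd yet another: other_graph.png\n"

TWO_IMAGES.gsub(/\.png/, '.jpg')
# => "Behold: the_graph.jpg\nAnd yet another: other_graph.jpg\n"

# Both methods return a new string and leave the original untouched.
TWO_IMAGES.scan(/\.png/).length   # => 2
```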
Splitting strings up
In the following we use a regular expression to define the delimiter to split the string on:
markdown.split(/\W+/)
# Returns:
# => ["I", "have", "a", "graph", "showing", "incredible", "statistics",
# "Behold", "Incredible", "Graph", "the_graph", "png", "Graph"]
Here \W (upper-case) means any non-word character, so along with the + it treats runs of these as delimiters, effectively pulling out the words.
Of course, this can also be done with the inverse logic, using #scan:
markdown.scan(/\w+/)
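The two approaches usually agree, but not always. The sketch below uses made-up strings (PHRASE is a name introduced here) to show one case where they match and one where #split leaves a leading empty string behind.

```ruby
# Hypothetical sample text, for illustration only.
PHRASE = "Behold: the_graph.png!"

PHRASE.split(/\W+/)   # => ["Behold", "the_graph", "png"]
PHRASE.scan(/\w+/)    # => ["Behold", "the_graph", "png"]

# When the text starts with a delimiter, #split keeps an empty first element,
# whereas #scan only ever returns the matches themselves.
" hello world".split(/\W+/)   # => ["", "hello", "world"]
" hello world".scan(/\w+/)    # => ["hello", "world"]
```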
Another way of splitting strings is to partition on a split pattern. Here's an example showing the use of #partition returning the pre-matched, matched and post-matched text:
"Can you find _emphasis_ in your text?".partition(/_.+_/)
# Returns:
# => ["Can you find ", "_emphasis_", " in your text?"]
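Two edge cases are worth a quick sketch: what #partition returns when nothing matches, and how the greedy + behaves when the text contains more than one emphasised word. EMPHASIS is a made-up sample string.

```ruby
# Hypothetical sample text with two emphasised words.
EMPHASIS = "find _this_ and _that_ here"

# Greedy: .+ grabs as much as it can, spanning both emphasised words.
EMPHASIS.partition(/_.+_/)
# => ["find ", "_this_ and _that_", " here"]

# Lazy: .+? stops at the first closing underscore.
EMPHASIS.partition(/_.+?_/)
# => ["find ", "_this_", " and _that_ here"]

# With no match, the whole string comes back first, followed by two empty strings.
"plain text".partition(/_.+_/)
# => ["plain text", "", ""]
```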
Where to go next?
So far we've seen simple patterns used in various ways, including checking for existence, looking up positions, search-and-replace, and splitting strings up. The next post in this series will look at how we can extract a more complicated pattern match, specifically, the image details.