By Adam Davies

Demystifying Regular Expressions in Ruby (1/2)

While being recognized as powerful, compact, and expressive, Regular Expressions (or RegExps) also have a reputation for being notoriously hard for humans to parse. In fact, a great developer once said this about them:

shaving yaks in a rabbit hole
sum it up: "for your sanity, don't do regexp" :D

In this series of posts, sanity is preserved as we review by example. Across the series we'll see how RegExps can find patterns in text, along with some lesser-known tricks that can improve readability.

This first post focuses on finding matches in text, along with some details on how it all works in Ruby.

Kinds of characters

There are various kinds of characters used in RegExp. Some common ones include:

  • Literals: match the same character in the target string.
  • Escaping: \ escapes a meta-character so that it is matched as a literal.
  • Wildcard: . matches any single character (except a newline, by default).
  • Character classes: a set of characters, any one of which may match (the order inside the set doesn't matter).
  • Repetition: + means match one or more times.

A few examples of character classes are:

[aeiou]     # Any vowel.
\w          # Any word character.
[[:blank:]] # Space or tab.
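
To make the list above concrete, here's a minimal sketch that tries one pattern per kind of character. The sample string and the variable names are made up for illustration; String#[] with a RegExp simply hands back the first matching substring.

```ruby
# Illustrative sample text (not the Markdown used later in this post).
sample = "cat.png costs $3"

literal  = sample[/cat/]          # literals match themselves
escaped  = sample[/\.png/]        # \. matches a real dot only
wildcard = sample[/c.t/]          # . stands in for any one character
vowel    = sample[/[aeiou]/]      # first character found from the set
word     = sample[/\w+/]          # + repeats \w one or more times
blank    = sample[/[[:blank:]]/]  # first space or tab

# The unescaped dot is a wildcard, so it also accepts an "X" here...
loose  = "catXpng" =~ /cat.png/
# ...whereas the escaped dot insists on a literal ".".
strict = "catXpng" =~ /cat\.png/

p [literal, escaped, wildcard, vowel, word, blank]
# => ["cat", ".png", "cat", "a", "cat", " "]
p [loose, strict]
# => [0, nil]
```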

Searching for whether a pattern exists

Given some text to match on, say, some Markdown:

# <<~ strips the leading indentation, so the indexes shown later line up.
markdown = <<~END
  I have a graph showing incredible statistics.
  You will be amazed by the clarity it brings!
  Behold: ![Incredible Graph](/the_graph.png "Graph")
  And yet another:
  ![Graph2](/other_graph.png)
END

We can use =~, #match, or === methods to detect if a png image is present:

if /\w*\.png/ =~ markdown
  puts 'Found it using a squiggly.'
end
if /\w*\.png/.match(markdown)
  puts 'Found a MATCH!'
end
if /\w*\.png/ === markdown
  puts 'Found it with a looong equals sign!'
end

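The above methods all achieve the same thing here: each gives a truthy value when the match succeeds and a falsy one when it fails. Strictly speaking, =~ returns an index or nil, #match returns a MatchData or nil, and === returns true or false (nil also evaluates to false in Ruby).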

Further, they are all defined on the Regexp class, and the / character delimits a literal RegExp. It's just as valid to do:

match = Regexp.new('\w*\.png').match(markdown)

# Which returns...
# => #<MatchData "the_graph.png">
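
To see how the three methods differ in what they hand back, here's a small sketch on a single line of the Markdown. The variable names are illustrative, and #match? is an extra that assumes Ruby 2.4 or later.

```ruby
text    = 'Behold: ![Incredible Graph](/the_graph.png "Graph")'
pattern = /\w*\.png/

index = pattern =~ text      # Integer position of the match, or nil
data  = pattern.match(text)  # MatchData describing the match, or nil
found = pattern === text     # strictly true or false

p index    # => 29
p data[0]  # => "the_graph.png"
p found    # => true

# With no match, the first two give nil and the third gives false.
no_index = pattern =~ 'no images here'
no_data  = pattern.match('no images here')
no_found = pattern === 'no images here'
p [no_index, no_data, no_found]  # => [nil, nil, false]

# Assumes Ruby 2.4+: #match? answers with a plain boolean.
p pattern.match?(text)  # => true
```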

String methods

The above examples all used methods defined on Regexp. However, it's interesting to note that =~ and #match are defined on String too (String#=== exists as well, but it compares strings rather than matching a pattern). This means you can switch around the order:

if markdown =~ /\w*\.png/
  puts "Yup, it's a png alright."
end
if markdown.match(/\w*\.png/)
  puts 'Yup, works that way too!'
end

Actually, as we'll see, RegExps turn up as arguments to many String methods.

So which do I use?

As a matter of style, I like to use =~ when searching for text, since it looks more like an operator. You just need to remember the equal sign goes first, then the tilde.

The #match method is useful when you want more information, as it returns what was matched, including captured substrings.
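
As a rough sketch of that extra information, here's what a MatchData offers on one line of the Markdown. The capture groups (the parentheses) are added purely for illustration and aren't part of the patterns used elsewhere in this post.

```ruby
line = 'Behold: ![Incredible Graph](/the_graph.png "Graph")'

# Two capture groups: the file name and the extension.
m = line.match(/(\w*)\.(png)/)

p m[0]          # => "the_graph.png"  (the whole match)
p m[1]          # => "the_graph"      (first capture)
p m[2]          # => "png"            (second capture)
p m.captures    # => ["the_graph", "png"]
p m.pre_match   # => "Behold: ![Incredible Graph](/"
p m.post_match  # => " \"Graph\")"
```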

Finally, the triple equals is known as "case equality" since it is what Ruby calls in case expressions, so it's useful when you have several matches:

what_i_found = case markdown
  when /\w*\.png/ then 'A png.'
  when /jpg/ then 'A jpg.'
  else "Don't know."
end
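
Case equality shows up in a few other places as well. Here's a minimal sketch, assuming a hypothetical image_kind helper and a made-up list of file names: the same case expression wrapped in a method, plus Array#grep, which also uses === to pick out matching elements.

```ruby
# Hypothetical helper wrapping a case expression; each `when` calls
# Regexp#=== against the text.
def image_kind(text)
  case text
  when /\.png/   then 'A png.'
  when /\.jpe?g/ then 'A jpg.'
  else "Don't know."
  end
end

# Made-up file names for illustration.
files = %w[the_graph.png photo.jpg notes.txt other_graph.png]

kinds = files.map { |name| image_kind(name) }
p kinds  # => ["A png.", "A jpg.", "Don't know.", "A png."]

# grep keeps the elements for which pattern === element is true.
pngs = files.grep(/\.png\z/)
p pngs   # => ["the_graph.png", "other_graph.png"]
```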

It's worth mentioning that the Ruby Style Guide recommends using [] for simple matches. It's an alias of #slice and returns the matching string, or nil when nothing matches.

I think of it like a window looking into a part of the string.

matched_string = markdown[/\w*\.png/]
if matched_string # nil when nothing matched
  puts "Found #{matched_string} with square brackets."
end

# Returns:
# Found the_graph.png with square brackets.
# => nil
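
Here's a short sketch of the square-bracket form in its other guises, using a single made-up line. It shows the nil you get back on a miss, and the optional second argument that picks out a capture group (the parentheses are added for illustration).

```ruby
line = '![Graph2](/other_graph.png)'

hit     = line[/\w*\.png/]       # the matching substring
miss    = line[/\w*\.gif/]       # nil, since there is no gif here
capture = line[/(\w*)\.png/, 1]  # just the first capture group

p hit      # => "other_graph.png"
p miss     # => nil
p capture  # => "other_graph"

# Assigning inside the condition guards against the nil case.
if (name = line[/\w*\.png/])
  puts "Found #{name} with square brackets."
end
```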

Indexing

We used =~ above to detect pattern matches in a string, but what it really returns is the index of the match within the string. It worked like a boolean above because it returns nil when there's no match, and the positional index otherwise (an Integer, which is truthy in Ruby even when it's 0).

I like to think of it as "equals-squiggle" since the RegExp can indeed be a squiggly looking mess.

As mentioned above, we get the index back:

pos = markdown =~ /\w*\.png/

# Returns:
# => 120

...however, if we actually want the index, then it would be more intention-revealing to use String#index:

pos = markdown.index(/\w*\.png/)

# Also returns:
# => 120

We can reverse the process to see what's there using [] with a range starting at pos 120:

markdown[120..132]

# Returns:
# => "the_graph.png"
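
Rather than hard-coding 132, the end of the range can be worked out from the match itself. This is a minimal sketch using MatchData#begin and MatchData#end, with the Markdown repeated so that it runs on its own; the variable names are illustrative.

```ruby
markdown = <<~END
  I have a graph showing incredible statistics.
  You will be amazed by the clarity it brings!
  Behold: ![Incredible Graph](/the_graph.png "Graph")
  And yet another:
  ![Graph2](/other_graph.png)
END

m = markdown.match(/\w*\.png/)
start = m.begin(0)  # index of the first matched character
stop  = m.end(0)    # index just past the last matched character

p start                   # => 120
p markdown[start...stop]  # => "the_graph.png"

# String#index takes an optional start offset, so searching again from
# `stop` finds the next image.
second = markdown.index(/\w*\.png/, stop)
p markdown[second, 15]    # => "other_graph.png"
```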

Search and replace

As well as checking for whether a pattern exists, we can easily run a search-and-replace type operation using #sub or #gsub; they stand for substitute and global-substitute respectively.

markdown.sub(/\.png/, '.jpg')
=> "I have a graph showing incredible statistics.
   You will be amazed by the clarity it brings!
   Behold: ![Incredible Graph](/the_graph.jpg \"Graph\")
   And yet another:
   ![Graph2](/other_graph.png)"

If you look carefully, you'll see in the above that only the first .png got replaced. This is where #gsub is more useful:

markdown.gsub(/\.png/, '.jpg')
=> "I have a graph showing incredible statistics.
   You will be amazed by the clarity it brings!
   Behold: ![Incredible Graph](/the_graph.jpg \"Graph\")
   And yet another:
   ![Graph2](/other_graph.jpg)"
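
Here's a quick sketch that checks the difference by counting, with the Markdown repeated so that it runs on its own. It also shows two things not covered above: both methods return a new string and leave the original alone, and both accept a block that computes each replacement.

```ruby
markdown = <<~END
  I have a graph showing incredible statistics.
  You will be amazed by the clarity it brings!
  Behold: ![Incredible Graph](/the_graph.png "Graph")
  And yet another:
  ![Graph2](/other_graph.png)
END

once = markdown.sub(/\.png/, '.jpg')   # first occurrence only
all  = markdown.gsub(/\.png/, '.jpg')  # every occurrence

p once.scan(/\.jpg/).size      # => 1
p all.scan(/\.jpg/).size       # => 2
p markdown.scan(/\.png/).size  # => 2 (the original string is untouched)

# Block form: the block receives each match and returns its replacement.
shouted = markdown.gsub(/\w*\.png/) { |name| name.upcase }
p shouted.include?('THE_GRAPH.PNG')    # => true
p shouted.include?('OTHER_GRAPH.PNG')  # => true
```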

Splitting strings up

In the following we use a regular expression to define the delimiter to split the string on:

markdown.split(/\W+/)

# Returns:
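# => ["I", "have", "a", "graph", "showing", "incredible", "statistics",
#     "You", "will", "be", "amazed", "by", "the", "clarity", "it", "brings",
#     "Behold", "Incredible", "Graph", "the_graph", "png", "Graph",
#     "And", "yet", "another", "Graph2", "other_graph", "png"]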

Here \W (upper-case) means any non-word character, and the + makes a whole run of them count as a single delimiter, effectively pulling out the words.

Of course, the same can be done with the inverse logic, using \w (lower-case) and #scan:

markdown.scan(/\w+/)
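
The two approaches usually agree, but not always. Here's a small sketch on made-up strings: when the text starts with a delimiter, #split keeps a leading empty string, whereas #scan only ever returns what actually matched.

```ruby
text = "Behold: ![Graph2](/other_graph.png)"

by_split = text.split(/\W+/)
by_scan  = text.scan(/\w+/)

p by_split             # => ["Behold", "Graph2", "other_graph", "png"]
p by_scan == by_split  # => true

# A leading delimiter makes the two differ.
indented = "  leading spaces"
p indented.split(/\W+/)  # => ["", "leading", "spaces"]
p indented.scan(/\w+/)   # => ["leading", "spaces"]
```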

Another way of splitting strings is to partition on a split pattern. Here's an example showing the use of #partition returning the pre-matched, matched and post-matched text:

"Can you find _emphasis_ in your text?".partition(/_.+_/)

# Returns:
# => ["Can you find ", "_emphasis_", " in your text?"]
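
A couple of edge cases are worth a quick sketch, using a made-up sentence with two emphasised words. The + is greedy, so it stretches to the last underscore; adding ? makes it stop at the first. And when nothing matches, #partition returns the whole string followed by two empty strings.

```ruby
text = "find _one_ and _two_"

greedy  = text.partition(/_.+_/)     # + grabs as much as it can
lazy    = text.partition(/_.+?_/)    # +? grabs as little as it can
missing = text.partition(/\*.+\*/)   # no asterisks, so no match

p greedy   # => ["find ", "_one_ and _two_", ""]
p lazy     # => ["find ", "_one_", " and _two_"]
p missing  # => ["find _one_ and _two_", "", ""]
```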

Where to go next?

So far we've seen simple patterns used in various ways, including checking for existence, looking up positions, search-and-replace, and splitting strings up. The next post in this series will look at how we can extract a more complicated pattern-match, specifically, the image details.
