By Adam Davies

Demystifying Regular Expressions in Ruby (2/2)

Although they are recognized as powerful, compact, and expressive, Regular Expressions (or RegExps) also have a reputation for being notoriously hard for humans to parse. In fact, a great developer once said this about using regular expressions:

Now you have two problems!

In this post we follow up on part 1 and break down a regular expression that matches Markdown image links:

markdown = <<-END
I have a graph showing incredible statistics.
You will be amazed by the clarity it brings!
Behold: ![Incredible Graph](/the_graph.png "Graph")

And yet another:
![Graph2](/other_graph.png)
END

The code to match the image links would look something like this:

# Matching a pattern for Markdown images that look like:
#
#    ![<alt_text>](<url> "<optional_title>")
#
if markdown =~ /!\[.*\]\(.*?( ".*")?\)/
  puts 'Ugh, I guess a markdown image is in there?'
end

Breaking down a complex regular expression

If you're unfamiliar with the characters used in regular expressions for escaping, wildcard matching, grouping and repeating, then you could be forgiven for not fully understanding the above. Even if you do, it takes a bit of squinting to see the matches and escaping; it's certainly not super readable.

Let's break it down to better understand it. Starting with the original pattern, wrapped in regular expression / markers:

/![<alt_text>](<url> "<optional_title>")/
-                                       -

First, we have to escape meta-characters that have special meanings by prefixing them with a '\', and this applies to '[', ']', '(' and ')':

/!\[<alt_text>\]\(<url> "<optional_title>"\)/
  -           - -                         -
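
To see why the escaping matters, here is a minimal sketch. The sample string and variable names are made up for illustration. Left unescaped, [ and ] form a character class and ( and ) form a group, so the literal punctuation is never matched. Regexp.escape is the standard-library helper that applies the same escaping to a literal string.

```ruby
# Hypothetical sample string, used only for this illustration.
sample = '![alt](url)'

# Unescaped: [alt] is a character class (one of "a", "l" or "t") and
# (url) is a group, so the literal brackets and parens are never matched.
unescaped = /![alt](url)/

# Escaped: every meta-character is now matched literally.
escaped = /!\[alt\]\(url\)/

# Regexp.escape applies the same escaping to a literal string.
generated = Regexp.new(Regexp.escape(sample))

puts unescaped.match?(sample)   # false
puts escaped.match?(sample)     # true
puts generated.match?(sample)   # true
puts unescaped.match?('!aurl')  # true: "!" + one of a/l/t + "url"
```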

Next, let's match any character (using wildcard .) zero or more times (using *) to make the pattern work for whatever <alt_text>, <url> and <optional_title> happen to be:

/!\[.*\]\(.* ".*"\)/
    --    --  --

Now that's close, but there's a small flaw. The problem is that the title should be optional, yet with the above it isn't!

To make it optional, we wrap the title match (including its leading space) in parens ( and ) to group it as a single unit, then append a ? so the group matches zero or one times.

/!\[.*\]\(.*( ".*")?\)/  # Unescaped parens mean "group".
            -     --
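
As a quick check of the optional group, the sketch below runs both versions of the pattern against the two image links from the sample Markdown. The variable names are assumptions chosen for this example.

```ruby
# The two image links from the sample Markdown above.
with_title    = '![Incredible Graph](/the_graph.png "Graph")'
without_title = '![Graph2](/other_graph.png)'

# Title required: the space and quotes must always be present.
required_title = /!\[.*\]\(.* ".*"\)/

# Title optional: the group may match zero or one times.
optional_title = /!\[.*\]\(.*( ".*")?\)/

puts required_title.match?(with_title)     # true
puts required_title.match?(without_title)  # false
puts optional_title.match?(with_title)     # true
puts optional_title.match?(without_title)  # true
```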

Now it's pretty close, except for a problem of greediness...

Greediness and laziness

The greediness or laziness factor can be hard to visualise, so let's consider what's going on in this particular case.

What we intend:

# The example:
# ![<alt_text>](<url> "<optional_title>")

/!\[.*\]\(.*( ".*")?\)/
          -- -----
         /        \
the <url>      the <optional_title>

The URL is intended to be matched by the .* pattern, but * means zero or more times, and "more" means as many as possible. As a result, it will match all the characters including the title, all the way up to the closing )!

This is known as "greedy" matching: match as much as possible while satisfying the rest of the regular expression, which is possible here since the title group is optional! This is really important when trying to extract the matched text.

The solution is to stop being greedy! We can do that by appending a ?:

/!\[.*\]\(.*?( ".*")?\)/
            -

Now that we're using .*? we've made the * lazy, and it will match as few repetitions as possible for the <url> while still matching overall.
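
The difference is easiest to see by extracting the matched text. The sketch below adds an extra pair of capturing parens around the URL part. That extra group is an addition for illustration only and is not part of the pattern built up above.

```ruby
line = '![Incredible Graph](/the_graph.png "Graph")'

# Group 1 is the URL, group 2 is the optional title.
greedy = /!\[.*\]\((.*)( ".*")?\)/
lazy   = /!\[.*\]\((.*?)( ".*")?\)/

greedy_match = greedy.match(line)
lazy_match   = lazy.match(line)

# Greedy: the URL swallows the title and the title group matches nothing.
p greedy_match[1]  # "/the_graph.png \"Graph\""
p greedy_match[2]  # nil

# Lazy: the URL stops as soon as the title group can take over.
p lazy_match[1]    # "/the_graph.png"
p lazy_match[2]    # " \"Graph\""
```

Both patterns match the line as a whole; only the split between the URL and the title changes.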

Readability

We should aim for code that's easy to read at a glance, since developers spend a lot more time reading and understanding code than writing it.

One of the tricks Ruby gives us is the ability to break up long regular expressions and even add comments for complex expressions:

IMAGE_REGEXP = /
  !\[.*\]      # The alt text of the Markdown.
  \(           # Open paren for image URL and optional title.
    .*?        # The image URL (the '?' makes it NOT greedy).
    (\ ".*")?  # The optional title (here the '?' means limit to 0 or 1).
  \)           # Close paren.
/x

Notice that trailing x on the last line? It enables "free-spacing" mode (the Ruby docs call it extended mode) and helps tremendously. In free-spacing mode whitespace in the pattern is ignored, and you can add comments starting with #, just like normal Ruby comments. Just be careful to escape any spaces you actually want to match (with \) or they will be completely ignored.
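
A small sketch of that last warning, using made-up patterns and variable names: in free-spacing mode an unescaped space vanishes from the pattern, while an escaped one is matched literally.

```ruby
# The space is ignored, so this is the same pattern as /ab/.
ignored_space = /a b/x

# The escaped space is kept and must appear in the text.
escaped_space = /a\ b/x

puts ignored_space.match?('ab')   # true
puts ignored_space.match?('a b')  # false
puts escaped_space.match?('a b')  # true
puts escaped_space.match?('ab')   # false
```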

Summary

The process used above is a useful one to go through when building up your regular expression:

  1. Write the match as you require it literally.
  2. Escape any meta-characters with a \.
  3. Use wildcards and repetition meta-characters as required.
  4. Think about greediness.
  5. Try to stay sane!

Here's the full list of repetition meta-characters:

  • * - Zero or more times.
  • + - One or more times.
  • ? - Zero or one times (optional).
  • {n} - Exactly n times.
  • {n,} - n or more times.
  • {,m} - m or fewer times.
  • {n,m} - At least n and at most m times.
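
The list above can be tried out with a short sketch. It uses String#[] with a regular expression, which returns the first matched text (or nil when nothing matches). The test strings and variable names are made up for illustration.

```ruby
text = 'aaaa'

star     = text[/a*/]      # "aaaa": zero or more, as many as possible
plus     = text[/a+/]      # "aaaa": one or more
optional = text[/a?/]      # "a":    zero or one
exactly  = text[/a{2}/]    # "aa":   exactly 2
at_least = text[/a{2,}/]   # "aaaa": 2 or more
at_most  = text[/a{,3}/]   # "aaa":  3 or fewer
between  = text[/a{2,3}/]  # "aaa":  at least 2 and at most 3

# With no "a" at all, * still matches (zero times) but + does not.
empty_star = 'bbb'[/a*/]   # ""
no_plus    = 'bbb'[/a+/]   # nil

p [star, plus, optional, exactly, at_least, at_most, between]
p [empty_star, no_plus]
```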

The docs at ruby-doc.org include quite good explanations and references for all the meta-characters.
