By Adam Davies

Capturing matches in Ruby Regular Expressions (bonus)

In this post we follow up from part 1 and part 2 by looking at how to capture what was matched in Markdown image links:

markdown = <<-END
I have a graph showing incredible statistics.
You will be amazed by the clarity it brings!
Behold: ![Incredible Graph](/the_graph.png "Graph")

And yet another:
![Graph2](/other_graph.png)
END

Capturing the matches

In the previous post, we saw that '?' serves two purposes:

1) It can make a greedy repetition lazy: .*?
2) It can mark a group as optional: (.*)?

IMAGE_REGEXP = /
  !\[.*\]      # The alt text of the Markdown.
  \(           # Open parentheses for the image URL and optional title.
    .*?        # The image URL (the '?' makes it NOT greedy).
    (\ \".*")? # The optional title (here the '?' means limit to 0 or 1).
  \)           # Close parentheses.
/x
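
To see the two roles of ? in isolation, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not from the earlier posts:

```ruby
# Hypothetical sample text containing two quoted words.
text = 'say "one" and "two"'

# Greedy: .* runs on to the LAST closing quote on the line.
greedy = text[/".*"/]   # the whole span from "one" through "two"

# Lazy: .*? stops at the FIRST closing quote it can.
lazy = text[/".*?"/]    # just "one", quotes included

# Optional group: (\.txt)? may appear zero or one time.
optional = /\Areport(\.txt)?\z/
plain_ok    = "report".match?(optional)         # true
suffixed_ok = "report.txt".match?(optional)     # true
doubled_ok  = "report.txt.txt".match?(optional) # false
```

The only difference between the first two patterns is the trailing ?, yet they select different spans of the same string.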

In the complicated Markdown regular expression above, we used grouping parentheses to apply the ? to the optional title. Groups have another very useful feature: capturing. Whatever text a group matches can be extracted from each match. Furthermore, groups can be given names to make them easier to look up.

First, let's apply grouping to allow us to capture each interesting piece of data:

IMAGE_REGEXP = /
  !\[(.*)\]       # The alt text of the Markdown.
  \(              # Open parentheses for the image URL and optional title.
    (.*?)         # The image URL (the '?' makes it NOT greedy).
    (\ \"(.*)\")? # The optional title (here the '?' means limit to 0 or 1).
  \)              # Close parentheses.
/x

Now we can scan for matches:

markdown.scan(IMAGE_REGEXP)
=> [["Incredible Graph", "/the_graph.png", " \"Graph\"", "Graph"], ["Graph2", "/other_graph.png", nil, nil]]

Oops, we have captured the title twice because of the grouping we added earlier to make it optional. That outer group comes first in the scan results, with its leading space and quotes.

This is corrected by grouping without capturing, via the prefix ?::

IMAGE_REGEXP = /
  !\[(.*)\]         # The alt text of the Markdown.
  \(                # Open parentheses for image URL and optional title.
    (.*?)           # The image URL (the '?' makes it NOT greedy).
    (?:\ \"(.*)\")? # The optional title (the ?: prefix makes the outer group non-capturing).
  \)                # Close parentheses.
/x

Now we get:

markdown.scan(IMAGE_REGEXP)
=> [["Incredible Graph", "/the_graph.png", "Graph"], ["Graph2", "/other_graph.png", nil]]
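
The same effect shows up on a much smaller pattern. This is a sketch with a hypothetical input (a year with an optional "-month" suffix), chosen only to make the capture lists easy to compare:

```ruby
# Hypothetical patterns: identical except for the ?: on the outer group.
capturing     = /(\d+)(-(\d+))?/
non_capturing = /(\d+)(?:-(\d+))?/

# The outer group contributes its own entry, dash included.
with_outer = "2024-01".match(capturing).captures
# => ["2024", "-01", "01"]

# With ?: the outer group still groups, but is left out of the captures.
without_outer = "2024-01".match(non_capturing).captures
# => ["2024", "01"]

# When the optional part is absent, its inner capture is nil.
missing = "2024".match(non_capturing).captures
# => ["2024", nil]
```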

Naming the captures

The above is useful, but we can do one more thing: name the captures. The syntax is similar to that for non-capturing groups. In this case, we use a prefix of ?<my_name> to name the capture:

IMAGE_REGEXP = /
  !\[(?<alt_text>.*)\]      # The alt text of the Markdown.
  \(                        # Open parentheses for image URL and optional title.
    (?<url>.*?)             # The image URL.
    (?:\ \"(?<title>.*)\")? # The optional title.
  \)                        # Close parentheses.
/x

Now with these named captures we can get a more readable result using #match:

match = markdown.match(IMAGE_REGEXP)
=> #<MatchData "![Incredible Graph](/the_graph.png \"Graph\")"
               alt_text:"Incredible Graph"
               url:"/the_graph.png"
               title:"Graph">

match[:url]
=> "/the_graph.png"

You may have noticed that #match returns only the first match found. The MatchData it returns behaves like a Hash of the captures for that single match. This is in contrast to #scan, which returns an array containing an array of captures for every match.

We can do the following little trick to combine these:

named_captures = IMAGE_REGEXP.names
=> ["alt_text", "url", "title"]

array_of_matches = markdown.scan(IMAGE_REGEXP)
=> [["Incredible Graph", "/the_graph.png", "Graph"], ["Graph2", "/other_graph.png", nil]]

array_of_matches.map {|match| Hash[named_captures.zip(match)] }
=> [{"alt_text"=>"Incredible Graph", "url"=>"/the_graph.png", "title"=>"Graph"},
    {"alt_text"=>"Graph2", "url"=>"/other_graph.png", "title"=>nil}]
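
Another way to get there, sketched below, is to pass a block to #scan and collect Regexp.last_match, which holds the MatchData for the match currently being yielded. The local variable image_regexp is used here instead of the constant only so the snippet stands alone, and the Markdown sample is shortened:

```ruby
markdown = <<-END
Behold: ![Incredible Graph](/the_graph.png "Graph")

And yet another:
![Graph2](/other_graph.png)
END

image_regexp = /
  !\[(?<alt_text>.*)\]      # The alt text of the Markdown.
  \(                        # Open parentheses for image URL and optional title.
    (?<url>.*?)             # The image URL.
    (?:\ \"(?<title>.*)\")? # The optional title.
  \)                        # Close parentheses.
/x

# Inside the block, Regexp.last_match is the MatchData of the current match.
all_matches = []
markdown.scan(image_regexp) { all_matches << Regexp.last_match }

# Each element is a full MatchData, so named lookups work for every match.
urls   = all_matches.map { |m| m[:url] }
hashes = all_matches.map(&:named_captures)
```

This keeps the full MatchData for each match, so helpers such as #pre_match and #post_match remain available, at the cost of a slightly less declarative style than the zip approach.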

StringScanner

As a last trick, we'll look at StringScanner. It's a useful class that provides a more object-oriented imperative style of matching, since it maintains state. With it, we can search for matches, then continue on from the last position.

Here's an example, iterating through the matches one at a time:

require 'strscan'

scanner = StringScanner.new(markdown)

while scanner.scan_until(IMAGE_REGEXP)
  puts "We have #{scanner[:alt_text]}: #{scanner[:url]}"
end

# Output:
We have Incredible Graph: /the_graph.png
We have Graph2: /other_graph.png
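
Because the scanner keeps its position, the same loop can collect results instead of printing them. The following is a sketch, not the only way to do it. The images array and the local image_regexp are names introduced here for illustration, and the Markdown sample is shortened:

```ruby
require 'strscan'

markdown = <<-END
Behold: ![Incredible Graph](/the_graph.png "Graph")

And yet another:
![Graph2](/other_graph.png)
END

image_regexp = /
  !\[(?<alt_text>.*)\]      # The alt text of the Markdown.
  \(                        # Open parentheses for image URL and optional title.
    (?<url>.*?)             # The image URL.
    (?:\ \"(?<title>.*)\")? # The optional title.
  \)                        # Close parentheses.
/x

scanner = StringScanner.new(markdown)
images = []

# scan_until returns nil once no further match exists, which ends the loop.
while scanner.scan_until(image_regexp)
  images << {
    alt_text: scanner[:alt_text],
    url:      scanner[:url],
    title:    scanner[:title] # nil when the optional title is absent
  }
end

# A failed scan_until leaves the pointer just after the last successful match.
remaining = scanner.rest
```

After the loop, scanner.rest holds whatever text follows the final image, which is handy when the rest of the document needs further processing.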

Final thoughts

Regular expressions are tricky to learn, but don't be discouraged: even the most experienced developers need to look up their syntax every now and then. I hope these few posts have helped teach you the basics, or find the answer to that bug you've been banging your head on your desk over. Thanks for reading!
