Regex Improvements in the New Ruby 3.2

Ruby 3.2.0 was released on Christmas Day (25/12/2022). In this post I highlight its improvements to regular expressions and its mitigations for ReDoS (Regular expression Denial of Service) attacks. You can see the complete list of features and performance advances announced here.

Background

Let me start with an introduction to an issue that several languages are trying to address. A ReDoS attack is a Denial of Service attack that exploits a vulnerable regular expression to stall your service and make it unavailable to your end users.

In this context, two things are essential to know. Firstly, regex is a potent tool for matching, searching and manipulating text strings, so your code repositories and those of your favourite gems are full of them. The web itself relies heavily on regex, which makes the attack surface massive.

[Image: regex-based web]

Secondly, we need to know why regex brings this risk. For example, suppose an inefficient regex is matched against a very long input. A backtracking matcher may have to explore many possible next states, so matching can become very slow. An attacker can exploit an extreme case like that to take your web application down. You can see a more detailed explanation here.
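To make the risk concrete, here is a minimal sketch that times a failing match against inputs of doubling length. The helper name `time_failed_matches` and the input sizes are my own choices for illustration. On Ruby 3.1 or earlier the times should grow roughly quadratically for this pattern, while on Ruby 3.2 they should stay close to linear; the exact numbers depend on your machine.

```ruby
require "benchmark"

# Two adjacent `a*` repetitions: when the final `$` fails, a backtracking
# engine retries many ways of splitting the run of "a"s between them.
pattern = /^a*b?a*$/

# Hypothetical helper: returns [input_length, seconds] pairs.
def time_failed_matches(pattern, sizes)
  sizes.map do |n|
    input = "a" * n + "x" # the trailing "x" guarantees the match fails
    seconds = Benchmark.realtime { pattern.match?(input) }
    [n, seconds]
  end
end

timings = time_failed_matches(pattern, [1_000, 2_000, 4_000])
timings.each { |n, s| puts format("%6d chars: %.4f s", n, s) }
```

The sizes are kept small on purpose so the script finishes quickly even on older Ruby versions.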

Improvements in Ruby 3.2.0 (25/12/2022)

The Ruby development community has been actively working to address this issue in the language. This latest version introduces two improvements that significantly mitigate ReDoS.

Regex Matching Algorithm

First, the matching algorithm was improved significantly using a cache-based (memoization) approach. The Ruby team's experiments showed that about 90% of regexes can now be matched in time linear in the input size. So after upgrading to Ruby 3.2.0, most of the regexes in your application will match quickly enough to be safe from ReDoS. Take a look at this example:


# This match takes 10 sec. in Ruby 3.1, and 0.003 sec. in Ruby 3.2
/^a*b?a*$/ =~ "a" * 50000 + "x"

This new matching implementation may consume memory proportional to the input length for each match, but the allocation is usually delayed, so it is not expected to cause problems in practice.

Regex Timeout

The second change is the timeout feature. You can configure a timeout as a fallback measure for the remaining 10% of cases, where the optimisation above cannot be applied. These remaining cases are:

Regexps that use certain extensions (back-references, subexpression calls, look-around, atomic groups, absent operators)
A bounded or fixed-count repetition nested inside another repetition (e.g. /(a{2,3})*/)
A very large bounded or fixed-count repetition (e.g. /(a|b){100000,200000}/)
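If you want to check which of your own patterns fall into these cases, recent Ruby versions provide `Regexp.linear_time?`, which reports whether a pattern can be matched in linear time. The sketch below is only an illustration: the labels and the selection of patterns are mine, and the `respond_to?` guard is there because the method may be missing on older Rubies. I am deliberately not stating the expected results, since they can differ between Ruby versions.

```ruby
# Patterns to inspect; labels are hypothetical names for this sketch.
patterns = {
  "plain repetition"      => /^a*b?a*$/,
  "back-reference"        => /^a*b?a*()\1$/,
  "nested bounded repeat" => /(a{2,3})*/
}

if Regexp.respond_to?(:linear_time?)
  patterns.each do |label, pattern|
    # true means the optimised, linear-time matching can be used for this pattern
    puts format("%-22s linear_time? = %s", label, Regexp.linear_time?(pattern))
  end
else
  puts "Regexp.linear_time? is not available on this Ruby"
end
```

Patterns that report `false` are the ones worth protecting with a timeout.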

If you want a global timeout configuration, you can use the code below:


Regexp.timeout = 1.0

/^a*b?a*()\1$/ =~ "a" * 50000 + "x"
#=> Regexp::TimeoutError is raised in one second
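Because `Regexp.timeout` is a process-wide setting, you may prefer to apply it only around a specific piece of work. The following is a minimal sketch of that idea, assuming Ruby 3.2 or later. The helper `with_regexp_timeout` is hypothetical (it is not part of Ruby), and the 0.1 second limit is arbitrary. Note that the setting is global, so this sketch is not safe to use from several threads at once.

```ruby
# Hypothetical helper: sets the global Regexp timeout for the duration of the
# block and restores the previous value afterwards, even if an error is raised.
def with_regexp_timeout(seconds)
  previous = Regexp.timeout
  Regexp.timeout = seconds
  yield
ensure
  Regexp.timeout = previous
end

outcome = with_regexp_timeout(0.1) do
  begin
    # The back-reference keeps this pattern out of the optimised path.
    /^a*b?a*()\1$/ =~ "a" * 50_000 + "x"
    :finished
  rescue Regexp::TimeoutError
    :timed_out
  end
end

puts outcome
```

Rescuing `Regexp::TimeoutError` lets the application decide what to do with a slow match, for example rejecting the input, instead of crashing.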

If you want a different timeout for a specific regex, pass the timeout keyword to Regexp.new, as in the code below:


# This regex has a specific timeout
special_regex = Regexp.new('^a*b?a*()\1$', timeout: 2)

special_regex =~ "a" * 50000 + "x"
#=> Regexp::TimeoutError is raised in two seconds
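A per-regex timeout is convenient when only a few patterns handle untrusted input. As a rough sketch, again assuming Ruby 3.2 or later, the hypothetical helper `classify_match` below turns the exception into a plain return value so the caller can handle all three outcomes. The 0.5 second limit is an arbitrary value for the example.

```ruby
# Hypothetical helper: returns :match, :no_match, or :timeout instead of raising.
def classify_match(pattern, input)
  pattern.match?(input) ? :match : :no_match
rescue Regexp::TimeoutError
  :timeout
end

# Only this pattern carries a timeout; the global Regexp.timeout is untouched.
guarded = Regexp.new('^a*b?a*()\1$', timeout: 0.5)

puts classify_match(guarded, "aaa")               # short input, finishes quickly
puts classify_match(guarded, "a" * 50_000 + "x")  # long input, expected to hit the timeout
```

Keeping the timeout on the pattern itself avoids changing behaviour for the rest of the application.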

Conclusion

I hope you found this overview of the latest changes in Ruby 3.2 informative and helpful.

Together, the matching algorithm changes and the timeout feature substantially mitigate ReDoS (Regular Expression Denial of Service) attacks.

Keeping your software updated is an excellent practice to ensure your applications’ security and performance.