Tail Call Optimisation in Ruby (MRI)

I have recently discovered that MRI implements Tail Call Optimisation (TCO). It is a common optimisation technique used in many programming languages. Let me show you a short summary of what I found about the TCO in MRI.

Background: What is Tail Call Optimisation?

Tail Call Optimisation refers to a form of optimisation, or program transformation commonly used in programming languages with optimising-compilers. It optimises tail-recursive functions by transforming (or rewriting) them in loops, which avoids the overhead of recursive function calls. As the name suggests, it can only be applied to tail-recursive functions. A tail-recursive function is a function in which the final action to be performed is the call to the function itself.

For example, here we have a non-tail-recursive version and a tail-recursive version of reverse which reverses an array:

```ruby def arr_reverse(array) return [] if array.empty? return arr_reverse(array[1..-1]) + [array[0]] end

def arr_reverse_tail_recursive(array, accumulator = []) return accumulator if array.empty? return arr_reverse_tail_recursive(array[1..-1], [array[0]] + accumulator) end ```

They both return reversed arrays:

2.1.2 :029 > arr_reverse([1,2,3,4]) => [4, 3, 2, 1] 2.1.2 :030 > arr_reverse_tail_recursive((1..10).to_a) => [10, 9, 8, 7, 6, 5, 4, 3, 2, 1] 2.1.2 :031 > arr_reverse_tail_recursive(arr_reverse [1,2,3]) => [1, 2, 3]

The first example makes the recursive call, takes the result, appends the element, and finally returns the resulting array. The second example simply returns the return value from the call to itself. The first example is not tail-recursive because the final action is not the call to itself. The second example is tail-recursive because the final action is the recursive call to itself. The second version is, of course, the version that can have TCO applied.

Factorial Example

Consider this example for calculating the factorial:

```ruby # fact_simple.rb

def fact(n) raise InvalidArgument, “negative input given” if n < 0 return 1 if n == 0 return fact(n-1) * n end ```

This is a simple recursive implementation. Let us save this as fact_simple.rb.

2.1.2 :001 > require './fact_simple' => true 2.1.2 :002 > fact(10) => 3628800 2.1.2 :003 > fact(5) => 120

One problem here is that when we run it with sufficiently large input, we will get an error:

2.1.2 :007 > fact(10000) SystemStackError: stack level too deep

This happens because each recursive call grows our call stack with a new stack frame, and our runtime environment is running out of space.

We can rewrite this function in a tail-recursive manner, and execute with the TCO to solve the problem.

Let us start with rewriting the code first. Here, a tail-recursive version might look like this:

```ruby # fact_tail_call.rb

def fact(n, accumulator = 1) raise InvalidArgument, “negative input given” if n < 0 return accumulator if n == 0 return fact(n-1, accumulator * n) end ```

Let us save this as fact_tail_call.rb. This is a tail-recursive version of the factorial. However, just running this will result in no improvement.

2.1.2 :001 > require './fact_tail_call' => true 2.1.2 :002 > fact(10000) SystemStackError: stack level too deep

This is because the TCO is not enabled by default in MRI. We can enable the TCO by doing this instead:

2.1.2 :001 > program = RubyVM::InstructionSequence.new File.open('./fact_tail_call.rb').read, nil, nil, nil, tailcall_optimization: true, trace_instruction: false => <RubyVM::InstructionSequence:<compiled>@<compiled>> 2.1.2 :002 > program.eval => :fact

Now we have the same function with the TCO enabled. At this point, we can run fact(10000) successfully. However, since it will output a very large number, let us just check the number of digits instead:

2.1.2 :003 > fact(10).to_s.length => 7 2.1.2 :004 > fact(100).to_s.length => 158 2.1.2 :005 > fact(1000).to_s.length => 2568 2.1.2 :006 > fact(10000).to_s.length => 35660 2.1.2 :007 > fact(100000).to_s.length => 456574

The last one takes a while but still does not result in an error.

Possible Problem

The TCO is a great tool if it is used appropriately. However, we have to remember that it transforms the program, and it can affect the semantics.

Consider this example where we implement a simple linked list using Hash, and we have a simple function to print all values. It will exit with an error message if the recursive calls is not terminating:

```ruby node1 = { value: 1 } node2 = { value: 2, next: node1 } node1[:next] = node2 # looping list

def print_values(node) puts node[:value] print_values(node[:next]) if node[:next] end

begin print_values(node1) rescue SystemStackError puts “Your list is too long or it is looping somewhere” end ```

If we execute this without the TCO, it will terminate with the message "Your list is too long or it is looping somewhere".

... 2 1 Your list is too long or it is looping somewhere => true 2.1.2 :019 >

However, with the TCO, it will not terminate. In this example, the execution relies on the stack level too deep error. This is not a good programming style but if we happened to be working on code like this, the TCO cannot be applied. This is an unrealistic example, and in real life, it will probably be more subtle than this.

Final Thoughts

MRI supports tail call optimisation, but it is not enabled by default. While we may not need this so often, it still is a useful thing to know as a programmer in case we have to design a linked list or tree traversal algorithm for potentially very large input data.