By Yuji Yokoo

Improving CSV processing code with laziness

Enumerator::Lazy

Recently, I have had to analyse a very large CSV file and look for some lines containing specific values. In this post, I will explain how the lazy enumerator let me write simpler code.

Enumerator::Lazy is still relatively new to many Rubyists, including myself. In short, a lazy enumerator evaluates its elements only as they are needed. This is quite different from the regular Enumerable methods such as map and select, which evaluate eagerly: each one processes the whole collection and returns a full array before the next step starts. Laziness lets us write CSV processing code that is simple and still efficient, using the familiar enumerator methods.
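To make the difference concrete, here is a minimal sketch on a plain range rather than a CSV file. The counters eager_seen and lazy_seen are hypothetical names added only for illustration; they record how many elements each pipeline actually looked at.

```ruby
# Eager: select walks the whole range before map even starts.
eager_seen = 0
eager = (1..1_000).select { |n| eager_seen += 1; n.even? }
                  .map { |n| n * n }
                  .first(3)

# Lazy: each element flows through select and map one at a time,
# and iteration stops as soon as first(3) has what it needs.
lazy_seen = 0
lazy = (1..1_000).lazy
                 .select { |n| lazy_seen += 1; n.even? }
                 .map { |n| n * n }
                 .first(3)

p eager       # => [4, 16, 36]
p lazy        # => [4, 16, 36]
p eager_seen  # => 1000
p lazy_seen   # => 6
```

Both pipelines return the same result, but the lazy one only touches the elements it needs to produce three values.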

Example Problem

Suppose we have a large CSV like this:

ID,SKU ID,NAME,AVAILABILITY,MANUFACTURER,LINK,IMAGE LINK
"1","123ABC","Product 1","2","Foo Products","http://localhost/123abc","http://localhost/img/123abc.jpg"
"2","23456","Shoes 1","5","Foobar Shoes","http://localhost/23456","http://localhost/img/23456.jpg"
"3","123ABC-2","Product 2","0","Foo Products","http://localhost/123abc-2","http://localhost/img/123abc-2.jpg"

Let's pretend we had a lot more lines like these. Also, let's say we want to look for rows with "Foobar Shoes" as the "MANUFACTURER" in this CSV, and print each row's "NAME" with its "SKU ID", and we only need the first 5 matches. Although we could use other tools like grep and awk, we will focus on doing it with Ruby.

In Ruby, we might do this:

require 'csv'

rows = []
CSV.new(File.open('input.csv','r'), :headers => true).each do |row|
  rows << "#{row['SKU ID']} - #{row['NAME']}" if row['MANUFACTURER'] == "Foobar Shoes"
  break if rows.size >= 5
end

p rows

This is okay and works fine, but it would be easier to read if we expressed it in terms of select, map, and take.

We could do it this way:

require 'csv'

rows = CSV.new(File.open('input.csv','r'), :headers => true).select do |row|
  row['MANUFACTURER'] == "Foobar Shoes"
end.map do |row|
  "#{row['SKU ID']} - #{row['NAME']}"
end.take(5)

p rows

Although this is not any shorter, each step is now in a separate block, which makes it easier to read and maintain. However, there is a problem: it is eager. select reads every row in the file and collects all the matches into an array, and map builds a second array from those, before take keeps just the first 5. So it takes a lot more time to run, and uses more memory, than the previous example. This is a serious problem if your CSV file contains many rows, say 300,000 lines.

Introducing Laziness

This is exactly the type of problem the lazy enumerator is for. All we have to do is call lazy on the CSV object. That gives us a lazy enumerator, so the select, map, and take that follow are the lazy versions. We also have to call force at the end, because the lazy chain stays unevaluated until something asks for its results.

require 'csv'

rows = CSV.new(File.open('input.csv','r'), :headers => true).lazy.select do |row|
  row['MANUFACTURER'] == "Foobar Shoes"
end.map do |row|
  "#{row['SKU ID']} - #{row['NAME']}"
end.take(5).force

p rows

The only differences are the call to lazy on the CSV object and force at the end, but this version pulls rows through the whole chain one at a time and stops reading as soon as it has 5 matches. It never holds every matching row in memory at once, and it runs much more efficiently than the last example.
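If you want to see the difference for yourself without a 300,000-line file, here is a rough sketch that runs the same pipeline over a small CSV built in memory. The generated data, the first_five helper, and the seen counter are all made up for this illustration; they are not part of the original script. Every 10th row is a "Foobar Shoes" row, so the fifth match sits at row 50.

```ruby
require 'csv'

# Hypothetical test data: 1,000 rows, every 10th one made by "Foobar Shoes".
data = CSV.generate do |csv|
  csv << ['ID', 'SKU ID', 'NAME', 'MANUFACTURER']
  1.upto(1_000) do |i|
    maker = (i % 10).zero? ? 'Foobar Shoes' : 'Foo Products'
    csv << [i, "SKU#{i}", "Product #{i}", maker]
  end
end

# Runs the select/map/take pipeline on any enumerable of CSV rows and
# returns the result along with how many rows the select block saw.
def first_five(rows)
  seen = 0
  names = rows.select { |row| seen += 1; row['MANUFACTURER'] == 'Foobar Shoes' }
              .map { |row| "#{row['SKU ID']} - #{row['NAME']}" }
              .take(5)
              .to_a # on a lazy enumerator, to_a does the same job as force
  [names, seen]
end

eager_names, eager_seen = first_five(CSV.new(data, headers: true))
lazy_names, lazy_seen = first_five(CSV.new(data, headers: true).lazy)

p eager_names == lazy_names # => true
puts "eager looked at #{eager_seen} rows, lazy looked at #{lazy_seen} rows"
```

The eager version looks at all 1,000 rows, while the lazy version should stop shortly after it reaches the fifth match.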

Final Thoughts

The concept of laziness is more commonly seen in functional programming and may be unfamiliar to some Rubyists, but if we are using Ruby 2.0 or later, we should remember that it exists and use it when it is appropriate.
