Improving CSV processing code with laziness
Enumerator::Lazy
Recently, I had to analyse a very large CSV file and find the lines containing specific values. In this post, I will explain how the lazy enumerator let me write simpler code to do it.
Enumerator::Lazy is still relatively new to many Rubyists, including myself. In short, a lazy enumerator evaluates its elements only as they are needed. This is quite different from regular Enumerable methods such as select and map, which evaluate eagerly: they process every element upfront and return a new array. Laziness lets us write CSV processing code with these enumerator methods that is both simple and efficient.
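To see the difference, here is a small sketch using plain arrays rather than CSV rows. It records which elements each pipeline's map block visits: the eager map runs on every element before first picks two, while the lazy map runs only for the elements first actually requests. The last lines show that laziness also makes an infinite range usable. The variable names are only for illustration.

```ruby
# Record which elements each pipeline's map block actually visits.
eager_seen = []
eager_result = [1, 2, 3, 4, 5].map { |x| eager_seen << x; x * 2 }.first(2)

lazy_seen = []
lazy_result = [1, 2, 3, 4, 5].lazy.map { |x| lazy_seen << x; x * 2 }.first(2)

p eager_result  # => [2, 4]
p eager_seen    # => [1, 2, 3, 4, 5]  (map ran on every element before first)
p lazy_result   # => [2, 4]
p lazy_seen     # => [1, 2]           (map ran only for the elements first requested)

# Laziness also makes an infinite range usable: elements are generated
# only until three results have been found.
infinite = (1..Float::INFINITY).lazy.map { |x| x * 2 }.select { |x| (x % 3).zero? }.first(3)
p infinite      # => [6, 12, 18]
```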
Example Problem
Suppose we have a large CSV like this:
ID,SKU ID,NAME,AVAILABILITY,MANUFACTURER,LINK,IMAGE LINK
"1","123ABC","Product 1","2","Foo Products","http://localhost/123abc","http://localhost/img/123abc.jpg"
"2","23456","Shoes 1","5","Foobar Shoes","http://localhost/23456","http://localhost/img/23456.jpg"
"3","123ABC-2","Product 2","0","Foo Products","http://localhost/123abc-2","http://localhost/img/123abc-2.jpg"
Let's pretend there are many more lines like these. Suppose we want to find the rows whose "MANUFACTURER" is "Foobar Shoes" and print each one's "SKU ID" and "NAME", and that we only need the first 5 matches. Although we could use other tools such as grep and awk, we will focus on doing it in Ruby.
In Ruby, we might do this:
require 'csv'

rows = []
CSV.new(File.open('input.csv', 'r'), :headers => true).each do |row|
  # Keep "SKU ID - NAME" for matching rows, and stop once we have five.
  rows << "#{row['SKU ID']} - #{row['NAME']}" if row['MANUFACTURER'] == "Foobar Shoes"
  break if rows.size >= 5
end
p rows
This works fine, but the filtering, the formatting, and the stopping condition are all mixed together in one block. It would be easier to read if we expressed each step with select, map, and take.
We could do it this way:
require 'csv'

rows = CSV.new(File.open('input.csv', 'r'), :headers => true).select do |row|
  row['MANUFACTURER'] == "Foobar Shoes"
end.map do |row|
  "#{row['SKU ID']} - #{row['NAME']}"
end.take(5)
p rows
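Although this is not any shorter, each step is now in a separate block, which makes it easier to read and maintain. However, there is a problem: this version is eager. select parses every row in the file and collects all of the matching rows into an array before map runs, map then builds a second array of formatted strings, and only then does take discard everything except the first five. As a result, it takes much longer to run and uses more memory than the previous example, which stops reading as soon as it has found five matches. This is a serious problem if the CSV file contains many rows, say 300,000 lines.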
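We can check this with a small sketch. It builds a hypothetical 20-row CSV in memory (the SKUs, names, and URLs are made up, and every other row is from "Foobar Shoes"), passes the string straight to CSV.new instead of opening input.csv, and counts how many rows the select block examines in the eager version.

```ruby
require 'csv'

# Hypothetical in-memory stand-in for input.csv: 20 made-up rows,
# every other one from "Foobar Shoes".
header = "ID,SKU ID,NAME,AVAILABILITY,MANUFACTURER,LINK,IMAGE LINK"
lines = (1..20).map do |i|
  maker = i.even? ? "Foobar Shoes" : "Foo Products"
  %("#{i}","SKU#{i}","Item #{i}","1","#{maker}","http://localhost/#{i}","http://localhost/img/#{i}.jpg")
end
data = ([header] + lines).join("\n") + "\n"

checked = 0 # how many rows the select block sees
result = CSV.new(data, headers: true).select do |row|
  checked += 1
  row['MANUFACTURER'] == "Foobar Shoes"
end.map do |row|
  "#{row['SKU ID']} - #{row['NAME']}"
end.take(5)

p result  # => ["SKU2 - Item 2", "SKU4 - Item 4", "SKU6 - Item 6", "SKU8 - Item 8", "SKU10 - Item 10"]
p checked # => 20: select examined every row before map and take ran
```

Even though only five results are kept, select examines all 20 rows; with a 300,000-line file, the eager chain would parse every one of them.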
Introducing Laziness
This is exactly the kind of problem the lazy enumerator is for. To make the chain lazy, all we have to do is call lazy on the CSV object. This returns an Enumerator::Lazy, so select, map, and take become their lazy versions, which pass each row through the whole chain one at a time instead of building an intermediate array at every step. We also have to call force at the end: without it, the chain is never evaluated and we get back an Enumerator::Lazy instead of an array.
require 'csv'

rows = CSV.new(File.open('input.csv', 'r'), :headers => true).lazy.select do |row|
  row['MANUFACTURER'] == "Foobar Shoes"
end.map do |row|
  "#{row['SKU ID']} - #{row['NAME']}"
end.take(5).force
p rows
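The only differences are the call to lazy on the CSV object and the call to force at the end, but this version does not build intermediate arrays of every matching row. Each row flows through select, map, and take in turn, and reading stops once five matches have been found, so it runs much more efficiently than the eager version.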
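To check that the lazy version really stops early, the sketch below builds a hypothetical 20-row CSV in memory (made-up data, with every other row from "Foobar Shoes") and counts how many rows reach the select block.

```ruby
require 'csv'

# Hypothetical in-memory stand-in for input.csv: 20 made-up rows,
# every other one from "Foobar Shoes".
header = "ID,SKU ID,NAME,AVAILABILITY,MANUFACTURER,LINK,IMAGE LINK"
lines = (1..20).map do |i|
  maker = i.even? ? "Foobar Shoes" : "Foo Products"
  %("#{i}","SKU#{i}","Item #{i}","1","#{maker}","http://localhost/#{i}","http://localhost/img/#{i}.jpg")
end
data = ([header] + lines).join("\n") + "\n"

checked = 0 # how many rows the select block sees
result = CSV.new(data, headers: true).lazy.select do |row|
  checked += 1
  row['MANUFACTURER'] == "Foobar Shoes"
end.map do |row|
  "#{row['SKU ID']} - #{row['NAME']}"
end.take(5).force

p result  # => ["SKU2 - Item 2", "SKU4 - Item 4", "SKU6 - Item 6", "SKU8 - Item 8", "SKU10 - Item 10"]
p checked # => 10: iteration ended once take(5) received its fifth match (row 10)
```

How much this saves on a real file depends on where the matches appear: if the fifth match is near the end, the lazy chain still reads most of the file, but it never builds arrays of every match along the way.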
Final Thoughts
Laziness is more commonly seen in functional programming and may be unfamiliar to some Rubyists, but Enumerator::Lazy has been available since Ruby 2.0, so it is worth remembering that it exists and using it when it is appropriate.