If your web app needs to import thousands of data on a regular basis, importing data normally (N+1) isn’t a good idea. It is important to speed up the process as much as possible and ensure that the load on the servers is lessened.
activerecord-import
We can use the activerecord-import gem. What this does is it reduces the SQL insert statements into 1 or as minimal as possible.
Normally, 100 authors with 100 books would mean 10,100 total SQL insert statements, 100 for authors and 100 books x 100 authors (10,000). Instead of that, you could use 2 SQL insert statement instead with this gem; 1 for authors and 1 for books.
Scenario
Let’s say we’re making a productivity web app, and we’d like to break down how much your company is spending on what work your employees are working on. So for this app, you’ll need to time-tracking data for the employees who are working on multiple projects. Turns out your company currently has a way to track time, such as Harvest. You’ll need to import data coming from this app.
Let’s try to import data.
In our scenario, we have 1,000 employees and they record 100 time entries per week. The employees might have some final edits to their time entries, so we import weekly during the weekends. Running this would be done by sidekiq and sidekiq-scheduler. Despite using great tools to offload and schedule our import tasks, we still need to ensure we’re using our resources with efficiency.
Now, we’re testing what we could use to import the data. I’ve benchmarked the data comparing the normal (N+1) importing and the activerecord-import way of importing.
Benchmark
We have created our sample app with 2 models, Employee
and TimeEntry
, wherein Employee
has many TimeEntry
.
The code below shows 1000 employees and 100 time entries per employee. For the N+1 import, there’ll be 101,000 SQL insert statements, compared to 2 insert statements for the activerecord-import; 1 for employees and 1 for time entries.
Please look at the code below to see how this is being done.
require 'benchmark'
module Benchmark
class Import
def call
Benchmark.bm do |x|
x.report('normal') { use_normal }
x.report('activerecord-import') { use_activerecord_import }
end
end
def use_normal
1000.times do |i|
employee = Employee.create(first_name: 'Employee', last_name: "Number #{i}")
100.times do
TimeEntry.create(employee:, duration: 60, notes: '1 hour work done')
end
end
end
def use_activerecord_import
import = []
1000.times do |i|
import << Employee.new(first_name: 'Employee', last_name: "Number #{i}")
end
Employee.import(import)
import = []
Employee.all.each do |employee|
100.times do
import << TimeEntry.new(employee:, duration: 60, notes: '1 hour work done')
end
end
TimeEntry.import(import)
end
end
end
Results
user | system | total | real | |
---|---|---|---|---|
normal | 62.396507 | 13.035225 | 75.431732 | ( 136.7433) |
activerecord-import | 14.384649 | 0.2068390 | 14.591488 | ( 16.49668) |
Normal (N+1) importing finished after more than 2 minutes while activerecord-import importing finished after 16.5 seconds. The latter took only 12.06% of the time it took the former to finish. So activerecord-import is 8.3x faster than the normal (N+1) way. You can see that using the activerecord-import gem would truly be beneficial for our app.
Other useful options
There are many other useful options with this gem. 2 options I generally use are on_duplicate_key_ignore and on_duplicate_key_update. They are usually written like this:
TimeEntry.import(import, on_duplicate_key_ignore: true)
TimeEntry.import(import, on_duplicate_key_update: {conflict_target: [:id], columns: [:duration]})
For the first one, whenever a record you’re importing has the same data as a record in your database, it will ignore the record. For the second one, whenever a record you’re importing has the same data as a record in your database, it will update that record, and using the columns key it will only touch the specified column/attribute. You can imagine the scenarios wherein you’ll need these options in your application like ensuring that the data you have in your db is updated with Harvest using the on_duplicate_key_update.
There are many more things you could do with this gem. You can check it out here.
Conclusion
Ruby on Rails presents us with various ways to solve problems and be more efficient with our code through gems. activerecord-import gem is one of those gems that can truly help with certain scenarios. Try using this gem as one of your tools to solve your problems. As always, happy coding!