Blog

Importing Data Faster with activerecord-import

Miko Dagatan
Miko Dagatan
November 10, 2023

If your web app needs to import thousands of data on a regular basis, importing data normally (N+1) isn’t a good idea. It is important to speed up the process as much as possible and ensure that the load on the servers is lessened.

activerecord-import

We can use the activerecord-import gem. What this does is it reduces the SQL insert statements into 1 or as minimal as possible.

Normally, 100 authors with 100 books would mean 10,100 total SQL insert statements, 100 for authors and 100 books x 100 authors (10,000). Instead of that, you could use 2 SQL insert statement instead with this gem; 1 for authors and 1 for books.

Scenario

Let’s say we’re making a productivity web app, and we’d like to break down how much your company is spending on what work your employees are working on. So for this app, you’ll need to time-tracking data for the employees who are working on multiple projects. Turns out your company currently has a way to track time, such as Harvest. You’ll need to import data coming from this app.

Let’s try to import data.

In our scenario, we have 1,000 employees and they record 100 time entries per week. The employees might have some final edits to their time entries, so we import weekly during the weekends. Running this would be done by sidekiq and sidekiq-scheduler. Despite using great tools to offload and schedule our import tasks, we still need to ensure we’re using our resources with efficiency.

Now, we’re testing what we could use to import the data. I’ve benchmarked the data comparing the normal (N+1) importing and the activerecord-import way of importing.

Benchmark

We have created our sample app with 2 models, Employee and TimeEntry, wherein Employee has many TimeEntry.

The code below shows 1000 employees and 100 time entries per employee. For the N+1 import, there’ll be 101,000 SQL insert statements, compared to 2 insert statements for the activerecord-import; 1 for employees and 1 for time entries.

Please look at the code below to see how this is being done.


require 'benchmark'

module Benchmark
  class Import
    def call
      Benchmark.bm do |x|
        x.report('normal') { use_normal }
        x.report('activerecord-import') { use_activerecord_import }
      end
    end

    def use_normal
      1000.times do |i|
        employee = Employee.create(first_name: 'Employee', last_name: "Number #{i}")
        100.times do
          TimeEntry.create(employee:, duration: 60, notes: '1 hour work done')
        end
      end
    end

    def use_activerecord_import
      import = []
      1000.times do |i|
        import << Employee.new(first_name: 'Employee', last_name: "Number #{i}")
      end
      Employee.import(import)

      import = []
      Employee.all.each do |employee|
        100.times do
          import << TimeEntry.new(employee:, duration: 60, notes: '1 hour work done')
        end
      end

      TimeEntry.import(import)
    end
  end
end

Results

  user system total real
normal 62.396507 13.035225 75.431732 ( 136.7433)
activerecord-import 14.384649 0.2068390 14.591488 ( 16.49668)


Normal (N+1) importing finished after more than 2 minutes while activerecord-import importing finished after 16.5 seconds. The latter took only 12.06% of the time it took the former to finish. So activerecord-import is 8.3x faster than the normal (N+1) way. You can see that using the activerecord-import gem would truly be beneficial for our app.

Other useful options

There are many other useful options with this gem. 2 options I generally use are on_duplicate_key_ignore and on_duplicate_key_update. They are usually written like this:


TimeEntry.import(import, on_duplicate_key_ignore: true)

TimeEntry.import(import, on_duplicate_key_update: {conflict_target: [:id], columns: [:duration]})

For the first one, whenever a record you’re importing has the same data as a record in your database, it will ignore the record. For the second one, whenever a record you’re importing has the same data as a record in your database, it will update that record, and using the columns key it will only touch the specified column/attribute. You can imagine the scenarios wherein you’ll need these options in your application like ensuring that the data you have in your db is updated with Harvest using the on_duplicate_key_update.

There are many more things you could do with this gem. You can check it out here.

Conclusion

Ruby on Rails presents us with various ways to solve problems and be more efficient with our code through gems. activerecord-import gem is one of those gems that can truly help with certain scenarios. Try using this gem as one of your tools to solve your problems. As always, happy coding!