Blog tutorial-series-for-experienced-rails-developers

A Closer Look at N + 1

Paolo-_-Veñegas
Paolo Veñegas
August 8, 2023

Introduction

N + 1 queries are a common occurrence when using ORMs like Rails’ ActiveRecord, and they make good candidates for optimisation. These issues can easily be detected by tools like bullet, but fixing them is not always straightforward. In this article, we look into some scenarios and how solutions compare.

The Classic N + 1

A common scenario might be an index page that lists records of a certain type, plus additional information from associated records. Say for example we have a forum application with a User model that has an associated AccessLevel:


# app/models/user.rb
class User < ApplicationRecord
  belongs_to :access_level
end

# app/models/access_level.rb
class AccessLevel < ApplicationRecord
  has_many :users
end

The users controller’s index action and its corresponding view template might look like this:


# app/controllers/users_controller.rb
class UsersController < ApplicationController
  def index
    @users = User.all.paginate(page: params[:page])
  end
end

<% # app/views/users/index.html.erb %>
...
<% @users.each do |user| %>
  <%= tag.div "Name: #{user.name}" %>
  <%= tag.div "Access Level: #{user.access_level.name}" %>
<% end %>
...

  User Load (1.2ms)  SELECT "users".* FROM "users" LIMIT $1  [["LIMIT", 50]]
  AccessLevel Load (0.7ms)  SELECT "access_levels".* FROM "access_levels" 
WHERE "access_levels"."id" = $1 LIMIT $2  [["id", 302310379], ["LIMIT", 1]]
  AccessLevel Load (0.3ms)  SELECT "access_levels".* FROM "access_levels" 
WHERE "access_levels"."id" = $1 LIMIT $2  [["id", 302310379], ["LIMIT", 1]]
  AccessLevel Load (0.1ms)  SELECT "access_levels".* FROM "access_levels" 
WHERE "access_levels"."id" = $1 LIMIT $2  [["id", 141366183], ["LIMIT", 1]]
  AccessLevel Load (0.1ms)  SELECT "access_levels".* FROM "access_levels" 
WHERE "access_levels"."id" = $1 LIMIT $2  [["id", 283116315], ["LIMIT", 1]]
  AccessLevel Load (0.1ms)  SELECT "access_levels".* FROM "access_levels" 
WHERE "access_levels"."id" = $1 LIMIT $2  [["id", 302310379], ["LIMIT", 1]]
  ...

In the view, the app has to fetch each user’s access level separately, so if there are 50 records to a page, it ends up making 51 database queries—one for the users, and one for each associated access level. Even with paging enabled, this is suboptimal.

includes to The Rescue

Fortunately, we can easily use includes when fetching users for this use case:


logger.debug('with includes')
User.all.limit(50).includes(:access_level).each do |user|
  user.access_level.name
end

Looking at the debug log, we can clearly see that fewer queries are generated:


with includes
  User Load (0.6ms)  SELECT "users".* FROM "users" LIMIT $1  [["LIMIT", 50]]
  AccessLevel Load (0.2ms)  SELECT "access_levels".* FROM "access_levels" 
WHERE "access_levels"."id" IN ($1, $2, $3)  [["id", 302310379], ["id", 141366183], 
["id", 283116315]]

This solution works by eager-loading related access levels when users are fetched, so the app only makes one additional query for all of them instead of 50. Much better, right?

Dealing with Aggregations

But how about if we need to show aggregated information from multiple associated records? For example, say a User has many Post records and we want everyone’s total post count:


# app/models/user.rb
class User < ApplicationRecord
  has_many :posts
end

# app/models/post.rb
class Post < ApplicationRecord
  belongs_to :user
end

Let’s see how this performs with includes:


logger.debug('with includes')
User.all.limit(50).includes(:posts).each do |user|
  user.posts.count
end

with includes
  User Load (2.1ms)  SELECT "users".* FROM "users" LIMIT $1  [["LIMIT", 50]]
  Post Load (2.1ms)  SELECT "posts".* FROM "posts" 
WHERE "posts"."user_id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, 
$13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, 
$28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, 
$43, $44, $45, $46, $47, $48, $49, $50)
ORDER BY "posts"."created_at" ASC  [["user_id", 1066737858], 
["user_id", 1066737857], ["user_id", 1066737853], ["user_id", 1066737852], 
["user_id", 1066737851], ["user_id", 1066737850], ["user_id", 1066737849], 
["user_id", 1066737848], ["user_id", 1066737847], ["user_id", 1066737846], 
["user_id", 1066737845], ["user_id", 1066737844], ["user_id", 1066737843], 
["user_id", 1066737842], ["user_id", 1066737841], ["user_id", 1066737840], 
["user_id", 1066737839], ["user_id", 1066737838], ["user_id", 1066737837], 
["user_id", 1066737836], ["user_id", 1066737835], ["user_id", 1066737834], 
["user_id", 1066737833], ["user_id", 1066737832], ["user_id", 1066737831], 
["user_id", 1066737830], ["user_id", 1066737829], ["user_id", 1066737828], 
["user_id", 1066737827], ["user_id", 1066737826], ["user_id", 1066737825], 
["user_id", 1066737824], ["user_id", 1066737823], ["user_id", 1066737822], 
["user_id", 1066737821], ["user_id", 1066737820], ["user_id", 1066737819], 
["user_id", 1066737818], ["user_id", 1066737817], ["user_id", 1066737816], 
["user_id", 1066737815], ["user_id", 1066737814], ["user_id", 1066737813], 
["user_id", 1066737812], ["user_id", 1066737811], ["user_id", 1066737810], 
["user_id", 1066737809], ["user_id", 1066737808], ["user_id", 1066737807], 
["user_id", 1066737806]]
  Post Count (0.4ms)  SELECT COUNT(*) FROM "posts" 
WHERE "posts"."user_id" = $1  [["user_id", 1066737858]]
  Post Count (0.2ms)  SELECT COUNT(*) FROM "posts" 
WHERE "posts"."user_id" = $1  [["user_id", 1066737857]]
  Post Count (0.2ms)  SELECT COUNT(*) FROM "posts" 
WHERE "posts"."user_id" = $1  [["user_id", 1066737853]]
  Post Count (0.2ms)  SELECT COUNT(*) FROM "posts" 
WHERE "posts"."user_id" = $1  [["user_id", 1066737852]]
  Post Count (0.2ms)  SELECT COUNT(*) FROM "posts" 
WHERE "posts"."user_id" = $1  [["user_id", 1066737851]]
  ...

So what’s happening here? As it turns out, eager-loading with includes doesn’t help when the query involves aggregate functions like count. ActiveRecord can’t anticipate aggregations, so each call to count does another computation in the database engine, again resulting in an N + 1 scenario.

Since the associations are already in memory however, you can simply use #size, which performs the computation in Ruby. This is discussed more in ActiveRecord Optimisation: Utilising .exists?, .any? and .size.


User.all.limit(50).includes(:posts).each do |user|
  user.posts.size
end

For many use cases this will be good enough, but although it’s faster than the N + 1 scenario, it’s still loading a lot of data that ends up unused just to do a simple count.

SQL-fu Workarounds

Rails’ built-in tools have limits, but we still have options for optimisation if you don’t mind some manual SQL.

Using JOIN and GROUP BY


# app/models/user.rb
class User < ApplicationRecord
  has_many :posts

  def self.with_post_count
    left_outer_joins(:posts)
      .group('users.id')
      .select('users.*, COUNT(posts.user_id) AS post_count')
  end
end

We can manually eager-load what we need with this with_post_count class method. Basically it performs a join with the posts table, groups posts with the same user_id, counts posts within each group, and makes the result available as post_count. Using left_outer_joins instead of joins (which performs an INNER JOIN by default) means we include users that don’t have associated posts.

Class methods can be invoked like ActiveRecord scopes. Note that instead of user.posts.count, we just call user.post_count directly:


logger.debug('with join')
User.all.limit(50).with_post_count.each do |user|
  user.post_count
end

with join
  User Load (13.3ms)  SELECT users.*, COUNT(posts.user_id) 
AS post_count FROM "users" LEFT OUTER JOIN "posts" 
ON "posts"."user_id" = "users"."id" GROUP BY "users"."id" 
ORDER BY "users"."created_at" DESC LIMIT $1 [["LIMIT", 50]]

That’s looking way better. Now we’re down to a single query!

Caveats

Some care is needed when composing queries like this. For one thing, the manual select provides post_count like an attribute, but it only exists on this relation and cannot be used like normal attributes, such as in where, having, or order clauses. Nor can it be assigned a value like virtual attributes.

Also, class methods on ActiveRecord classes may behave similarly to scopes, but there are key differences. Scopes are guaranteed to return an ActiveRecord::Relation on which further scopes can be invoked, so scope calls can be chained safely. This isn’t always true for class methods.

Let’s say we want to count user DMs in addition to posts.


# app/models/user.rb
class User < ApplicationRecord
  has_many :posts
  has_many :direct_messages

  def self.with_post_count
    left_outer_joins(:posts)
      .group('users.id')
      .select('users.*, COUNT(posts.user_id) AS post_count')
  end

  def self.with_message_count
    left_outer_joins(:direct_messages)
      .group('users.id')
      .select('users.*, COUNT(direct_messages.user_id) AS message_count')
  end
end

When chained, the query looks like this:


logger.debug('with chained joins')
User.all.limit(50).with_post_count.with_message_count.each do |user|
  user.post_count
  user.message_count
end

with chained joins
  User Load (28.5ms)  SELECT users.*, COUNT(posts.user_id) 
AS post_count, users.*, COUNT(direct_messages.user_id) 
AS message_count FROM "users" LEFT OUTER JOIN "posts" 
ON "posts"."user_id" = "users"."id" LEFT OUTER JOIN "direct_messages" 
ON "direct_messages"."user_id" = "users"."id" GROUP BY "users"."id" 
LIMIT $1  [["LIMIT", 50]]

We end up with a single query, but although ActiveRecord and the database accept it without errors, it gives incorrect results. In this case, the generated query ends up with two COUNT calls on separately joined tables but a single GROUP BY. Each COUNT is effectively called on the Cartesian product of the joined tables, giving much higher results. Let’s look at the results for a single user:


User.where(id: 5623)
    .with_post_count.with_message_count.each do |user|
  puts 'Expected:'
  puts user.posts.count
  puts user.direct_messages.count
  puts
  puts 'Results:'
  puts user.post_count
  puts user.message_count
end

Output:


Expected:
20
223

Results:
4460
4460

Using Subqueries

The class methods above work correctly when called individually. The problem is in the way their results are combined when chained.

We can write another query that reuses our class methods in subqueries with their own GROUP BY and COUNT, and then joins results afterwards.


# app/models/user.rb
class User < ApplicationRecord
  def self.with_post_and_message_count
    select('users.*, post_count, message_count')
      .joins('LEFT OUTER JOIN (%s) AS a ON users.id = a.id' %
             with_post_count.to_sql)
      .joins('LEFT OUTER JOIN (%s) AS b ON users.id = b.id' %
             with_message_count.to_sql)
  end
end

Now let’s run the same test:


User.where(id: 5623)
    .with_post_and_message_count.each do |user|
  puts 'Expected:'
  puts user.posts.count
  puts user.direct_messages.count
  puts
  puts 'Results:'
  puts user.post_count
  puts user.message_count
end

Expected:
20
223

Results:
20
223

This gives us correct results, but it’s not as intuitive as chaining, and we end up with yet another unchainable class method.

If we really want safe chaining and don’t mind maintaining SQL, we can do away with joins and just use subqueries for the class methods in the first place:


Expected:
# app/models/user.rb
class User < ApplicationRecord
  def self.with_subquery_post_count
    select <<~SQL
      users.*, (
        SELECT COUNT(*) FROM posts
        WHERE posts.user_id = users.id
      ) AS post_count
    SQL
  end

  def self.with_subquery_message_count
    select <<~SQL
      users.*, (
        SELECT COUNT(*) FROM direct_messages
        WHERE direct_messages.user_id = users.id
      ) AS message_count
    SQL
  end
end

Writing the methods this way allows us to chain them directly and still produce a single database call:


logger.debug('with subqueries')
User.all.limit(50)
    .with_subquery_post_count
    .with_subquery_message_count.each do |user|
  user.post_count
  user.message_count
end

with subqueries
  User Load (6.0ms)  SELECT users.*,
(
  SELECT COUNT(*) FROM posts
  WHERE posts.user_id = users.id
) AS post_count
, users.*,
(
  SELECT COUNT(*) FROM direct_messages
  WHERE direct_messages.user_id = users.id
) AS message_count
FROM "users" ORDER BY "users"."created_at" DESC LIMIT $1  [["LIMIT", 50]]

Conclusion and Benchmarks

Each of the solutions above offers pros and cons. I encourage readers to explore the tradeoffs and choose an approach that works for your use case and constraints.

For those who prioritise performance, here’s a comparison using Ruby’s Benchmark library and the benchmark-memory gem:


def run_benchmarks(bm)
  bm.report('N + 1') do
    User.all.each do |user|
      user.posts.count
      user.direct_messages.count
    end
  end

  bm.report('includes and size') do
    User.all.includes(:posts, :direct_messages).each do |user|
      user.posts.size
      user.direct_messages.size
    end
  end

  bm.report('with joins') do
    User.all.with_post_and_message_count.each do |user|
      user.post_count
      user.message_count
    end
  end

  bm.report('with subqueries') do
    User.all
        .with_subquery_post_count
        .with_subquery_message_count.each do |user|
      user.post_count
      user.message_count
    end
  end
end

puts 'Runtime'
Benchmark.bm(20) { |bm| run_benchmarks(bm) }

puts 'Memory use'
Benchmark.memory { |bm| run_benchmarks(bm) }

Runtime
                           user     system      total        real
N + 1                  1.087636   0.119499   1.207135 (  1.892550)
includes and size      0.241659   0.010494   0.252153 (  0.288259)
with joins             0.036628   0.001048   0.037676 (  0.056034)
with subqueries        0.062692   0.002064   0.064756 (  0.092025)

Memory use
               N + 1    94.345M memsize (   398.231k retained)
                         1.424M objects (     3.149k retained)
                        50.000  strings (    50.000  retained)

   includes and size    33.337M memsize (    14.207M retained)
                       413.804k objects (   108.024k retained)
                        50.000  strings (    50.000  retained)

          with joins     6.378M memsize (     4.659M retained)
                        55.897k objects (    30.903k retained)
                        50.000  strings (    50.000  retained)

     with subqueries     8.144M memsize (     4.642M retained)
                        68.278k objects (    30.776k retained)
                        50.000  strings (    50.000  retained)

As you can see, eager-loading and using #size is much faster than the N + 1 scenario, and computing counts in the database performs even better by an order of magnitude and uses much less memory. Optimising queries for specific use cases can result in significant performance gains, which directly translate to lower operational costs.

That’s it for now, happy hacking!