Introduction
N + 1 queries are a common occurrence when using ORMs like Rails' ActiveRecord, and they make good candidates for optimisation. Tools like bullet can easily detect these issues, but fixing them is not always straightforward. In this article, we look at a few scenarios and compare possible solutions.
The Classic N + 1
A common scenario might be an index page that lists records of a certain type, plus additional information from associated records. Say for example we have a forum application with a User model that has an associated AccessLevel:
# app/models/user.rb
class User < ApplicationRecord
  belongs_to :access_level
end

# app/models/access_level.rb
class AccessLevel < ApplicationRecord
  has_many :users
end
The users controller's index action and its corresponding view template might look like this:
# app/controllers/users_controller.rb
class UsersController < ApplicationController
  def index
    @users = User.all.paginate(page: params[:page])
  end
end
<% # app/views/users/index.html.erb %>
...
<% @users.each do |user| %>
  <%= tag.div "Name: #{user.name}" %>
  <%= tag.div "Access Level: #{user.access_level.name}" %>
<% end %>
...

Rendering the page produces a query log like this:
User Load (1.2ms) SELECT "users".* FROM "users" LIMIT $1 [["LIMIT", 50]]
AccessLevel Load (0.7ms) SELECT "access_levels".* FROM "access_levels"
WHERE "access_levels"."id" = $1 LIMIT $2 [["id", 302310379], ["LIMIT", 1]]
AccessLevel Load (0.3ms) SELECT "access_levels".* FROM "access_levels"
WHERE "access_levels"."id" = $1 LIMIT $2 [["id", 302310379], ["LIMIT", 1]]
AccessLevel Load (0.1ms) SELECT "access_levels".* FROM "access_levels"
WHERE "access_levels"."id" = $1 LIMIT $2 [["id", 141366183], ["LIMIT", 1]]
AccessLevel Load (0.1ms) SELECT "access_levels".* FROM "access_levels"
WHERE "access_levels"."id" = $1 LIMIT $2 [["id", 283116315], ["LIMIT", 1]]
AccessLevel Load (0.1ms) SELECT "access_levels".* FROM "access_levels"
WHERE "access_levels"."id" = $1 LIMIT $2 [["id", 302310379], ["LIMIT", 1]]
...
In the view, the app has to fetch each user’s access level separately, so if there are 50 records to a page, it ends up making 51 database queries—one for the users, and one for each associated access level. Even with paging enabled, this is suboptimal.
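To see the pattern outside of Rails, here is a plain-Ruby sketch of the lazy-loading behaviour described above, with a counter standing in for the query log (all names here are illustrative, not ActiveRecord's API):

```ruby
# Simulated "database": every fetch records the query it would issue.
QUERIES = []

def fetch_users(limit)
  QUERIES << "SELECT users LIMIT #{limit}"
  Array.new(limit) { |i| { name: "user#{i}", access_level_id: i % 3 } }
end

def fetch_access_level(id)
  QUERIES << "SELECT access_levels WHERE id = #{id}"
  { id: id, name: "level#{id}" }
end

# One query for the page of users, then one more per user for the level:
fetch_users(50).each { |user| fetch_access_level(user[:access_level_id]) }
QUERIES.size # => 51
```

The total is always page size plus one, which is why the problem gets worse as pages grow.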
includes to the Rescue
Fortunately, we can easily use includes when fetching users for this use case:
logger.debug('with includes')
User.all.limit(50).includes(:access_level).each do |user|
  user.access_level.name
end
Looking at the debug log, we can clearly see that fewer queries are generated:
with includes
User Load (0.6ms) SELECT "users".* FROM "users" LIMIT $1 [["LIMIT", 50]]
AccessLevel Load (0.2ms) SELECT "access_levels".* FROM "access_levels"
WHERE "access_levels"."id" IN ($1, $2, $3) [["id", 302310379], ["id", 141366183],
["id", 283116315]]
This solution works by eager-loading related access levels when users are fetched, so the app only makes one additional query for all of them instead of 50. Much better, right?
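Conceptually, this kind of eager loading boils down to collecting the foreign keys, resolving them in one batched IN query, and wiring the results up in memory. A rough plain-Ruby sketch of the idea (not ActiveRecord's actual implementation):

```ruby
users = [
  { name: 'alice', access_level_id: 1 },
  { name: 'bob',   access_level_id: 2 },
  { name: 'carol', access_level_id: 1 }
]

# Step 1: gather the distinct foreign keys from the loaded users.
ids = users.map { |u| u[:access_level_id] }.uniq  # => [1, 2]

# Step 2: fetch all matching rows in a single IN query (simulated here).
access_levels = [{ id: 1, name: 'member' }, { id: 2, name: 'admin' }]
by_id = access_levels.to_h { |al| [al[:id], al] }

# Step 3: associate in memory -- no further queries needed.
users.each { |u| u[:access_level] = by_id[u[:access_level_id]] }
users.first[:access_level][:name] # => "member"
```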
Dealing with Aggregations
But what if we need to show aggregated information from multiple associated records? For example, say a User has many Post records and we want everyone's total post count:
# app/models/user.rb
class User < ApplicationRecord
  has_many :posts
end

# app/models/post.rb
class Post < ApplicationRecord
  belongs_to :user
end
Let's see how this performs with includes:
logger.debug('with includes')
User.all.limit(50).includes(:posts).each do |user|
  user.posts.count
end
with includes
User Load (2.1ms) SELECT "users".* FROM "users" LIMIT $1 [["LIMIT", 50]]
Post Load (2.1ms) SELECT "posts".* FROM "posts"
WHERE "posts"."user_id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12,
$13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27,
$28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42,
$43, $44, $45, $46, $47, $48, $49, $50)
ORDER BY "posts"."created_at" ASC [["user_id", 1066737858],
["user_id", 1066737857], ["user_id", 1066737853], ["user_id", 1066737852],
["user_id", 1066737851], ["user_id", 1066737850], ["user_id", 1066737849],
["user_id", 1066737848], ["user_id", 1066737847], ["user_id", 1066737846],
["user_id", 1066737845], ["user_id", 1066737844], ["user_id", 1066737843],
["user_id", 1066737842], ["user_id", 1066737841], ["user_id", 1066737840],
["user_id", 1066737839], ["user_id", 1066737838], ["user_id", 1066737837],
["user_id", 1066737836], ["user_id", 1066737835], ["user_id", 1066737834],
["user_id", 1066737833], ["user_id", 1066737832], ["user_id", 1066737831],
["user_id", 1066737830], ["user_id", 1066737829], ["user_id", 1066737828],
["user_id", 1066737827], ["user_id", 1066737826], ["user_id", 1066737825],
["user_id", 1066737824], ["user_id", 1066737823], ["user_id", 1066737822],
["user_id", 1066737821], ["user_id", 1066737820], ["user_id", 1066737819],
["user_id", 1066737818], ["user_id", 1066737817], ["user_id", 1066737816],
["user_id", 1066737815], ["user_id", 1066737814], ["user_id", 1066737813],
["user_id", 1066737812], ["user_id", 1066737811], ["user_id", 1066737810],
["user_id", 1066737809], ["user_id", 1066737808], ["user_id", 1066737807],
["user_id", 1066737806]]
Post Count (0.4ms) SELECT COUNT(*) FROM "posts"
WHERE "posts"."user_id" = $1 [["user_id", 1066737858]]
Post Count (0.2ms) SELECT COUNT(*) FROM "posts"
WHERE "posts"."user_id" = $1 [["user_id", 1066737857]]
Post Count (0.2ms) SELECT COUNT(*) FROM "posts"
WHERE "posts"."user_id" = $1 [["user_id", 1066737853]]
Post Count (0.2ms) SELECT COUNT(*) FROM "posts"
WHERE "posts"."user_id" = $1 [["user_id", 1066737852]]
Post Count (0.2ms) SELECT COUNT(*) FROM "posts"
WHERE "posts"."user_id" = $1 [["user_id", 1066737851]]
...
So what's happening here? As it turns out, eager-loading with includes doesn't help when the query involves aggregate functions like count. ActiveRecord can't anticipate aggregations, so each call to count issues another query to the database engine, again resulting in an N + 1 scenario.
Since the associations are already in memory, however, you can simply use #size, which performs the computation in Ruby. This is discussed in more detail in ActiveRecord Optimisation: Utilising .exists?, .any? and .size.
User.all.limit(50).includes(:posts).each do |user|
  user.posts.size
end
For many use cases this will be good enough, but although it's faster than the N + 1 scenario, it still loads a lot of data that ends up unused, just to produce a simple count.
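To picture the tradeoff in plain Ruby: with eager loading, every associated post is pulled into memory just so #size can count it there, even though only an integer per user is ultimately needed. An illustrative sketch (the numbers are arbitrary):

```ruby
# Simulated eager-loaded associations: all post objects sit in memory.
posts_per_user = {
  1 => Array.new(20) { :post },   # user 1 has 20 posts loaded
  2 => Array.new(223) { :post }   # user 2 has 223 posts loaded
}

# #size counts the loaded objects in Ruby -- no extra queries,
# but every record had to be fetched and instantiated first.
counts = posts_per_user.transform_values(&:size)
counts # => {1=>20, 2=>223}
```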
SQL-fu Workarounds
Rails' built-in tools have limits, but there are still options for optimisation if you don't mind writing some manual SQL.
Using JOIN and GROUP BY
# app/models/user.rb
class User < ApplicationRecord
  has_many :posts

  def self.with_post_count
    left_outer_joins(:posts)
      .group('users.id')
      .select('users.*, COUNT(posts.user_id) AS post_count')
  end
end
We can manually eager-load what we need with this with_post_count class method. It performs a join with the posts table, groups rows with the same user_id, counts the posts within each group, and makes the result available as post_count. Using left_outer_joins instead of joins (which performs an INNER JOIN) means users without any associated posts are still included.
Class methods can be invoked like ActiveRecord scopes. Note that instead of user.posts.count, we now call user.post_count directly:
logger.debug('with join')
User.all.limit(50).with_post_count.each do |user|
  user.post_count
end
with join
User Load (13.3ms) SELECT users.*, COUNT(posts.user_id)
AS post_count FROM "users" LEFT OUTER JOIN "posts"
ON "posts"."user_id" = "users"."id" GROUP BY "users"."id"
ORDER BY "users"."created_at" DESC LIMIT $1 [["LIMIT", 50]]
That’s looking way better. Now we’re down to a single query!
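One detail worth noting: the SELECT uses COUNT(posts.user_id) rather than COUNT(*). With a LEFT OUTER JOIN, a user with no posts still produces one row, with NULLs in the posts columns, and COUNT over a column skips NULLs, so such users correctly get 0. A plain-Ruby simulation of the join-and-group step (illustrative only, not what the database actually executes):

```ruby
users = [{ id: 1 }, { id: 2 }]
posts = [{ user_id: 1 }, { user_id: 1 }]  # user 2 has no posts

# LEFT OUTER JOIN: every user appears at least once,
# with nil standing in for the missing post columns.
joined = users.flat_map do |u|
  matches = posts.select { |p| p[:user_id] == u[:id] }
  matches.empty? ? [[u, nil]] : matches.map { |p| [u, p] }
end

# GROUP BY users.id, then COUNT(posts.user_id): only non-nil
# values are counted, so user 2 gets 0 instead of 1.
post_counts = joined.group_by { |u, _| u[:id] }
                    .transform_values { |rows| rows.count { |_, p| !p.nil? } }
post_counts # => {1=>2, 2=>0}
```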
Caveats
Some care is needed when composing queries like this. For one thing, the manual select exposes post_count like an attribute, but it only exists on this relation and cannot be used the way normal attributes can, such as in where, having, or order clauses. Nor can it be assigned a value like a virtual attribute.
Also, class methods on ActiveRecord classes may behave similarly to scopes, but there are key differences. Scopes are guaranteed to return an ActiveRecord::Relation on which further scopes can be invoked, so scope calls can be chained safely. This isn't always true for class methods.
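To illustrate that difference outside of Rails, here is a miniature relation-like class (purely hypothetical, not ActiveRecord's implementation): the scope-like method always returns something chainable, while the class-method-style one can return nil and silently break the chain:

```ruby
# Illustrative sketch only: a tiny stand-in for a query relation.
class TinyRelation
  def initialize(rows)
    @rows = rows
  end

  # Scope-like: always returns a TinyRelation, so chaining is safe.
  def active
    TinyRelation.new(@rows.select { |r| r[:active] })
  end

  # Class-method-like: free to return nil when nothing matches,
  # so the next call in the chain would raise NoMethodError.
  def recent
    matched = @rows.select { |r| r[:recent] }
    matched.empty? ? nil : TinyRelation.new(matched)
  end

  def count
    @rows.size
  end
end

rel = TinyRelation.new([{ active: true, recent: false }])
rel.active.count    # => 1
rel.recent&.active  # => nil -- the chain silently stops here
```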
Let's say we want to count users' direct messages in addition to their posts.
# app/models/user.rb
class User < ApplicationRecord
  has_many :posts
  has_many :direct_messages

  def self.with_post_count
    left_outer_joins(:posts)
      .group('users.id')
      .select('users.*, COUNT(posts.user_id) AS post_count')
  end

  def self.with_message_count
    left_outer_joins(:direct_messages)
      .group('users.id')
      .select('users.*, COUNT(direct_messages.user_id) AS message_count')
  end
end
When chained, the query looks like this:
logger.debug('with chained joins')
User.all.limit(50).with_post_count.with_message_count.each do |user|
  user.post_count
  user.message_count
end
with chained joins
User Load (28.5ms) SELECT users.*, COUNT(posts.user_id)
AS post_count, users.*, COUNT(direct_messages.user_id)
AS message_count FROM "users" LEFT OUTER JOIN "posts"
ON "posts"."user_id" = "users"."id" LEFT OUTER JOIN "direct_messages"
ON "direct_messages"."user_id" = "users"."id" GROUP BY "users"."id"
LIMIT $1 [["LIMIT", 50]]
We end up with a single query, and ActiveRecord and the database both accept it without errors, but it gives incorrect results. The generated query contains two COUNT calls over separately joined tables but only a single GROUP BY, so each COUNT effectively runs over the Cartesian product of the joined rows, producing wildly inflated numbers. Let's look at the results for a single user:
User.where(id: 5623)
    .with_post_count.with_message_count.each do |user|
  puts 'Expected:'
  puts user.posts.count
  puts user.direct_messages.count
  puts
  puts 'Results:'
  puts user.post_count
  puts user.message_count
end
Output:
Expected:
20
223
Results:
4460
4460
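The inflated numbers follow directly from the row arithmetic: after both joins, each of this user's 20 posts is paired with each of their 223 messages before grouping, so both COUNTs run over the same multiplied row set:

```ruby
posts    = 20   # the user's actual post count
messages = 223  # the user's actual direct message count

# With two LEFT OUTER JOINs and a single GROUP BY, the user's group
# contains one row per (post, message) pair, and both COUNTs see
# every one of those rows.
joined_rows = posts * messages
joined_rows # => 4460
```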
Using Subqueries
The class methods above work correctly when called individually; the problem lies in how their results are combined when chained. We can write another query that reuses our class methods as subqueries, each with its own GROUP BY and COUNT, and joins their results afterwards.
# app/models/user.rb
class User < ApplicationRecord
  def self.with_post_and_message_count
    select('users.*, post_count, message_count')
      .joins('LEFT OUTER JOIN (%s) AS a ON users.id = a.id' %
        with_post_count.to_sql)
      .joins('LEFT OUTER JOIN (%s) AS b ON users.id = b.id' %
        with_message_count.to_sql)
  end
end
Now let’s run the same test:
User.where(id: 5623)
    .with_post_and_message_count.each do |user|
  puts 'Expected:'
  puts user.posts.count
  puts user.direct_messages.count
  puts
  puts 'Results:'
  puts user.post_count
  puts user.message_count
end
Expected:
20
223
Results:
20
223
This gives us correct results, but it’s not as intuitive as chaining, and we end up with yet another unchainable class method.
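The reason the subquery approach stays correct is that each aggregate is grouped on its own rows before anything is combined, conceptually like joining two small pre-computed lookup tables by user id. A plain-Ruby sketch of that idea (using the id and counts from the example above):

```ruby
# Each subquery yields its own correctly grouped counts, as if from
# SELECT user_id, COUNT(*) ... GROUP BY user_id on a single table:
post_counts    = { 5623 => 20 }
message_counts = { 5623 => 223 }

# The outer query then merely joins these per-user results, so the
# two counts never multiply each other.
users = [{ id: 5623 }]
result = users.map do |u|
  u.merge(post_count: post_counts.fetch(u[:id], 0),
          message_count: message_counts.fetch(u[:id], 0))
end
result.first # => {:id=>5623, :post_count=>20, :message_count=>223}
```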
If we really want safe chaining and don't mind maintaining SQL, we can do away with the joins entirely and use correlated subqueries in the class methods themselves:
# app/models/user.rb
class User < ApplicationRecord
  def self.with_subquery_post_count
    select <<~SQL
      users.*, (
        SELECT COUNT(*) FROM posts
        WHERE posts.user_id = users.id
      ) AS post_count
    SQL
  end

  def self.with_subquery_message_count
    select <<~SQL
      users.*, (
        SELECT COUNT(*) FROM direct_messages
        WHERE direct_messages.user_id = users.id
      ) AS message_count
    SQL
  end
end
Writing the methods this way allows us to chain them directly and still produce a single database call:
logger.debug('with subqueries')
User.all.limit(50)
    .with_subquery_post_count
    .with_subquery_message_count.each do |user|
  user.post_count
  user.message_count
end
with subqueries
User Load (6.0ms) SELECT users.*,
(
SELECT COUNT(*) FROM posts
WHERE posts.user_id = users.id
) AS post_count
, users.*,
(
SELECT COUNT(*) FROM direct_messages
WHERE direct_messages.user_id = users.id
) AS message_count
FROM "users" ORDER BY "users"."created_at" DESC LIMIT $1 [["LIMIT", 50]]
Conclusion and Benchmarks
Each of the solutions above has its pros and cons. I encourage readers to explore the tradeoffs and choose an approach that works for their use case and constraints.
For those who prioritise performance, here’s a comparison using Ruby’s Benchmark library and the benchmark-memory gem:
def run_benchmarks(bm)
  bm.report('N + 1') do
    User.all.each do |user|
      user.posts.count
      user.direct_messages.count
    end
  end

  bm.report('includes and size') do
    User.all.includes(:posts, :direct_messages).each do |user|
      user.posts.size
      user.direct_messages.size
    end
  end

  bm.report('with joins') do
    User.all.with_post_and_message_count.each do |user|
      user.post_count
      user.message_count
    end
  end

  bm.report('with subqueries') do
    User.all
        .with_subquery_post_count
        .with_subquery_message_count.each do |user|
      user.post_count
      user.message_count
    end
  end
end

puts 'Runtime'
Benchmark.bm(20) { |bm| run_benchmarks(bm) }

puts 'Memory use'
Benchmark.memory { |bm| run_benchmarks(bm) }
Runtime
user system total real
N + 1 1.087636 0.119499 1.207135 ( 1.892550)
includes and size 0.241659 0.010494 0.252153 ( 0.288259)
with joins 0.036628 0.001048 0.037676 ( 0.056034)
with subqueries 0.062692 0.002064 0.064756 ( 0.092025)
Memory use
N + 1 94.345M memsize ( 398.231k retained)
1.424M objects ( 3.149k retained)
50.000 strings ( 50.000 retained)
includes and size 33.337M memsize ( 14.207M retained)
413.804k objects ( 108.024k retained)
50.000 strings ( 50.000 retained)
with joins 6.378M memsize ( 4.659M retained)
55.897k objects ( 30.903k retained)
50.000 strings ( 50.000 retained)
with subqueries 8.144M memsize ( 4.642M retained)
68.278k objects ( 30.776k retained)
50.000 strings ( 50.000 retained)
As you can see, eager-loading and using #size is much faster than the N + 1 scenario, and computing the counts in the database is faster still by an order of magnitude, while using far less memory. Optimising queries for specific use cases can yield significant performance gains, which translate directly into lower operational costs.
That’s it for now, happy hacking!