Blog icon

Frankenstein's ActiveRecord: How to stitch together complex ActiveRecord queries from simple parts

By Daniel Heath,
Daniel Heath
Scroll down to read

Abstraction: use scope to encapsulate database changes

Active record makes it easy to write simple queries anywhere in your application. The key phrase here is anywhere in your application: When database queries are scattered throughout your application, simple database changes may require modifications to your controllers, your views, helpers, mailers, etc.

ActiveRecord provides you with scope which allows you to hide the details of your schema so that database changes don't affect the entire application.

Here is an example of using a scope to look up a user by their email address:

class User < ActiveRecord::Base
  has_many :emails

  scope :by_email, -> (email) {
    joins(:emails).where(emails: {address: email})
  }
end

User.by_email('john@example.org')

# SELECT "users".* FROM "users"
# INNER JOIN "emails"
# ON "emails"."user_id" = "users"."id"
# WHERE "emails"."address" = 'john@example.org'

When you use a scope, if you change the schema, you can update the User model while everything else remains the same.

Find matching records using LIKE and ILIKE

Let's say we have been getting spam comments coming from the same email domain and we want to get rid of all comments from all users who have that particular domain. Here our scope needs to join to the emails table:

scope :by_email_domain, -> (domain) {
    joins(:emails).where("emails.address like ?", "%@#{domain}")
}

# Find all users who have ever had an email address at example.org
User.by_email_domain('example.org')

# SELECT "users".* FROM "users"
# INNER JOIN "emails"
# ON "emails"."user_id" = "users"."id"
# WHERE (emails.address like '%@example.org')

like will retrieve all records where the email address includes the value returned by #{domain}. You can also use ilike for a case-insensitive version of like.

Eliminate duplicates by replacing a join with a sub-select

There is one gotcha with the above code: if there are multiple users at the same domain, the scope will return multiple records. To eliminate duplicates, you can replace the join with a sub-select. This has no impact on performance (at least if you are using Postgres).

Performance tip #1: The sub-select is built by calling #to_sql on ActiveRecord::Relation to generate the sub-select. This avoids loading the records.

Performance tip #2: It also avoids name collision issues that arise when joining to the same table more than once.

scope :by_email_domain, -> (domain) {
  user_ids = Email.by_domain(domain).select(:user_id)
  where "users.id in (#{user_ids.to_sql})"
 }

User.by_email_domain('example.org')

# SELECT "users".* FROM "users"
# WHERE (users.id in (
#    SELECT "emails"."user_id" FROM "emails" WHERE (address like '%@example.org')
# ))

In the above example, we first execute a query that generates all the user ids (Email.by_domain(domain).select(:user_id)). Until we actually try to use that result (by iterating through it, for example), it is just an ActiveRecord relation. It hasn't been fetched from the database yet.

This is important because what we do on the next line is call to_sql on it. This will return the SQL query that would have been run to fetch those records, but the user ids are not actually fetched.

Doing it this way maintains the performance advantages of having a single query but, because there is no join, you don't have any duplicate records.

Filter multiple values by passing in an array

To filter by multiple domains at once, you can pass in an array:

class Email
  scope :by_domain, -> (domain) {
    where <<-SQL, Array(domain).map {|e| "%#{e}"}
      emails.address LIKE ANY (select unnest(ARRAY[?]))
    SQL
  }
end

Email.by_domain(['example.com', 'example.org'])

# SELECT  "emails".* FROM "emails"
# WHERE (
#   emails.address LIKE ANY (
#     select unnest(ARRAY['%example.com','%example.org'])
#   )
# )

Note: Array(domain) converts the domain parameter to an array if one item if it isn't an array already.

When you are comparing a column value with values in an array, you use LIKE ANY.


Ask Daniel to write about it if he wants it included here documents in ruby <<-SQL is a multi-line string


Combine conditions from multiple queries using Merge

ActiveRecord has a merge method on active relation that allows you to combine conditions from multiple queries.

Let's say you have a product model and you want to search either by the single search term "mazda", or on multiple search terms such as "mazda, door, and handle".

You would do this in both cases by using Product.matching_term. When there are multiple terms, you can call Product.matching_term on each of the terms one by one and then merging the results.

This can be useful in a number of scenarios, such as when you are creating an admin interface and you need to build up complex filtering.

# All products matching 'mazda'
Product.matching_term("mazda")

# Products matching 'mazda', 'door' and 'handle'
terms = %w( mazda door handle )
products = terms.map {|t| Product.matching_term(t) }.reduce(&:merge)

# select * from products
# where name ilike '%mazda%'
#   and name ilike '%door%'
#   and name ilike '%handle%'

Generate an OR query in ActiveRecord using Union

Union is great because it allows you to generate an OR query in ActiveRecord, which is something that is surprisingly difficult to do.

UNION will find everything in each of the component queries and will remove duplicates. If you do not want to remove duplicates, use UNION ALL instead.

mazdas = Product.matching_search('Mazda')

mitsubishis = Product.matching_search("Mitsubishi")

mazdas_and_mitsubishis = Product.from(mazdas.to_sql + ' UNION ' + mitsubishis.to_sql)

# select * from products
# where name ilike '%Mazda%'
# UNION
# select * from products
# where name ilike '%Mitsubishi%'

Build up complex queries using common table expressions

Common table expressions allow you to progressively build up complex queries.

For example, here is a scope that adds a calculated field to Campaign. It ranks campaigns into 1..5 according to how much money they have raised.

class CampaignInstanceUser < ApplicationRecord
  scope :with_quintile, -> {
    from <<-SQL
      ( with "quintile_table" as (
          select
            *,
            ntile(5) over (order by total_raised_cents desc) as quintile
          from "campaign_instance_users"
        ) select * from "quintile_table"
      ) as "campaign_instance_users"
    SQL
  }
end

CampaignInstanceUser.with_quintile.where(quintile: 5)

This code uses ntile, which is a Postgres function that tells you where your record appears when the record set is sorted a certain way. ntile(5) will give every record a number from 1 to 5 according to whether it is in the top 20%, the next 20%, and so on.

From the code example above, where quintile = 1 will give you the top 20% of campaigns according to how much money they have raised.

To find out how reinteractive can turn your web application vision into reality, get in touch with us through our contact form or call us on +61 2 8019 7252.