Skip to content
PLAY VIDEO PLAY VIDEO PLAY VIDEO
By Mikel Lindsaar

Rescuing an exploding Rails App with Skylight.io

Recently, a new client approached us with a performance problem on their existing Ruby on Rails application; they were experiencing massive growth with over 50,000 new users per day signing up, and their app was receiving over 400 requests per second (and growing).

The rapidly increasing load was leading to big problems, with their existing Rails application experiencing frequent outages and causing sleepless nights for their team. They asked reinteractive to investigate and find out how we could get the app stable as fast as possible.

As time was of the essence, our first move was to install our performance tool of choice skylight.io (referral link, with a $50 credit).

Within 15 minutes we already had enough data flowing into Skylight to use its helpful (and aptly named) Agony Index to get an idea of what was going on.

Skylight application trace showing DevicesController#index as highest agony score at 463ms

It was clear that a lot of performance could be gained by digging into the DevicesController#index action. It had a 463ms "problem" response and was typically around 144ms. The key issue was this endpoint was being hit a LOT. It was the most hit action in the system getting close to 400 requests per second, so any change we made to it would have a massive impact.

When we dug deeper, we learned that the main web servers were being hammered, causing load time to increase and further degrading performance. So getting this action performant would give a lot of wins across the whole system.

Opening up this action we found the following trace:

Skylight method trace showing DevicesController#index action

Some things jumped out immediately. There was a large spread across the response time distribution, all they way up to 400ms. This indicated that the server was taking variable amounts of time to service the request, which usually points towards a loaded down server with not enough available capacity.

Looking at the trace, we could see that there was a SQL select query to a settings table, as well as three separate select queries to the devices table. Each of these queries were taking about 12ms to complete, so they were good targets for optimising.

Looking at the code, we found that there was a Setting model and that a Setting record was being called on each request to configure the app. After confirmation with the client it was found that this setting data barely ever changed (about once every few months) and so implementing this in a cache would be totally fine. Now, looking at this from a Skylight trace it is completely obvious to not make a DB request 400 times per second to see if any application wide configuration had happened to have changed, but this loading of settings data was buried within the application_controller and pretty much everyone had forgotten it was being called. Seeing it bright as day in the trace though got everyone's attention, and so this could be cached out instantly.

This was soon done and not only did it immediately drop the response time to the DevicesController#index action by 12ms, but also it removed the SQL query for every other request landing on the app, saving 12ms by about 1.5 billion requests a month. This one small change saved over 400 HOURS of computing time a month! Not a bad saving from what worked out to be about 3 lines of code.

Next up were the three select queries against the Devices table.

The first call was loading up the device in question by ID. As Rails already by default creates a unique index on the devices table primary key, not much could be done about this.

But the other two calls were checking for uniqueness on a validation; that is, the Device model had unique validation requirements on two attributes that looked something like this:

validates_uniqueness_of :first_attribute
validates_uniqueness_of :second_attribute

What this meant was that before the Rails app could save the devices model, it had to check to make sure that no other record in the table had either of these attributes, and the only way to do that was to fire two SQL queries before any create or update action that looked like:

SELECT 1 FROM devices WHERE devices.id = "?"
                      AND devices.first_attribute != "?"
SELECT 1 FROM devices WHERE devices.id = "?"
                      AND devices.second_attribute != "?"

These two queries had to traverse the devices table and were taking another 12ms each, or about 24ms for the pair.

As there were already unique indices on the devices table for both the first_attribute and the second_attribute, we were able to make a change in the Rails code to only reverify the uniqueness of the attributes if the record had changed. So we made the following change to the code:

validates_uniqueness_of :first_attribute,
                        if: :first_attribute_changed?
validates_uniqueness_of :second_attribute,
                        if: :second_attribute_changed?

This removed the need to check on data updates where the attributes had not changed, since if the attributes were still the same value that was retrieved from the database, they were already guaranteed to be unique.

After making the above two changes, the new trace looked like this:

Skylight method trace showing DevicesController#index action

Which showed a number of things that all pointed to success.

Firstly, the clustering of the response times were much tighter, indicating that the server had more breathing time to handle the requests (this was also in part to us helping the client boot up some more application servers) and secondly, the three select queries were no longer in the trace, saving 30-40ms per request, and also reducing the load on the database by about 900 queries per second.

The client app is now stable and we are helping them move to our OpsCare Service over the coming months. If you have a Ruby on Rails application that needs some performance tuning, please get in touch.

Latest Articles by Our Team

Our expert team of designers and developers love what the do and enjoy sharing their knowledge with the world.

We Hire Only the Best

reinteractive is Australia’s largest dedicated Ruby on Rails development company. We don’t cut corners and we know what we are doing.

We are an organisation made up of amazing individuals and we take pride in our team. We are 100% remote work enabling us to choose the best talent no matter which part of the country they live in. reinteractive is dedicated to making it a great place for any developer to work.

Free Community Workshops

We created the Ruby on Rails InstallFest and Ruby on Rails Development Hub to help introduce new people to software development and to help existing developers hone their skills. These workshops provide invaluable mentorship to train developers, addressing key skills shortages in the industry. Software development is a great career choice for all ages and these events help you get started and skilled up.

  • Webinars

    Webinars

    Webinars are our online portal for tips, tricks and lessons learned in everything we do. Make the most of this free resource to help you become a better developer.

    Learn more about webinars

  • Installfest

    Installfest

    The Ruby on Rails Installfest includes a full setup of your development environment and step-by-step instructions on how to build your first app hosted on Heroku. Over 1,800 attendees to date and counting.

    Learn more about Installfest

  • Development Hub

    Development Hub

    The Ruby on Rails Development Hub is a monthly event where you will get the chance to spend time with our team and others in the community to improve and hone your Ruby on Rails skills.

    Learn more about Development Hub

Get the “reinteractive Review” Monthly Email