Background Job Processing 101

If you have a web application of modest complexity, you will sometimes want it to perform a potentially long-running or unreliable task initiated by your users. At the same time, you want the application to always respond quickly and reliably to requests from your users’ web browsers. What is the solution?

Asynchronous Processing

A good technique for ensuring quick and reliable responses from your web application is to identify any potentially long-running or unreliable actions and queue them up for processing by a background worker such as Sidekiq. This is also known as processing asynchronously.
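As a rough illustration of the idea, here is a minimal in-memory sketch using only Ruby's core Queue and Thread classes. It is not how Sidekiq works internally (Sidekiq stores jobs in Redis and runs them in a separate process), and the names JOBS, RESULTS and handle_request are hypothetical, chosen just for this example.

```ruby
# Minimal in-memory sketch of asynchronous processing (not Sidekiq itself).
# JOBS, RESULTS and handle_request are hypothetical names for illustration.

JOBS = Queue.new     # thread-safe FIFO queue from Ruby's core library
RESULTS = Queue.new  # collects output so we can see what the worker did

# The background worker: blocks on the queue and runs each job as it arrives.
worker = Thread.new do
  while (job = JOBS.pop) != :stop
    RESULTS << job.call
  end
end

# The "request handler": enqueue the slow work and return immediately.
def handle_request(name)
  JOBS << lambda do
    sleep 0.05               # stand-in for a slow or unreliable task
    "processed #{name}"
  end
  "accepted #{name}"         # the fast response sent back to the browser
end

response = handle_request("report")
JOBS << :stop                # tell the worker to finish once the queue drains
worker.join
processed = RESULTS.pop

puts response
puts processed
```

The handler returns its response without waiting for the slow task; the worker thread does the actual work afterwards.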

It is particularly well suited to potentially unreliable processes, such as communicating with a third party. If a job on a background queue fails to complete, it can remain stored in the queue until it is rerun successfully.
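The retry behaviour can be sketched in a few lines of plain Ruby. This is a simplified model under stated assumptions: FlakyService and run_with_retries are hypothetical names, the queue is an ordinary array, and real job systems typically add persistence, a delay between attempts, and handling for jobs that never succeed.

```ruby
# Hypothetical third-party service that fails a fixed number of times
# before succeeding, to simulate a temporary outage.
class FlakyService
  attr_reader :calls

  def initialize(failures_before_success)
    @remaining_failures = failures_before_success
    @calls = 0
  end

  def deliver(payload)
    @calls += 1
    if @remaining_failures > 0
      @remaining_failures -= 1
      raise IOError, "third party unavailable"
    end
    "delivered #{payload}"
  end
end

# Works through the queue; a failed job goes back on the queue until it
# succeeds or runs out of attempts (max_attempts is an assumed policy).
def run_with_retries(queue, service, max_attempts: 5)
  completed = []
  until queue.empty?
    job = queue.shift
    begin
      completed << service.deliver(job[:payload])
    rescue IOError
      job[:attempts] += 1
      queue.push(job) if job[:attempts] < max_attempts
    end
  end
  completed
end

service = FlakyService.new(2)  # fail twice, then succeed
queue = [{ payload: "invoice-1", attempts: 0 }]
completed = run_with_retries(queue, service)
```

Here the job fails twice, stays in the queue, and completes on the third attempt without any involvement from the user.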

A real-life example

Let’s look at a real-life example: pushing a collection of images to Facebook as a photo album. This is an ideal candidate for background processing because it can take a long time and can be unreliable (imagine a network issue, or a Facebook outage).

In this example, a user selects 20 images of 10MB each that currently reside in a private S3 bucket. They click “publish to Facebook”. Now our application needs to send 200MB of data to Facebook. Even with the high bandwidth that S3 and Facebook offer, transferring 200MB can take a while, and when every millisecond counts we should look for a better way.
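To get a feel for the numbers, here is a back-of-the-envelope calculation. The bandwidth figure is purely an assumption for illustration, not a measurement of S3 or Facebook, and the estimate ignores latency, protocol overhead and the time taken to read the images from S3 first.

```ruby
image_count = 20
image_size_mb = 10
assumed_bandwidth_mbit_per_s = 100.0  # assumption, for illustration only

total_mb = image_count * image_size_mb  # 200 MB in total
total_megabits = total_mb * 8           # 8 bits per byte
seconds = total_megabits / assumed_bandwidth_mbit_per_s

puts "Roughly #{seconds} seconds to transfer #{total_mb}MB"
```

Even under this optimistic assumption, the transfer takes many seconds, which is far longer than a user should wait for a web request to return.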

So, instead of processing the photo transfer in the request (synchronously), we create a set of background jobs, one for each image, and place them on a queue. The application can then finish fulfilling the browser request and respond to the user, telling them “the photos will be published to Facebook soon”. Now the background worker swings into action. It constantly monitors the queue for new jobs, and when one arrives it picks up the job and starts pushing that photo out to the Facebook API. If something goes wrong (e.g. Facebook is down, or there are network issues), the job can be retried later until it succeeds. Because we created a separate job for each image, the failure of any particular upload (perhaps a single image is too large and is rejected by the Facebook API) won’t stop the other images from being published. In addition, we won’t waste time retrying the upload of images that have already been transmitted successfully.
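The one-job-per-image approach described above can be sketched as follows. Everything here is hypothetical: FakePhotoApi stands in for the Facebook API, MAX_BYTES is an invented size limit, and the two error classes are just one way to separate failures worth retrying from those that are not. The point is only to show that a failing image affects neither the others nor the number of times successful images are uploaded.

```ruby
MAX_BYTES = 15 * 1024 * 1024  # invented size limit, for illustration only

class RejectedError < StandardError; end   # permanent failure: do not retry
class TransientError < StandardError; end  # temporary failure: retry later

# Hypothetical stand-in for the Facebook API. It rejects oversized images
# and fails the first attempt for "photo-3" to simulate a network error.
class FakePhotoApi
  attr_reader :attempts

  def initialize
    @attempts = Hash.new(0)
  end

  def upload(image)
    name = image[:name]
    @attempts[name] += 1
    raise RejectedError, "too large" if image[:bytes] > MAX_BYTES
    raise TransientError, "timeout" if name == "photo-3" && @attempts[name] == 1
    true
  end
end

def publish_album(images, api, max_attempts: 3)
  # One job per image, so each upload succeeds or fails on its own.
  queue = images.map { |image| { image: image, attempts: 0 } }
  published = []
  failed = []
  until queue.empty?
    job = queue.shift
    name = job[:image][:name]
    begin
      api.upload(job[:image])
      published << name
    rescue RejectedError
      failed << name             # retrying would not help
    rescue TransientError
      job[:attempts] += 1
      if job[:attempts] < max_attempts
        queue.push(job)          # only this image is retried
      else
        failed << name
      end
    end
  end
  [published, failed]
end

ten_mb = 10 * 1024 * 1024
images = (1..4).map { |i| { name: "photo-#{i}", bytes: ten_mb } }
images << { name: "photo-huge", bytes: 40 * 1024 * 1024 }

api = FakePhotoApi.new
published, failed = publish_album(images, api)
```

In this run the oversized image is reported as failed, "photo-3" is uploaded on its second attempt, and every other image is uploaded exactly once.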

Application Simplification

One of the benefits of this approach is that it also simplifies our code, because we can take advantage of the features of the more advanced background job systems. Handling Facebook image upload failures in the request would be quite complex, as we would need to inform the user about which images failed to upload and which succeeded. We would have to ask the user to retry the upload, or perhaps create our own hand-rolled job solution for tracking past failures and kicking off retries. And this is just one example. If we introduced another long-running task, would we be able to easily abstract our solution for use in the new situation? It is much better to use someone else’s well-designed and battle-hardened code.

We use this architecture constantly

So, having a background worker in the system can improve perceived performance, and it can improve reliability when communicating with outside services. At reinteractive, we use this architecture in almost every application that we work on, and it is considered best practice in the industry. In other words, we really believe in it, and so do many others. This is a very powerful tool for improving your application’s performance and reliability, and most systems can benefit from it. Once you have introduced this concept into your system, you’ll find yourself identifying all sorts of places where it is useful.