By Glen Crawford

Using Ruby and Tesseract to Recognise Text in an Image

At reinteractive, we recently completed a project that called for OCR (Optical Character Recognition) technology to recognise printed text in photographs. It's a fun problem to solve, so here is a brief post on how you can add OCR capabilities to your own Rails app.

Tesseract

Tesseract is one of the most popular OCR libraries. It's free and open source, runs on multiple platforms, supports a lot of languages, and its ongoing development is sponsored by Google. It is primarily a command-line tool (although there are third-party projects that supply a GUI), and, luckily for us, there are a couple of Ruby gems that let us interact with it from a Ruby/Rails app. For this post, we will use the tesseract-ocr gem (https://github.com/meh/ruby-tesseract-ocr).

Set up

First, you will need to install Tesseract. Tesseract is up to version 4.0.0; however, this gem is only compatible with versions up to 3.02.02, so you will need to install 3.02.02 or earlier. You can do this with your favourite package manager, such as Homebrew (brew install tesseract), but check which version it installs, because package managers generally default to the latest release.
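Because a package manager may install a newer Tesseract than the gem supports, it can be worth checking the installed version before going further. The snippet below is a minimal sketch, not part of the gem: compatible_tesseract? is a hypothetical helper, and it assumes the output of tesseract --version contains a line such as "tesseract 3.02.02".

```ruby
require 'rubygems'

# Newest Tesseract release the tesseract-ocr gem works with, per this post.
MAX_SUPPORTED_TESSERACT = Gem::Version.new('3.02.02')

# Hypothetical helper: pulls the version number out of `tesseract --version`
# output (assumed to contain something like "tesseract 3.02.02") and reports
# whether it is no newer than the version the gem supports.
def compatible_tesseract?(version_output)
  match = version_output.match(/tesseract\s+v?(\d+(?:\.\d+)+)/i)
  return false unless match

  Gem::Version.new(match[1]) <= MAX_SUPPORTED_TESSERACT
end
```

To run it against your own installation, require 'open3' and pass in the combined output of the command, e.g. compatible_tesseract?(Open3.capture2e('tesseract', '--version').first). Capturing stderr as well as stdout covers Tesseract builds that print their version to stderr.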

Next, add the gem to your app using Bundler: add gem 'tesseract-ocr' to your Gemfile, then run bundle install.

Code

At its most basic, this is all you need to do to OCR an image:

require 'tesseract'

tesseract = Tesseract::Engine.new do |config|
  config.language = :eng # use the English language data
end

# You can also pass an IO object, or even an ImageMagick image.
# Tesseract accepts any image format supported by the Leptonica library.
tesseract.text_for('path/to/image.jpg') # returns all recognised text as one string

The text_for method is the simplest way to use the gem: it returns all the text it can find as a single string. However, you can also work with the results at varying levels of granularity (i.e. blocks, paragraphs, lines, words, and symbols). The gem supplies accessor methods for each level (each_paragraph, each_line, etc.), and they all work the same way. Once you have decided which level of granularity to use (the examples below use lines), there are two ways to get the results:

You can execute a block for each paragraph/line/etc:

tesseract.image = 'path/to/image.jpg'

tesseract.each_line do |line|
  puts line.text # the block is called once per recognised line
end

Or you can get an array of each paragraph/line/etc:

tesseract.image = 'path/to/image.jpg'

tesseract.lines.each do |line|
  puts line.text # lines returns an array, so any array method works here
end

Once you have the results, whether yielded or returned, you can inspect them to see how accurate the OCR was (there are more methods than just these three, but these are the most important ones):

# The OCR'd text.
> line.text
=> "Lorem ipsum dolor sit amet..."

# The coordinates of the element on the image. You can get the position and size with methods such as left, width, etc.
> line.bounding_box
=> #<BoundingBox(20, 62): 1421x558>

# How confident Tesseract is that the text is correct.
> line.confidence
=> 47.571746826171875
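Since each result carries its own confidence score, one way to use it is to discard text Tesseract is unsure about. Here is a minimal sketch of that idea. confident_text and its default threshold of 60 are hypothetical choices for illustration, and the OcrLine struct merely stands in for the gem's line objects, which the examples above show responding to text and confidence. The threshold assumes confidence is on a 0 to 100 scale, as the sample value of 47.57 suggests.

```ruby
# Stand-in for the gem's line results, for illustration only. It responds to
# #text and #confidence, like the line objects shown above.
OcrLine = Struct.new(:text, :confidence)

# Hypothetical helper: keeps the text of results whose confidence is at or
# above the threshold, stripping surrounding whitespace and dropping blanks.
# Works with any objects that respond to #text and #confidence.
def confident_text(results, min_confidence: 60.0)
  results
    .select { |result| result.confidence >= min_confidence }
    .map { |result| result.text.strip }
    .reject(&:empty?)
end

sample = [
  OcrLine.new("Lorem ipsum dolor sit amet\n", 91.2),
  OcrLine.new('|||| ||| |', 12.5)
]
p confident_text(sample) # prints ["Lorem ipsum dolor sit amet"]
```

With the gem, you would call it on real results instead, e.g. confident_text(tesseract.lines, min_confidence: 70). The right threshold depends heavily on your images, so treat it as something to tune rather than a fixed rule.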

Accuracy

The above is all you need to get results from Tesseract. The real challenge, however, is accuracy. The accuracy of the results depends on a number of factors, such as the quality of the image (is it a photograph or a scan?), shadows, rotation, etc. You may need to preprocess your images to improve the accuracy of the output; RMagick and ImageMagick are useful for cropping, rotating, or resizing images before running them through Tesseract. For example, in my use case I needed to OCR labels issued by hospitals with patient information on them, rather than standard documents with lines and paragraphs of text. I found it helpful to crop out only the fragments of the labels that I needed, in order to prevent Tesseract from getting thrown off by barcodes and other odd symbols.
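If the part of the image you care about sits in a predictable position, as on a printed label, the crop can be computed from the image dimensions. The sketch below is hypothetical (crop_geometry is not part of RMagick, ImageMagick, or the gem): it converts a region given as fractions of the image into ImageMagick's WIDTHxHEIGHT+X+Y crop geometry, in pixels.

```ruby
# Hypothetical helper: turns a region of interest, given as fractions (0.0 to 1.0)
# of the image's width and height, into an ImageMagick crop geometry string of
# the form "WIDTHxHEIGHT+X+Y", measured in pixels from the top-left corner.
def crop_geometry(image_width, image_height, left:, top:, width:, height:)
  { left: left, top: top, width: width, height: height }.each do |name, value|
    raise ArgumentError, "#{name} must be between 0 and 1" unless value.between?(0, 1)
  end

  x = (image_width * left).round
  y = (image_height * top).round
  w = (image_width * width).round
  h = (image_height * height).round
  "#{w}x#{h}+#{x}+#{y}"
end

# Example: the left half of a hypothetical 1600x1200 photo, starting halfway
# down and a quarter of the image's height tall.
puts crop_geometry(1600, 1200, left: 0.0, top: 0.5, width: 0.5, height: 0.25)
# prints 800x300+0+600
```

The resulting string can be passed to ImageMagick on the command line, for example convert label.jpg -crop 800x300+0+600 +repage fragment.jpg, and the cropped file then handed to tesseract.text_for. The +repage option discards the page offset ImageMagick would otherwise keep on the cropped image.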

Your experience with Tesseract will therefore depend on the quality of your input images and how well you can clean them up before running them through Tesseract. If your inputs are good, though, the excellent OCR capabilities of Tesseract and the simple API provided by this gem should make recognising text in your images a breeze.
