In a previous post, we brought Tesseract to Ruby land. Doing so enabled us to combine the capability to perform OCR with the robust Ruby on Rails web framework - from within our web server. However, there are also good use cases for running it on the client, such as on mobile phones. We needed to do this in one of our projects, and this post is a modest attempt to share our learnings.
At reinteractive, we love working with React Native, a framework for developing native mobile apps with JavaScript. Its motto, “Learn once, write anywhere”, saves us a lot of time when developing for multiple platforms, given our proficiency with web technologies. Not to mention the balance the framework strikes between speed of development and native performance. We will use this framework to bring Tesseract to mobile.
In summary, this post will start off in the JavaScript runtime (the starting point of every React Native app), cross the bridge to reach the native iOS runtime, run Tesseract there, and bring the data back to JavaScript.
- The JavaScript runtime
- React Native Bridge
- The Native runtime
- Implementing scanForText method
- Scanning text with Tesseract
- Back to the JavaScript runtime
By reading through this post you will learn how to open communication between React Native’s JavaScript runtime and the native iOS runtime, and do a basic Tesseract setup on mobile.
This post assumes that there is already an existing React Native app. If you don’t have one, you can clone this repo - which contains the code in the following sections.
The JavaScript runtime
I initially thought “there’s no way we can run Tesseract just from within JavaScript”. It turns out we can, albeit indirectly. There is a limited number of libraries for React Native and Tesseract integration. They’re good if they suit your use case, but if they don’t, you need to do it yourself, as we did. We will start off with this JavaScript code.
```js
// App.js
import {NativeModules, Image} from 'react-native';
import SampleImage from './sample_image.jpg';

const OCR = NativeModules.OCR;

const source = Image.resolveAssetSource(SampleImage);
const coordinates = { x: 0, y: 0, width: 300, height: 300 };

const scanResult = await OCR.scanForText(source.uri, coordinates);
console.log(scanResult); // Some text from the image.
```
The brief JavaScript code above makes use of React Native’s `NativeModules`. By referencing it, we get access to `OCR` and `OCR.scanForText`. At the very least, these are the constructs we need on the JavaScript side to integrate with Tesseract. We’ll learn more about them later. The other lines just load the image that we intend to scan. Onto the bridge.
React Native Bridge
React Native uses a ‘Bridge’ to allow JavaScript code to call into native code. For iOS this means we can reference and call Swift and Objective-C classes and methods, and for Android, Java and Kotlin code. This bridge exposes an array of capabilities and possibilities to our limited JavaScript runtime. Anything that is reachable by our native code is now reachable by JavaScript; we will use this to pursue our endeavour.
For an exhaustive guide to the bridge, which discusses its capabilities as well as its limitations, here’s one from the React Native team. For our purposes, we will just touch on the minimum.
The Native runtime
Being a “bridge”, it has to hook into something on both sides. We’ve seen the JavaScript code that consumes `NativeModules.OCR`. `NativeModules` is an object provided by React Native itself. But what about `OCR`? It is a class that we will implement on the native side ourselves.
For the purposes of this post, we will be working with iOS and Objective-C. Some of the code from the repo (mentioned above) has been omitted for brevity.
```objective-c
// OCR.h file
#import <React/RCTBridgeModule.h>

// Class declaration that implements the RCTBridgeModule protocol
@interface OCR : NSObject <RCTBridgeModule>
@end
```
In Objective-C, classes are declared in two files. The first is the header (.h) file, which contains the public interface declaration of a class. The code above declares the `OCR` class, which we’ve already seen in our JavaScript code. In order to be accessible via the bridge, it implements the `RCTBridgeModule` protocol (‘RCT’ stands for ReaCT).
```objective-c
// OCR.m file
#import "OCR.h"

@implementation OCR

RCT_EXPORT_MODULE();

// Method available in the JavaScript runtime.
RCT_EXPORT_METHOD(scanForText:(NSString *)url
                  coordinates:(NSDictionary *)coordinates
                  resolver:(RCTPromiseResolveBlock)resolve
                  rejecter:(RCTPromiseRejectBlock)reject)
{
  // We will run Tesseract from here.
}

@end
```
The second file is the implementation (.m) file, which contains, as the name suggests, the implementation details of the class.
The implementation (.m) file above calls the `RCT_EXPORT_MODULE` and `RCT_EXPORT_METHOD` macros, which are the other requirements of React Native. And yes, as you may have guessed, `RCT_EXPORT_METHOD` is the macro call that exposes the `scanForText` method to JavaScript (formally the `scanForText:coordinates:resolver:rejecter:` method in Objective-C). If this were a normal Objective-C method (without the `RCT_EXPORT_METHOD` macro), it would look like this:
```objective-c
- (void)scanForText:(NSString *)url
        coordinates:(NSDictionary *)coordinates
           resolver:(RCTPromiseResolveBlock)resolve
           rejecter:(RCTPromiseRejectBlock)reject
{
  // We will run Tesseract from here.
}
```
So in summary, to call native code from within our JavaScript runtime, we need the following:

- the `RCTBridgeModule` protocol declaration in the header (.h) file
- an `RCT_EXPORT_MODULE();` macro call in the implementation (.m) file
- an `RCT_EXPORT_METHOD();` macro call in the implementation (.m) file

All that’s left now is to implement the `scanForText` method body.
Implementing scanForText method
In the example above, we had this line in our JavaScript code.

```js
const scanResult = await OCR.scanForText(source.uri, coordinates);
```
This method accepts two parameters: the image URL (which can be a local file location) and the coordinates within the image that specify where to look for text.
By nature, the React Native Bridge is asynchronous. Thus we can’t get hold of the return value unless we make use of a callback or return a Promise object. Hence the `await` keyword; JavaScript’s async/await is a more intuitive way of working with JavaScript promises.
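For completeness, here is a rough sketch of what the callback-based alternative could look like. This is not code from our project - just an illustration of the other option the bridge supports, using React Native’s `RCTResponseSenderBlock` type; by convention the callback receives an array with an error (or `NSNull`) followed by the result.

```objective-c
// Hypothetical callback-based variant (illustrative only; not the approach used in this post).
RCT_EXPORT_METHOD(scanForText:(NSString *)url
                  coordinates:(NSDictionary *)coordinates
                  callback:(RCTResponseSenderBlock)callback)
{
  NSString *result = @"...";          // the scan result would go here
  callback(@[[NSNull null], result]); // no error, then the value
}
```

We went with the Promise-based version, which is what the following implementation uses.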
```objective-c
RCT_EXPORT_METHOD(scanForText:(NSString *)url
                  coordinates:(NSDictionary *)coordinates
                  resolver:(RCTPromiseResolveBlock)resolve
                  rejecter:(RCTPromiseRejectBlock)reject)
{
  // Run the block in a different thread.
  dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    // Load the image.
    NSData *imageData = [NSData dataWithContentsOfURL:[NSURL URLWithString:url]];
    UIImage *image = [UIImage imageWithData:imageData];

    // Scan the image for text.
    NSString *result = [self scanImageForText:image onCoordinates:coordinates];

    dispatch_async(dispatch_get_main_queue(), ^{
      // Resolve with the scan result (this settles the JavaScript Promise).
      resolve(result);
    });
  });
}
```
As you can see in this Objective-C code, the `scanForText` method has four parameters (in contrast to the two that we passed in from the JavaScript code). The last two parameters (`resolver` and `rejecter`) are meant to resolve or reject the Promise object that we’re expecting in the JavaScript runtime.
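The code above always resolves and never uses the `rejecter`. As a hedged sketch (not code from the repo), this is roughly how the image-loading step could reject the Promise when the URL can’t be read; the error code and message below are arbitrary placeholders.

```objective-c
// Hypothetical error handling (illustrative only): reject the Promise
// when the image data cannot be loaded from the given URL.
NSData *imageData = [NSData dataWithContentsOfURL:[NSURL URLWithString:url]];
if (imageData == nil) {
  NSError *error = [NSError errorWithDomain:@"OCR" code:1 userInfo:nil];
  reject(@"image_load_failed", @"Could not load the image from the given URL", error);
  return;
}
```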
The `dispatch_async` calls are threading code. We use them so we don’t block the UI from responding while we do the Tesseract scanning. The lines within the block prepare the image before we pass it to the actual Tesseract call.
```objective-c
NSString *result = [self scanImageForText:image onCoordinates:coordinates];
```
Scanning text with Tesseract
We wrap our Tesseract call inside a `scanImageForText:onCoordinates:` method.
```objective-c
- (NSString *)scanImageForText:(UIImage *)image onCoordinates:(NSDictionary *)coordinates
{
  // Initialize Tesseract.
  G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];

  // Give the image to Tesseract.
  [tesseract setImage:image];

  // Give the scan area coordinates - must be within the image.
  CGFloat x = [[coordinates objectForKey:@"x"] floatValue];
  CGFloat y = [[coordinates objectForKey:@"y"] floatValue];
  CGFloat width = [[coordinates objectForKey:@"width"] floatValue];
  CGFloat height = [[coordinates objectForKey:@"height"] floatValue];
  CGPoint origin = CGPointMake(x, y);
  CGSize size = CGSizeMake(width, height);
  tesseract.rect = CGRectMake(origin.x, origin.y, size.width, size.height);

  // Scan for text.
  [tesseract recognize];

  // Return the result.
  return tesseract.recognizedText;
}
```
We’re lucky that someone has bundled Tesseract for use on iOS. In our case, we used Tesseract OCR iOS. The easiest way to include this module in a React Native project is via CocoaPods, a package management tool for iOS and macOS development that React Native itself uses for its iOS builds.
To install it as a dependency, you can follow the Get Started section here. Remember that the iOS code in a React Native project lives inside `app_folder/ios`. For this setup, you will also need to bundle the English “tessdata” files with your build (in Xcode, see Build Phases > Copy Bundle Resources, as described in the repository above).
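Once the pod is installed, the native class also needs the library’s headers. Assuming a standard TesseractOCRiOS setup, the import at the top of OCR.m would look roughly like this (the exact header path may differ depending on how the pod is integrated):

```objective-c
// Assumed import for the TesseractOCRiOS pod; adjust if your setup differs.
#import <TesseractOCR/TesseractOCR.h>
```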
The first line in our method body is Tesseract’s initialization.
```objective-c
G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];
```
G8Tesseract provides various ways to initialise Tesseract. We can pass different configuration options depending on the image and type of text that we are trying to extract. Some configurations are more accurate with whole paragraphs of text, some are more accurate for single-line phrases. In this case, we’re just using the default behaviour and telling Tesseract that the words we are trying to extract are in English.
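As an illustration (a sketch, not code from our project), here are a couple of knobs the TesseractOCRiOS wrapper exposes that we found worth knowing about; the property names below are from our reading of the library and may differ between versions.

```objective-c
// Illustrative configuration - verify against your TesseractOCRiOS version.
G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];

// Restrict recognition to a known character set, e.g. digits only.
tesseract.charWhitelist = @"0123456789";

// Cap how long a single recognition pass may take (in seconds).
tesseract.maximumRecognitionTime = 2.0;
```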
Back in our `scanImageForText:onCoordinates:` method, the succeeding lines set the image and the coordinates where the text scan should be performed.
```objective-c
[tesseract setImage:image];

// Give the scan area coordinates - must be within the image.
CGFloat x = [[coordinates objectForKey:@"x"] floatValue];
CGFloat y = [[coordinates objectForKey:@"y"] floatValue];
CGFloat width = [[coordinates objectForKey:@"width"] floatValue];
CGFloat height = [[coordinates objectForKey:@"height"] floatValue];
CGPoint origin = CGPointMake(x, y);
CGSize size = CGSizeMake(width, height);
tesseract.rect = CGRectMake(origin.x, origin.y, size.width, size.height);
```
As you can see in the code, the coordinates are combined to form a rectangle (`CGRect`). This is a common pattern in iOS/macOS programming when dealing with 2D coordinates.
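As a side note, the intermediate `CGPoint`/`CGSize` values aren’t strictly needed; the same rectangle could be built in a single call, an equivalent and slightly terser variant:

```objective-c
// Equivalent one-liner: build the scan rectangle directly from the raw values.
tesseract.rect = CGRectMake(x, y, width, height);
```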
And the last two lines of the method are just the calls to perform the scan (`[tesseract recognize]`) and return the value.
Drop these header (OCR.h) and implementation (OCR.m) files into your React Native project’s iOS folder, and you’re good to go with your OCR module.
Back to the JavaScript runtime
```js
// App.js
import {NativeModules, Image} from 'react-native';
import SampleImage from './sample_image.jpg';

const OCR = NativeModules.OCR;

const source = Image.resolveAssetSource(SampleImage);
const coordinates = { x: 0, y: 0, width: 300, height: 300 };

const scanResult = await OCR.scanForText(source.uri, coordinates);
console.log(scanResult); // Some text from the image.
```
Summary
We’ve completed a round trip so we can use Tesseract in our mobile app. By calling `OCR.scanForText(source.uri, coordinates)`, we’ve reached into the native runtime, run Tesseract, and returned the value (the text found in the image within the given coordinates) back to our JavaScript runtime.
The learnings that we had in this previous post still apply when running Tesseract on mobile. The result of your scans depends primarily on the quality of the image; clearer (e.g. free of shadows, text properly aligned with no rotation, text only, etc.) and higher-resolution images will surely yield better results. This also explains why we’re scanning within given coordinates instead of the whole image.
Before starting this integration, one question I had was about the performance of Tesseract on mobile. Of course, this will vary greatly depending on the hardware capabilities of the device and the image being scanned. Overall, with the following setup:
- High resolution images (coming from iPhone cameras)
- Scans targeted to a specific area of the image
- And running on iPhone 6 and up
It was magic to see a scan finish in a fraction of a second. We were happy with the performance and concluded that it would result in a good user experience. Performing scans in succession, paired with image preprocessing, didn’t pose a problem either.
That all being said, the code above is not entirely suitable for production use; edge and error cases still need to be handled. Nevertheless, with what we have, we’ve opened up a lot of possibilities, and much more can be done. Hopefully this will be of help in your future mobile Tesseract integration.