Optical character recognition (OCR): Algorithms and use cases

Feb 01, 2021 · 8 min read

Does your mobile app ask people to type their ID or payment details to complete some tasks? That must be one of the things your users like the least about your app. Entering important data on the go, on a small screen, double-checking for errors – a sure path to frustration for many.

Is there a way around that? Sure: embrace optical character recognition (OCR), which lets app users scan data with their smartphone cameras and have the required fields filled in automatically.

Users of banking, medicine, transportation, and other apps that require accurate data entry value OCR features highly. Let’s consider what OCR is and how it works so you can decide whether to add it to your app.

What is optical character recognition?

Optical character recognition (OCR) is the conversion of machine-printed or handwritten text from a two-dimensional image into machine-readable text. It allows mobile and web applications to extract text from a wide range of images, be it an ID document, receipt, invoice, ticket, or a photo of a car number plate or wall graffiti.

An early commercial application of OCR was software that converted printed documents into digital text, which could then be loaded into searchable databases. The technology enabled public and private organizations to replace their paper archives with electronic ones.

Today, many businesses add OCR features to their web and mobile apps. The technology is widely used in banking, insurance, hospitality, transportation, logistics, retail, and other sectors, where it helps companies streamline identification, information extraction, and data entry and improve both the employee and the customer experience.

Uses of OCR

What are the common uses of OCR in apps?

If you only need to convert text-containing images into editable documents, you’ll find a number of OCR apps built for that sole purpose. Beyond that, OCR powers identification and payment features in more complex software. Let’s consider several popular examples of OCR usage:

Customer onboarding in mobile banking

Mobile banking apps use OCR to build a customer-centric sign-up flow. Instead of entering data manually, people scan their ID card with their smartphone camera. Within seconds, their personal information is extracted, processed, verified against databases, and entered into their account details.
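Part of the verification can happen before any database lookup. As a minimal sketch, assuming the scanned document carries a machine-readable zone (MRZ) as specified in ICAO Doc 9303 (passports and many ID cards do), each MRZ field is followed by a check digit, so many OCR misreads can be caught locally. The function names below are illustrative, not part of any library.

```python
# Sketch: validating OCR output from an ID document's machine-readable zone
# (MRZ) with the ICAO 9303 check-digit rule. Function names are hypothetical.

def mrz_char_value(ch: str) -> int:
    """Map an MRZ character to its value: 0-9, A=10 ... Z=35, '<'=0."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_valid(field: str, check: str) -> bool:
    """True if the OCR'd field agrees with the OCR'd check digit."""
    return len(check) == 1 and "0" <= check <= "9" and mrz_check_digit(field) == int(check)


# Document number and date of birth from the ICAO 9303 specimen passport.
print(field_is_valid("L898902C3", "6"))  # True
print(field_is_valid("740812", "2"))     # True
print(field_is_valid("L898902C8", "6"))  # False: the final '3' was misread as '8'
```

A failed check is a signal to ask the user to rescan rather than proof of which character is wrong.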

Payment details entry in mobile payments

When it comes to mobile payments, manually entering account numbers and other transaction data is a pain for customers. With built-in OCR features, people can extract all the necessary data from a paper invoice or a plastic card and have it entered automatically into the right fields of a payment form. Such solutions reduce the risk of entering incorrect data, which saves payers time and frustration.
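For card numbers, a cheap sanity check after OCR is the Luhn checksum, which payment card numbers satisfy. The sketch below also assumes a hypothetical table of look-alike characters (O read for 0, l read for 1, and so on) that an OCR engine might confuse; tune it for your engine and card fonts. A passing checksum only shows the number is plausible, not that it is the right one.

```python
# Sketch: checking an OCR'd card number with the Luhn checksum before
# auto-filling a payment form.

# Hypothetical map of common OCR look-alikes to digits (an assumption).
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def normalize_card_number(raw: str) -> str:
    """Replace look-alike characters and drop spaces and dashes."""
    fixed = raw.translate(OCR_DIGIT_FIXES)
    return "".join(ch for ch in fixed if ch not in " -")


def luhn_valid(number: str) -> bool:
    """Return True if `number` is all ASCII digits and passes the Luhn check."""
    if len(number) < 2 or any(ch not in "0123456789" for ch in number):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid(normalize_card_number("4lll-1111-1111-1111")))  # True
print(luhn_valid(normalize_card_number("4111 1111 1111 1112")))  # False
```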

Data entry for VAT refund claims

Optical character recognition programs help businesses collect the information needed to claim VAT refunds on employee business travel expenses. An accountant can use OCR to quickly process a pile of VAT receipts, even if they are written in a foreign language, badly printed, or damaged. An OCR engine that specializes in reading receipts makes VAT reclaim faster and less tedious.
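Once a receipt has passed through the OCR engine, the app still has to find the relevant amounts in the raw text. Here is a minimal sketch, assuming receipt lines that start with a keyword such as TOTAL or VAT and end with an amount. The keyword list and line layout are assumptions; real receipts will need more patterns (thousands separators, amounts printed before the keyword, and so on).

```python
import re
from decimal import Decimal

# Hypothetical keyword map: extend it for the languages on your receipts.
FIELD_KEYWORDS = {"TOTAL": "total", "VAT": "vat", "MWST": "vat", "IVA": "vat"}

# A line that starts with a keyword and ends with an amount like 12.38 or 12,38.
LINE_RE = re.compile(
    r"^[ \t]*(" + "|".join(FIELD_KEYWORDS) + r")\b.*?(\d+[.,]\d{2})[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_receipt_amounts(ocr_text: str) -> dict[str, Decimal]:
    """Pick the total and VAT amounts out of raw OCR text from a receipt."""
    amounts: dict[str, Decimal] = {}
    for keyword, value in LINE_RE.findall(ocr_text):
        field = FIELD_KEYWORDS[keyword.upper()]
        # Keep the first match per field and normalize a decimal comma.
        amounts.setdefault(field, Decimal(value.replace(",", ".")))
    return amounts


receipt = "CAFE EXAMPLE\nCoffee 3.50\nSandwich 6.90\nSUBTOTAL 10.40\nVAT 19% 1.98\nTOTAL 12.38\n"
print(extract_receipt_amounts(receipt))
# {'vat': Decimal('1.98'), 'total': Decimal('12.38')}
```

Anchoring the keyword at the start of the line keeps SUBTOTAL from being read as TOTAL.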

Check-in automation in hospitality

Adding OCR features to property management systems (PMS) allows hoteliers to simplify check-in for their guests. Instead of typing guests’ ID information into the PMS manually, a receptionist can capture data from ID documents with a tablet camera. OCR speeds up check-in, reduces typing errors, and makes it easier for receptionists to check guests against existing records to recognize returning patrons or blacklisted guests.
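Matching a scanned name against guest records has to tolerate OCR errors such as a 0 read in place of an o. One simple approach is fuzzy string matching, sketched here with Python’s standard difflib module. The `find_guest` helper and the 0.75 similarity cutoff are assumptions for illustration, not part of any PMS API.

```python
from difflib import get_close_matches
from typing import Optional


def find_guest(ocr_name: str, known_guests: list[str], cutoff: float = 0.75) -> Optional[str]:
    """Return the closest known guest name, or None if nothing is similar enough."""
    # Compare case-insensitively but return the name as stored in the records.
    by_key = {name.casefold(): name for name in known_guests}
    matches = get_close_matches(ocr_name.casefold(), list(by_key), n=1, cutoff=cutoff)
    return by_key[matches[0]] if matches else None


guests = ["Anna Schmidt", "John Smith", "Maria Rossi"]
print(find_guest("J0hn Smlth", guests))  # John Smith
print(find_guest("Peter Pan", guests))   # None
```

A lower cutoff finds more returning guests but also more false matches, so a receptionist should confirm any hit.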

Freight management in supply chains

Transportation companies implement automatic container code recognition systems that use OCR to help workers scan and recognize container codes. OCR allows logistics managers to extract container codes accurately even under challenging working conditions and to enter the data effortlessly into ERP and WMS systems, which supports real-time cargo tracking.
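Container codes are well suited to OCR validation because ISO 6346 defines a check digit: a three-letter owner code and a one-letter category identifier, a six-digit serial number, and a final digit computed from the first ten characters. The sketch below implements that check so a misread code can be flagged before it reaches the ERP or WMS; the function names are illustrative.

```python
import re
import string


def _iso6346_letter_values() -> dict[str, int]:
    """Letters count from 10 upward, skipping multiples of 11 (11, 22, 33)."""
    values, v = {}, 10
    for ch in string.ascii_uppercase:
        if v % 11 == 0:
            v += 1
        values[ch] = v
        v += 1
    return values


LETTER_VALUES = _iso6346_letter_values()


def container_check_digit(code10: str) -> int:
    """Check digit for the first 10 characters (owner code, category, serial)."""
    total = 0
    for i, ch in enumerate(code10):
        value = LETTER_VALUES[ch] if ch in LETTER_VALUES else int(ch)
        total += value * 2**i  # position weights 1, 2, 4, ..., 512
    return total % 11 % 10  # a remainder of 10 is written as 0


def container_code_valid(code: str) -> bool:
    """Validate an OCR'd container code such as 'CSQU3054383'."""
    code = code.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{4}[0-9]{7}", code):
        return False
    return container_check_digit(code[:10]) == int(code[10])


print(container_code_valid("CSQU 305438 3"))  # True
print(container_code_valid("CSQU3O54383"))    # False: letter O read in place of zero
```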

Your application could benefit from OCR in a similar or an entirely new way. Here is a short overview of how optical character recognition technology works.

What does optical character recognition do?

To enable image-to-text features in your app, you’ll need to integrate an OCR engine into it. The engine is responsible for several automatic sub-processes that together make up the optical character recognition pipeline:

  • Image preprocessing may include a range of manipulations that improve the chances of successful information extraction, such as rotating, aligning, cleaning artifacts, removing shadows, and converting the photo or scan into a binary image.
  • Text localization involves detecting text areas, blocks, and lines subject to further processing, which is especially important when dealing with texts laid out in columns or scene text.
  • Character segmentation aims to isolate characters that are joined by image artifacts or, conversely, to connect parts of a single character that were broken apart.
  • Character recognition matches parts of the image with known characters, words, or phrases, using classic OCR algorithms such as matrix matching or feature extraction, or trained neural networks.
  • Post-processing corrects mistakes and improves output accuracy using dictionaries, near-neighbor analysis, or other means to finalize the output of the OCR pipeline.
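To make the character recognition step concrete, here is matrix matching in miniature: each glyph, already segmented and binarized, is compared pixel by pixel with stored templates, and the template that agrees best wins. The 5×3 templates are made up for illustration; modern engines such as Tesseract 4 and later rely on LSTM neural networks rather than plain template matching.

```python
# Sketch of matrix matching. Hypothetical 5x3 templates: '#' = ink, '.' = background.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def similarity(glyph: list[str], template: list[str]) -> int:
    """Count pixels that agree between a glyph and a same-sized template."""
    return sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )


def recognize(glyph: list[str]) -> str:
    """Return the character whose template agrees with the most pixels."""
    return max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))


# A '1' with one pixel lost to noise in the bottom-right corner.
noisy_one = [".#.", "##.", ".#.", ".#.", "##."]
print(recognize(noisy_one))  # 1
```

Plain matrix matching breaks down as soon as fonts, sizes, or rotation vary, which is why feature extraction and neural networks took over.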

There is a wide range of proprietary and open-source OCR engines that can be incorporated into software and optimized for particular tasks. Depending on the type of input and the information extraction requirements, a third-party OCR engine may need customization.

How to add OCR features to an app?

The good news is that you don’t have to develop an OCR engine from scratch to add optical character recognition to your app. The bad news is that none of the existing open-source options is a plug-and-play solution. Looking at the OCR software market, you have two main options:

  • Integrate a paid third-party solution that specializes in your type of task through its OCR API, and fine-tune it for your app’s needs.
  • Choose an open-source OCR engine or software development kit, like Tesseract, and hire a developer to build a custom solution for you based on reusable code packages.

The first option can save you time but may be too costly. The second option requires hiring an experienced software developer who can build on a free, open-source OCR engine using C/C++, C#, Java, or Python.

We asked Amir Yousefi, Senior Python Developer at Proxify, to share his experience in implementing OCR features. Amir has built several apps that used OCR to scan and recognize payment card numbers, data from receipts, and smartphone IMEI numbers. He described one of the use cases in detail.

“I believe one of the best programming languages to use for OCR is Python. It has great libraries and packages, such as pytesseract. In addition, Python can easily integrate with other tools.” – Amir Yousefi, Senior Python Developer

Amir’s client, an eCommerce company selling mobile devices, was struggling to add inventory to its warehouse management system (WMS). Every time a new batch of smartphones arrived, reading the IMEI numbers from the boxes took too much time, and barcode scanners didn’t make the process fast enough.

“One of my clients wanted to build a solution that would allow them to take pictures of multiple boxes, read data from the box labels, and add it to their WMS via API. It took me 20 hours to solve their problem.” – Amir Yousefi, Senior Python Developer

The project was completed in four stages and required building the following features:

  • API for uploading images to the OCR service (1h). The warehouse operator photographs a stack of twenty boxes with a smartphone and uploads the picture in the app. Our API then receives the image and saves it for the next steps.
  • Integrate the OpenCV library for text localization (4h). OpenCV helps locate the shipping labels in a picture that contains up to 20 boxes. It is also used for image preprocessing, such as resizing or sharpening, to improve the accuracy of subsequent text recognition.
  • Set up OpenCV to enhance text characters (5h). To help the OCR engine process the characters, convert the text image to grayscale and then to a binary (black-and-white) version. As a result, all text characters appear in solid black, clearly separated from the surrounding area of the image.
  • Implement Tesseract for character recognition (10h). The Tesseract OCR engine supports over 100 languages and easily copes with typed text on product boxes. In this case, the engine didn’t need any additional training, which is why the character recognition functionality was developed quickly.
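In Amir’s project the binarization step was done with OpenCV; the article doesn’t say which thresholding method was used. A common choice is Otsu’s method, available in OpenCV as `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, after which `pytesseract.image_to_string()` can read the cleaned image. To show what that thresholding does without extra dependencies, here is a sketch of Otsu’s method in plain Python, assuming an 8-bit grayscale image given as a list of rows.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level (0-255) that best separates dark and light pixels.

    Otsu's method picks the threshold t that maximizes the between-class
    variance of the pixels <= t (dark) and the pixels > t (light).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1  # assumes 8-bit values in 0..255
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray: list[list[int]]) -> list[list[int]]:
    """Map each pixel to 0 (black text) or 255 (white background)."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[255 if p > t else 0 for p in row] for row in gray]


print(binarize([[200, 210, 40], [35, 220, 30]]))  # [[255, 255, 0], [0, 255, 0]]
```

A single global threshold like this works well for evenly lit labels; photos with shadows across a stack of boxes may need adaptive (local) thresholding instead.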

“Tesseract OCR is one of the best libraries. Although it was written in C++, there are a lot of wrappers that allow using this library in projects developed in other programming languages. For example, you can use Python-tesseract and have all the Tesseract OCR features in your Python project.” – Amir Yousefi, Senior Python Developer

Plan on using OCR in your project?

To build an app with OCR features, you’ll need a C/C++/C# or Python developer who has worked with optical character recognition algorithms before. If you are looking for one right now, just send us your talent request, and we’ll match you with the right candidate from our pool of vetted specialists. With Proxify.io, you’ll be able to engage a senior developer in your project within two weeks at rates starting from €29/h.