OCR – AI Image Recognition

What is OCR?

OCR stands for Optical Character Recognition.

It is a program that takes an image of text from a physical document and converts it into machine-readable text.

Some OCR programs export the recognized text, while others allow the text to be edited directly in the image. [Callaghan 2021]

How OCR Works

Scan the whole document

In this stage, the image is refined. The edges of letters and characters are smoothed, and the software removes imperfections.
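The cleanup step can be sketched with a small example. The source does not say which filter OCR software uses, so the 3x3 median filter below is an assumption, chosen because it is a common way to remove isolated specks. The function name `median_filter` and the sample `page` are hypothetical.

```python
from statistics import median_low

def median_filter(image):
    """Return a copy of a grayscale image (list of rows, values 0-255)
    where each pixel is replaced by the median of its 3x3 neighborhood."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            # Collect the pixel and its neighbors, clipping at the image border.
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            result[y][x] = median_low(neighbors)
    return result

# A white page (255) with a single dark speck (0) at row 1, column 1.
page = [
    [255, 255, 255, 255],
    [255,   0, 255, 255],
    [255, 255, 255, 255],
]
cleaned = median_filter(page)
```

In this toy case the lone speck is outvoted by its white neighbors and disappears. Real scans are much larger, and production software would likely rely on an image-processing library instead of nested loops.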

Binarization

This process converts the colors in the image to black and white only. It helps the software distinguish the text from the background.
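A minimal sketch of binarization, assuming a grayscale image stored as rows of values from 0 (black) to 255 (white) and a fixed cutoff. The function name `binarize` and the default threshold of 128 are assumptions for illustration. Real software often picks the threshold automatically for each image.

```python
def binarize(image, threshold=128):
    """Map every pixel to pure black (0) or pure white (255).

    Pixels darker than the threshold become black; all others become white.
    """
    return [
        [0 if pixel < threshold else 255 for pixel in row]
        for row in image
    ]

# Gray ink on an off-white background.
scan = [
    [240,  60, 235],
    [238,  55, 241],
]
black_and_white = binarize(scan)
```

After this step each pixel is either ink or background, which is what the character-matching stage needs.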

Identify characters

The software compares the pixels of each scanned character against an existing database of known characters and picks the closest match.
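The pixel comparison can be illustrated with a toy template matcher. The three-letter `TEMPLATES` database, the 3x3 glyph size, and the function names are all hypothetical. A real database holds many more characters at a much higher resolution, and modern engines may use other recognition methods entirely.

```python
# Hypothetical database: each character is a 3x3 grid, "1" = ink, "0" = background.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}

def pixel_distance(glyph, template):
    """Count the pixels where the scanned glyph and the template disagree."""
    return sum(
        a != b
        for glyph_row, template_row in zip(glyph, template)
        for a, b in zip(glyph_row, template_row)
    )

def identify(glyph):
    """Return the character whose template differs in the fewest pixels."""
    return min(TEMPLATES, key=lambda name: pixel_distance(glyph, TEMPLATES[name]))

# A "T" with one stray ink pixel in the bottom-right corner.
noisy_t = ("111", "010", "011")
best_match = identify(noisy_t)
```

Here the noisy glyph differs from the "T" template in one pixel, from "I" in three, and from "L" in five, so "T" is the closest match.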

Ensure accuracy

The software reduces errors by checking the recognized words against internal dictionaries.
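A dictionary check might look like the sketch below. It uses `difflib.get_close_matches` from the Python standard library. The tiny `DICTIONARY` list, the function name `correct`, and the 0.8 similarity cutoff are assumptions for illustration, not how any particular OCR product works.

```python
from difflib import get_close_matches

# Hypothetical internal dictionary; a real one would hold many thousands of words.
DICTIONARY = ["optical", "character", "recognition"]

def correct(word, dictionary=DICTIONARY, cutoff=0.8):
    """Return the word unchanged if it is known; otherwise return the most
    similar dictionary word, or the original word if nothing is close enough."""
    if word in dictionary:
        return word
    matches = get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# The scanner misread the letter "o" as the digit "0".
fixed = correct("rec0gnition")
```

Words with no close dictionary entry are left alone, so names and unusual terms are not forced into a wrong correction.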

Produce a digital text file

Brief History

Emanuel Goldberg is one of the most important figures in the history of OCR technology. He developed a machine that could read and convert characters into telegraph code during WWI.

Around the same time that Goldberg invented his machine, Edmund Fournier d’Albe invented the optophone, which produced sounds that corresponded to letters or characters on a page. This device aided the visually impaired.

In 1974, Ray Kurzweil founded Kurzweil Computer Products Inc., which commercialized omni-font OCR. This technology could read text in different fonts.

Implementations of OCR

Foreign language translation
- Ex: Google Translate
  - Convert an image with text in one language to another language
Assistance for the blind
- Scan printed text and have it spoken in synthetic speech
Healthcare
- Adoption of the electronic healthcare record
- Digitize insurance forms, ID cards, doctor’s notes, etc.