Recognition of English Handwriting and Typed from Images using Tesseract on Android Platform

Shubhendu Banerjee, Sumit Kumar Singh, Atanu Das, Rajib Bag

Abstract

With digitalization reaching most spheres of human activity, the conversion of text in images into digital form has gained considerable momentum over the years, even though the concept of image character recognition dates back to before the invention of the computer. This paper presents an experimental workflow for recognizing text in images using Tesseract, Google’s open-source Optical Character Recognition (OCR) engine. Tesseract is trained here to recognize handwritten and typed text in English script, and it produces output at varying levels of observed accuracy. The method is supported by an external image-enhancement step applied before text extraction. The main aim of the paper is to build an Android application that lets the user digitize text from images even on a small screen. The analysis reports a precision of up to 93% for handwritten text and 98% for typed characters, which is an attempt to improve on existing methods. Apart from image-to-text conversion, the application also includes a text-to-speech feature, which makes it useful for visually impaired users.
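The abstract does not say which image-enhancement technique is applied before text extraction. As a minimal sketch, assuming the step includes binarization of a greyscale image, the Python below uses Otsu's method, a common choice for OCR pre-processing, to pick a threshold and turn each pixel black or white. The function names `otsu_threshold` and `binarize` are hypothetical and are not taken from the paper; the application itself runs on Android, so this is an illustration of the idea rather than the authors' implementation.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the grey level (0-255) that maximizes between-class variance.

    Assumption: `pixels` is a flat list of 8-bit greyscale values.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of grey levels at or below the candidate threshold
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 0 (at or below threshold) or 255 (above it)."""
    flat = [p for row in gray for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in gray]


if __name__ == "__main__":
    # Tiny hypothetical image: dark ink values next to bright paper values.
    sample = [[30, 40, 200], [35, 210, 220]]
    print(binarize(sample))
```

A binarized image of this kind would then be passed to the OCR engine; whether the paper's pipeline thresholds globally, locally, or not at all is not stated here.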