Improved Handwritten Offline Urdu Characters Recognition System using Machine Learning Techniques
Purpose: Character recognition is one of the most prominent applications of machine learning. Studies have been carried out on many languages, with varying accuracy. Urdu is one of the major languages of South Asia, with 250 million native speakers, yet its complex writing pattern has drawn comparatively little research interest in handwritten Urdu text recognition. This paper presents a model for offline Urdu character recognition using machine learning techniques. Methods: We used three feature extraction techniques, Edge Histogram Descriptor, ColourLayout and Binary Pyramid, on images of offline handwritten Urdu characters. About 80 features were generated from the character images using Edge Histogram Descriptor, the best of the three. These features were used as input to four machine learning classifiers, namely Multi-Layer Perceptron, SVM, SMO and SimpleLogistic. We divided the dataset into three categories: multi-stroke characters, single-stroke characters and digits. Results: Across the categories and classifiers tested, SVM combined with Edge Histogram Descriptor was the best model for offline Urdu character recognition, with an accuracy of 98.60%. Conclusion: This research tested four machine learning classifiers with three feature extraction techniques to improve the recognition accuracy for offline handwritten Urdu characters. The results improve on those previously reported in the literature.
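
To illustrate where a feature count of about 80 can come from, the following is a minimal sketch of an edge histogram in the style of the MPEG-7 Edge Histogram Descriptor, which splits an image into a 4 x 4 grid and counts five edge types per cell (16 x 5 = 80 bins). The abstract does not state which implementation was used, so the function name `edge_histogram`, the threshold value, and the simplification of treating each 2 x 2 pixel group as one block are all assumptions made for illustration.

```python
import math

# 2x2 filter coefficients for the five MPEG-7 edge types, applied to the
# (top-left, top-right, bottom-left, bottom-right) pixels of each block.
EDGE_FILTERS = [
    (1.0, -1.0, 1.0, -1.0),                    # vertical edge
    (1.0, 1.0, -1.0, -1.0),                    # horizontal edge
    (math.sqrt(2), 0.0, 0.0, -math.sqrt(2)),   # 45-degree diagonal edge
    (0.0, math.sqrt(2), -math.sqrt(2), 0.0),   # 135-degree diagonal edge
    (2.0, -2.0, -2.0, 2.0),                    # non-directional edge
]


def edge_histogram(image, grid=4, threshold=11.0):
    """Return grid*grid*5 edge-type frequencies for a grayscale image.

    `image` is a list of rows of pixel intensities. The threshold is an
    assumed value, not one taken from the paper.
    """
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // grid, width // grid
    features = []
    for gy in range(grid):
        for gx in range(grid):
            counts = [0] * len(EDGE_FILTERS)
            blocks = 0
            # Step through the cell in non-overlapping 2x2 pixel blocks.
            for y in range(gy * cell_h, (gy + 1) * cell_h - 1, 2):
                for x in range(gx * cell_w, (gx + 1) * cell_w - 1, 2):
                    pixels = (image[y][x], image[y][x + 1],
                              image[y + 1][x], image[y + 1][x + 1])
                    strengths = [abs(sum(p * c for p, c in zip(pixels, f)))
                                 for f in EDGE_FILTERS]
                    best = max(range(len(strengths)),
                               key=strengths.__getitem__)
                    # Count the block only if its strongest response is an edge.
                    if strengths[best] > threshold:
                        counts[best] += 1
                    blocks += 1
            # Normalise by the number of blocks so cells are comparable.
            features.extend(c / blocks if blocks else 0.0 for c in counts)
    return features
```

Calling `edge_histogram` on a grayscale character image with the default 4 x 4 grid yields a list of 80 values, which could then serve as the input vector for a classifier such as an SVM.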