 Character Segmentation and Identification Methods for Japanese Document Images Using K-Nearest Neighbor Method

K.Sheikdavood et al.

Abstract

Many languages are in use in our country, and interest in learning Japanese is increasing. In general, recognition of scripts is affected by their arrangement, style, low print quality, and intermixed content such as machine-printed and handwritten text. To overcome these drawbacks, we use a character segmentation algorithm followed by a character identification algorithm. The character segmentation algorithm first chooses the segmentation line based on structural properties. The maximum curvature method is then used to separate merged characters in the document, and an SVM classifier is used in the last step to segment the image. The character identification algorithm then calculates geometrical features: the first and second features are formed from the center pixel of the character, and the third feature is calculated from the nearest pixels around the center pixel. The character identification algorithm uses a K-NN classifier. The SVM and K-NN classifiers are more accurate in the segmentation process than other segmentation techniques. In comparison with other classifiers, this classifier achieves an accuracy of 99%.
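
To make the identification step concrete, the following is a minimal sketch of K-NN classification over three-component feature vectors. It is not the authors' implementation: the function name `knn_predict`, the training values in `TRAIN`, the romanized labels, the Euclidean distance, and the choice k = 3 are all assumptions made for illustration, since the abstract does not specify how the three geometrical features are computed.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption; the abstract does not name a distance measure.
    """
    # Sort the training samples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical three-component feature vectors (made-up values) with
# romanized character labels standing in for two character classes.
TRAIN = [
    ((0.10, 0.20, 0.15), "ka"),
    ((0.12, 0.18, 0.20), "ka"),
    ((0.11, 0.22, 0.17), "ka"),
    ((0.80, 0.75, 0.90), "ki"),
    ((0.85, 0.70, 0.88), "ki"),
    ((0.78, 0.80, 0.92), "ki"),
]

if __name__ == "__main__":
    # Classify an unseen feature vector that lies near the first cluster.
    print(knn_predict(TRAIN, (0.13, 0.21, 0.16)))
```

An odd k is used here so that a vote between two classes cannot tie; with more character classes, a tie-breaking rule (for example, preferring the nearest neighbor) would be needed.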