Authorship Identification for Tamil Poem Using K-Nearest Neighbor (KNN)

  • Pandian A and Ragavi R

Abstract

Author Identification (AI) is a text mining application that identifies the unknown author of an electronic text (e-text). The text may be a poem or any other composition. Identification relies on the author's writing style: the characteristic way an author uses language when composing a text, the study of which is known as stylometry. This style can be analyzed by detecting the textual features the author uses while composing the text. Authorship analysis, which identifies authors, is a concern for clustering and prediction in text data mining. Accordingly, this research surveys the perspectives of various studies on authorship analysis in order to identify where significant research input is needed. The K-Nearest Neighbor (KNN) technique is proposed for identifying the author of an unidentified text or poem. Neighbor samples are identified automatically by combining KNN with a clustering step. Once the training set has been partitioned into clusters, a class label is assigned to each cluster center. A new test sample then receives the class label of its closest cluster prototype. The proposed mechanism has the following stages: poem data collection, pre-processing, feature extraction, clustering, prediction, and identification. The authors report improvements in both time complexity and accuracy.
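The clustering and prediction stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here as a stand-in, and the two numeric features (for example, average word length and average line length), the author names, and all function names are hypothetical.

```python
from collections import Counter
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iterations=20):
    """Partition points into k clusters (assumed stand-in for the paper's clustering)."""
    # Assumption: deterministic initialisation from evenly spaced training points.
    step = max(1, len(points) // k)
    centers = [tuple(p) for p in points[::step][:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: distance(p, centers[c]))
            clusters[nearest].append(idx)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [
            mean([points[i] for i in members]) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
    return centers, clusters


def label_centers(clusters, labels):
    """Give each cluster center the majority author label of its members."""
    return [
        Counter(labels[i] for i in members).most_common(1)[0][0] if members else None
        for members in clusters
    ]


def predict(sample, centers, center_labels):
    """Return the label of the closest labeled cluster prototype."""
    candidates = [c for c in range(len(centers)) if center_labels[c] is not None]
    nearest = min(candidates, key=lambda c: distance(sample, centers[c]))
    return center_labels[nearest]


# Hypothetical stylometric feature vectors for six training poems.
features = [(4.1, 22.0), (4.3, 21.0), (4.0, 23.5),
            (6.2, 35.0), (6.0, 36.5), (6.4, 34.0)]
authors = ["author_A", "author_A", "author_A",
           "author_B", "author_B", "author_B"]

centers, clusters = kmeans(features, k=2)
center_labels = label_centers(clusters, authors)

# Classify an unseen poem by its nearest cluster prototype.
print(predict((4.2, 22.5), centers, center_labels))
```

Comparing a test sample against a handful of cluster prototypes rather than every training poem is one plausible reading of the time-complexity improvement the abstract mentions.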

Published
2020-04-21
How to Cite
Pandian A and Ragavi R. (2020). Authorship Identification for Tamil Poem Using K-Nearest Neighbor (KNN). International Journal of Advanced Science and Technology, 29(8s), 759 - 764. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/10817