Tamil Text Document Categorization Based On Singular Value Decomposition With Multiclass Support Vector Machine
In this paper we present a novel method for Tamil text categorization that uses a Multiclass Support Vector Machine (MSVM) with Bag-of-Words and Singular Value Decomposition (SVD) features. The proposed method consists of six stages: tokenization in the first stage, stop-word identification in the second, lemmatization in the third, bag-of-words construction in the fourth, feature extraction in the fifth, and classification in the sixth. The method is evaluated on 1800 articles taken from four different Tamil newspapers and covering five categories. It achieves an accuracy of 94%, outperforming the state-of-the-art Tamil text categorizers it is compared with.
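To make the six stages concrete, the following is a minimal sketch of such a pipeline and not the implementation used in this work. It assumes scikit-learn is available and uses its CountVectorizer, TruncatedSVD, and SVC classes as stand-ins for the bag-of-words, SVD, and MSVM stages. The stop-word list, the identity lemmatizer, the toy documents, and the three category names are hypothetical placeholders, and the number of SVD components is chosen only to suit the toy data.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"மற்றும்", "ஒரு", "இந்த"}


def lemmatize(token):
    # Placeholder lemmatizer (assumption): returns the token unchanged.
    # A real Tamil lemmatizer would map inflected forms to their lemma.
    return token


def analyze(document):
    tokens = document.split()                            # stage 1: tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stage 2: stop words
    return [lemmatize(t) for t in tokens]                # stage 3: lemmatization


# Toy corpus (hypothetical); the categories here are illustrative only.
docs = [
    "இந்த கிரிக்கெட் அணி வெற்றி",
    "ஒரு கிரிக்கெட் போட்டி மற்றும் அணி",
    "தேர்தல் மற்றும் அரசு கட்சி",
    "இந்த கட்சி தேர்தல் வெற்றி",
    "ஒரு படம் மற்றும் நடிகர்",
    "இந்த நடிகர் படம் வெற்றி",
]
labels = ["sports", "sports", "politics", "politics", "cinema", "cinema"]

model = make_pipeline(
    CountVectorizer(analyzer=analyze),             # stage 4: bag of words
    TruncatedSVD(n_components=2, random_state=0),  # stage 5: SVD features
    SVC(kernel="linear"),                          # stage 6: multiclass SVM
)
model.fit(docs, labels)
predictions = model.predict(docs)
print(list(predictions))
```

A custom analyzer is passed to CountVectorizer because its default token pattern relies on word characters and can split Tamil words at vowel signs. SVC handles more than two classes internally with a one-vs-one scheme, which may differ from the MSVM formulation used in the paper.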