MACHINE LEARNING BASED SENTIMENT CLASSIFICATION USING WORD EMBEDDING APPROACH
A sentiment analysis system for text combines natural language processing (NLP) and machine learning techniques to assign weighted sentiment scores to the entities, topics, themes and categories within a sentence or phrase. The results of this analysis can be used in computing customer satisfaction metrics and in marketing, contextual advertising, suggestion systems based on user likes and ratings, and recommendation systems. The large volumes of textual data from which sentiments are extracted are unstructured in nature. Hence the data is transformed into a document-term matrix using Binary, Term Frequency, Term Frequency-Inverse Document Frequency and Word Embedding scores. Preprocessing tasks such as tokenization, case normalization and punctuation removal are performed to prepare the data for further analysis. Stop words are removed as they do not convey any sentiment. The word vector of each individual word remaining after preprocessing is first identified. These word vectors are then used to construct the document-term matrix, which is given as input to the classifiers. Machine learning algorithms such as decision tree and Support Vector Machine (SVM) are used to classify the human sentiments present in the reviews as positive or negative. The different methods are critically compared using accuracy as the performance metric.
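The preprocessing and document-term matrix steps described above can be illustrated with a minimal Python sketch using only the standard library. It is not the implementation used in this work. The stop-word list, the sample reviews and the toy two-dimensional word vectors are hypothetical. The plain log(N/df) form of the inverse document frequency and the averaging of word vectors into one row per document are assumptions, since the abstract does not specify either.

```python
import math
import string

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to"}

def preprocess(text):
    """Case-normalize, remove punctuation, tokenize on whitespace, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def document_term_matrix(docs, scheme="binary"):
    """Return (vocabulary, matrix) using 'binary', 'tf' or 'tfidf' weights."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    matrix = []
    for toks in tokenized:
        row = []
        for t in vocab:
            count = toks.count(t)
            if scheme == "binary":
                row.append(1.0 if count else 0.0)
            elif scheme == "tf":
                row.append(float(count))
            elif scheme == "tfidf":
                # Assumed variant: raw count times log(N / df), no smoothing.
                row.append(count * math.log(n_docs / df[t]))
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        matrix.append(row)
    return vocab, matrix

def embed_documents(docs, embeddings, dim):
    """Assumed aggregation: average the word vectors of each document's tokens."""
    rows = []
    for d in docs:
        vecs = [embeddings[t] for t in preprocess(d) if t in embeddings]
        if vecs:
            rows.append([sum(v[i] for v in vecs) / len(vecs) for i in range(dim)])
        else:
            rows.append([0.0] * dim)  # no known word in this document
    return rows

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical two-review corpus and toy word vectors.
docs = ["The movie was great, great fun!", "This movie was boring."]
toy_embeddings = {"great": [1.0, 0.0], "fun": [0.0, 1.0], "boring": [-1.0, 0.0]}

vocab, binary_dtm = document_term_matrix(docs, "binary")
_, tfidf_dtm = document_term_matrix(docs, "tfidf")
embedded = embed_documents(docs, toy_embeddings, dim=2)
print(vocab)       # ['boring', 'fun', 'great', 'movie']
print(binary_dtm)  # [[0.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
```

Each row of the resulting matrix would then serve as the feature vector for a classifier such as a decision tree or an SVM, and `accuracy` would compare the predicted labels with the true ones on held-out reviews.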