News Categorization of NYT Articles using POS Tagging and TF-IDF Approach

  • Sonam, Dr. Madhavi Devaraj

Abstract

The demand for more data to understand today's world is matched by the need to structure that data so that it becomes an intuitive and reliable source of reference. Different domains, such as automotive, transport, and academia, organize data in different ways. This study focuses on organizing data in the media domain, specifically news. NYT news articles are classified into Sports, National, and Business categories. News headlines are preprocessed, filtered by part-of-speech (POS) tag combinations, and weighted using TF-IDF. SVM, Naïve Bayes, and logistic regression classifiers are then used for categorization. The results indicate that SVM outperforms the other classifiers. In addition, the combination of nouns and adjectives proves to carry the most information needed to classify news headlines correctly.
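
The filtering and weighting steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which tagger, tag set, or TF-IDF variant was used, so everything here is an assumption: the toy lexicon `TOY_POS` stands in for a real POS tagger, the tags follow the Penn Treebank convention, the headlines are invented for illustration, and the weight is plain term frequency times log inverse document frequency without smoothing.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger.
# Tags follow the Penn Treebank convention (NN* = noun, JJ* = adjective).
TOY_POS = {
    "home": "NN", "team": "NN", "wins": "VBZ", "final": "JJ", "game": "NN",
    "senate": "NNP", "passes": "VBZ", "new": "JJ", "budget": "NN",
    "stocks": "NNS", "fall": "VBP", "after": "IN",
}

# Keep only nouns and adjectives, the combination the abstract reports
# as most informative.
KEEP_PREFIXES = ("NN", "JJ")


def pos_filter(headline):
    """Lowercase, split on whitespace, and keep noun/adjective tokens."""
    tokens = headline.lower().split()
    return [t for t in tokens if TOY_POS.get(t, "").startswith(KEEP_PREFIXES)]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented example headlines, one per category named in the abstract.
HEADLINES = [
    "Home team wins final game",      # Sports
    "Senate passes new budget",       # National
    "Stocks fall after new budget",   # Business
]

filtered = [pos_filter(h) for h in HEADLINES]
weights = tf_idf(filtered)

for headline, vector in zip(HEADLINES, weights):
    print(headline, "->", vector)
```

In this sketch, terms shared across headlines (such as "new" and "budget") receive lower weights than terms unique to one headline. The resulting weight vectors are the kind of features that would then be passed to a classifier such as an SVM, Naïve Bayes, or logistic regression.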

Published
2020-06-01
How to Cite
Sonam, Dr. Madhavi Devaraj. (2020). News Categorization of NYT Articles using POS Tagging and TF-IDF Approach. International Journal of Advanced Science and Technology, 29(7), 4938-4945. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/23543
Section
Articles