Multi-Label Text Classification of News Articles

V. Srividhya, P. Megala


This paper, entitled “Multi-Label Text Classification of News Articles”, aims to classify the text of news articles using four machine learning algorithms (Logistic Regression, Random Forest, XGBoost and Naive Bayes) and to compare them in order to identify the most suitable approach. The multi-label text classification task considered here involves more than two groups, and the labels are mutually exclusive, so each article is assigned exactly one label. Text classification models sort text into predefined groups; the task is also known as text tagging or text categorization. In general, an automatic document classification algorithm assigns a predetermined label to each text document in the test dataset on the basis of a classifier model built with a supervised machine learning algorithm. This work focuses on Logistic Regression, Random Forest, XGBoost and Naive Bayes, all of which are supervised machine learning algorithms, for the automatic classification of text documents. The work consists of five phases. The first phase loads the datasets and explores them. The second phase is pre-processing. The third phase is vector space modeling, which represents documents as vectors of identifiers. The fourth phase is model fitting with the four machine learning algorithms. The fifth phase is performance measurement, which reports the accuracy of the algorithms to determine which works best. To carry out this work, the BBC news article dataset was collected from Insight Resources.
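The phases described above can be illustrated with a minimal sketch, shown here for Naive Bayes only. It is not the authors' implementation: it uses only the Python standard library, a hand-written multinomial Naive Bayes with add-one smoothing, raw term counts as the vector space representation, and a tiny invented toy corpus. The category names, the example sentences, and all function and class names (`tokenize`, `vectorize`, `ToyMultinomialNB`) are assumptions made purely for illustration and do not come from the BBC dataset.

```python
import math
import re
from collections import Counter, defaultdict

# Phase 1: load the data. This toy corpus is invented for illustration only.
train = [
    ("the team won the match with a late goal", "sport"),
    ("the striker scored a goal in the final match", "sport"),
    ("shares fell as the bank reported lower profit", "business"),
    ("the company profit rose and shares gained", "business"),
    ("the new phone has a faster chip and software", "tech"),
    ("software update brings new features to the phone", "tech"),
]
test = [
    ("the goal decided the match", "sport"),
    ("bank shares and profit", "business"),
    ("phone software", "tech"),
]

def tokenize(text):
    # Phase 2: pre-processing. Lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())

def vectorize(text):
    # Phase 3: vector space model. A document becomes a sparse vector of
    # term counts, keyed by the term itself.
    return Counter(tokenize(text))

class ToyMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            vec = vectorize(doc)
            self.term_counts[label].update(vec)
            self.vocab.update(vec)
        self.total_terms = {
            c: sum(self.term_counts[c].values()) for c in self.class_counts
        }
        return self

    def predict(self, doc):
        vec = vectorize(doc)
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            for term, count in vec.items():
                if term not in self.vocab:
                    continue  # ignore terms never seen during training
                p = (self.term_counts[c][term] + 1) / (
                    self.total_terms[c] + vocab_size
                )
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

# Phase 4: model fitting.
model = ToyMultinomialNB().fit([d for d, _ in train], [y for _, y in train])

# Phase 5: performance measure (accuracy on the held-out toy documents).
predictions = [model.predict(d) for d, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

In a comparison of the kind the paper describes, the same train and test split would be passed to each of the four classifiers and the resulting accuracies compared; a library implementation and a weighted representation such as TF-IDF would normally replace the hand-written pieces shown here.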