Real-Time News Classification Using Machine Learning
With a large number of online sources generating an immense volume of news every day, news articles need to be classified so that information reaches users quickly and effectively. News classification therefore starts by collecting real-time news articles from news websites through web scraping and then classifying them automatically using various classification algorithms. In this way, news classification can identify the topics of untracked news and support individual recommendations based on a user's prior interests. This paper discusses the steps of news classification and implements several algorithmic approaches, namely Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), Decision Tree, and Random Forest, for automatically classifying news articles into topics using the BBC News dataset, which contains articles from five categories (Business, Entertainment, Politics, Sport, Technology). The paper reviews the results of the different classification algorithms and compares them using various performance measures. It also reports the performance of our news classifier on real-time news articles crawled from news websites.
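To make the classification step more concrete, the following is a minimal sketch of one of the listed approaches, a multinomial Naïve Bayes text classifier, written in plain Python. It is not the paper's implementation. The tokenizer, the smoothing parameter `alpha`, the class name `NaiveBayesClassifier` (hand-written here, not a library class) and the tiny training set are hypothetical illustrations and are not drawn from the BBC News dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab: set[str] = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc: str, label: str) -> float:
        # log P(category) + sum of log P(word | category), with smoothing
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for token in tokenize(doc):
            if token in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, doc: str) -> str:
        # Pick the category with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy headlines, two per category (not real dataset articles)
TRAIN = [
    ("shares rose as the company reported strong quarterly profits", "business"),
    ("the bank raised interest rates to curb inflation", "business"),
    ("the film won best actor at the awards ceremony", "entertainment"),
    ("the singer released a new album and world tour", "entertainment"),
    ("the minister announced the election date in parliament", "politics"),
    ("voters backed the party leader in the general election", "politics"),
    ("the striker scored twice as the team won the match", "sport"),
    ("the champion won the tennis final in straight sets", "sport"),
    ("the new smartphone uses a faster chip and software update", "technology"),
    ("researchers unveiled software to protect computer networks", "technology"),
]

model = NaiveBayesClassifier().fit([d for d, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict("the team scored a late goal to win the match"))
```

Scores are accumulated as sums of logarithms rather than products of probabilities, which avoids numerical underflow when a long article contributes many small word probabilities.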