News Article Summarization: Analysis and Experiments on Basic Extractive Algorithms
The Web is an information system that stores an enormous number of documents and other online resources, which we generally access via Uniform Resource Locators (URLs) over the Internet. There was a time when people had to wait for the newspaper to catch up on the previous day's happenings; thanks to the Internet, the latest information is now available at the click of a button. However, the volume of information is far larger than anyone can manage quickly and efficiently, and today everyone wants to gain more information in less time. Instead of reading a large document to extract its insights, it is often better to read a summary that conveys the core information about the topic.

In this paper we implement three techniques for generating extractive summaries of news articles from two benchmark datasets, the CNN-corpus and the BBC dataset (both provide each article together with a reference summary). We evaluate these techniques at different retention rates in order to determine which rate produces a summary that preserves most of the information in the original text without greatly harming the connectivity among sentences. Readability and connectivity are the two prime factors because of which most people still rely on human-written summaries.
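To make the idea of an extractive summary at a given retention rate concrete, the following is a minimal sketch, not one of the three techniques evaluated in this paper: it scores sentences by normalized word frequency (a common baseline) and keeps only the top fraction of sentences specified by the retention rate, preserving their original order.

```python
import re
from collections import Counter

def extractive_summary(text, retention_rate=0.3):
    """Return an extractive summary keeping roughly `retention_rate`
    of the sentences, chosen by a simple word-frequency score."""
    # Split into sentences on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    # Word frequencies over the whole document, lowercased.
    freq = Counter(re.findall(r'\w+', text.lower()))
    max_f = max(freq.values())
    # Score each sentence by the mean normalized frequency of its words.
    scores = []
    for i, s in enumerate(sentences):
        toks = re.findall(r'\w+', s.lower())
        score = sum(freq[t] / max_f for t in toks) / (len(toks) or 1)
        scores.append((score, i))
    # Keep the top-k sentences and re-emit them in document order,
    # which helps preserve connectivity among the retained sentences.
    k = max(1, round(len(sentences) * retention_rate))
    keep = sorted(i for _, i in sorted(scores, reverse=True)[:k])
    return ' '.join(sentences[i] for i in keep)
```

Raising the retention rate retains more of the original information at the cost of a longer summary; the experiments in this paper study exactly this trade-off.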