Extractive summarization approach for news articles based on selective features
Data is growing at an unprecedented rate. According to Forbes, 90 percent of the world's data was generated in the last two years. This data includes everything from news articles and medical records to sales figures. The news domain alone is vast, spanning politics, sports, technology, arts and recreation, health, entertainment, and reviews. The volume is so large that the entire world's population would need hundreds of years to read it once. We therefore need summaries of the data that interests us as individuals, and this applies to the news domain as well. Readers often consult multiple articles on the same event from different sources to understand it fully; summarizing these articles would save them considerable time. We use a feature-based extractive text summarization approach to generate summaries in the news article domain. A generalized set of features is commonly used to build a text summarization model, but the model's efficiency can be improved if features are selected based on the application (the kind of data available). The model is evaluated on the BBC News dataset using the ROUGE metric.
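To make the idea of feature-based extractive summarization concrete, the sketch below scores each sentence with a weighted combination of common surface features (term frequency, sentence position, sentence length) and extracts the top-scoring sentences. These particular features and weights are illustrative assumptions, not the specific feature set or weighting proposed in this work.

```python
import re
from collections import Counter

def summarize(text, n_sentences=2):
    """Extractive summary: score sentences on simple features, keep the top n.

    The three features and their weights below are illustrative examples of
    feature-based scoring, not a prescribed configuration.
    """
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    if len(sentences) <= n_sentences:
        return text
    words = re.findall(r'\w+', text.lower())
    tf = Counter(words)
    max_tf = max(tf.values())

    def score(i, sent):
        tokens = re.findall(r'\w+', sent.lower())
        if not tokens:
            return 0.0
        # Feature 1: mean normalized term frequency of the sentence's words
        f_tf = sum(tf[t] for t in tokens) / (len(tokens) * max_tf)
        # Feature 2: position -- leading sentences in news articles score higher
        f_pos = 1.0 - i / len(sentences)
        # Feature 3: length -- penalize very short sentences
        f_len = min(len(tokens) / 10.0, 1.0)
        return 0.5 * f_tf + 0.3 * f_pos + 0.2 * f_len

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(i, sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore original document order
    return ' '.join(sentences[i] for i in chosen)
```

In practice, a generated summary like this would be compared against a human-written reference summary using ROUGE, which measures n-gram overlap between the two.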