A Fused Extractive Summarization Approach for Kannada Text Documents
Text summarization is an application of natural language processing in the field of data mining that generates a summary of a document. Summarization can be defined as the process of reducing a given input text to a shorter form that retains its significant information, which is useful to users in many real-life applications. Many methods have been developed to summarize English text documents, but only a few have been developed for Kannada because of the lack of resources and tools for the language. This paper discusses a fused technique for extractive text summarization that selects the most important sentences from a Kannada document. The proposed approach combines the results of two methods to generate the final extractive summary of a Kannada document: (a) summarization based on clustering and (b) summarization based on Latent Semantic Analysis (LSA). In the clustering approach, Term Frequency/Inverse Sentence Frequency (TF/ISF) is first used to compute sentence scores, and the sentences are then grouped with the K-means clustering algorithm to produce the extractive summary. In the second approach, latent semantic analysis based on Singular Value Decomposition (SVD) is used to generate the summary of the document. The results of the proposed model are evaluated with the ROUGE toolkit using three metrics: Precision, Recall and F-score. Experiments are performed on a custom-built dataset of fifty Kannada text documents. The extractive summaries produced by the system are acceptable when compared with the reference summaries.
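The clustering-based method can be sketched as follows. The abstract does not say exactly how the TF/ISF scores and K-means interact, so this sketch assumes one plausible reading: each sentence is represented by its TF-ISF vector, the vectors are clustered with K-means, and the highest-scoring sentence in each cluster is extracted. Whitespace tokenization, summing TF-ISF weights into a sentence score, and farthest-first seeding of K-means are all assumptions, and the function names (`tf_isf`, `kmeans`, `farthest_first`, `cluster_summary`) are hypothetical. A real Kannada pipeline would likely also need stop-word removal and stemming.

```python
import math
from collections import Counter


def tf_isf(sentences):
    """Return (vectors, scores) for tokenized sentences.

    tf(t, s) = count of term t in sentence s
    isf(t)   = log(N / n_t), n_t = number of sentences containing t
    score(s) = sum of tf * isf over the terms of s (assumed aggregation)
    """
    n = len(sentences)
    sent_freq = Counter(t for s in sentences for t in set(s))
    vocab = sorted(sent_freq)
    isf = {t: math.log(n / sent_freq[t]) for t in vocab}
    vectors = []
    for s in sentences:
        tf = Counter(s)
        vectors.append([tf[t] * isf[t] for t in vocab])
    return vectors, [sum(v) for v in vectors]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def farthest_first(vectors, scores, k):
    """Hypothetical deterministic seeding: start from the top-scoring sentence,
    then repeatedly add the sentence farthest from every seed chosen so far."""
    seeds = [max(range(len(vectors)), key=lambda i: scores[i])]
    while len(seeds) < k:
        seeds.append(max(range(len(vectors)),
                         key=lambda i: min(sq_dist(vectors[i], vectors[s]) for s in seeds)))
    return seeds


def kmeans(vectors, seeds, max_iter=50):
    """Lloyd's K-means with Euclidean distance; returns one cluster label per vector."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]  # assignment step
        if new_labels == labels:
            break  # converged: no sentence changed cluster
        labels = new_labels
        for c in range(len(centroids)):  # update step: centroid = mean of members
            members = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_summary(sentences, k):
    """Indices (in document order) of the top TF-ISF sentence in each of k clusters."""
    tokens = [s.split() for s in sentences]  # whitespace tokenization (assumption)
    vectors, scores = tf_isf(tokens)
    labels = kmeans(vectors, farthest_first(vectors, scores, min(k, len(sentences))))
    picks = set()
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        picks.add(max(members, key=lambda i: scores[i]))
    return sorted(picks)
```

For example, `cluster_summary(["a b c", "a b d", "x y z", "x y w"], 2)` returns one sentence index from each of the two topical groups. Farthest-first seeding is used here only to keep the sketch deterministic; standard K-means implementations usually seed randomly or with k-means++.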
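The LSA-based method factors a term-by-sentence matrix as $A = U \Sigma V^{T}$, where each row of $V^{T}$ describes how strongly every sentence expresses one latent concept. The abstract does not specify the sentence-selection rule, so this sketch assumes the common Gong and Liu style rule: for each of the top-k concepts, take the not-yet-chosen sentence with the largest absolute weight. Binary term weights, whitespace tokenization and the name `lsa_summary` are assumptions for illustration; the SVD itself is NumPy's `numpy.linalg.svd`.

```python
import numpy as np


def lsa_summary(sentences, k):
    """Select k sentence indices via SVD of a term-by-sentence matrix.

    Selection rule (assumed, Gong & Liu style): for each of the top-k latent
    concepts (rows of V^T), take the unused sentence with the largest |weight|.
    """
    tokens = [s.split() for s in sentences]  # whitespace tokenization (assumption)
    vocab = sorted({t for s in tokens for t in s})
    index = {t: i for i, t in enumerate(vocab)}
    # Binary term-by-sentence matrix A (terms x sentences); TF or TF-ISF weights
    # could be used instead (assumption).
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(tokens):
        for t in s:
            A[index[t], j] = 1.0
    # A = U @ diag(S) @ Vt, with singular values in descending order.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for concept in vt[:min(k, vt.shape[0])]:
        # Visit sentences by decreasing |weight| and keep the first unused one.
        for j in np.argsort(-np.abs(concept)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return sorted(chosen)
```

Taking absolute values sidesteps the sign ambiguity of singular vectors, since SVD routines may return any singular vector negated.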
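The evaluation metrics can be illustrated with a from-scratch ROUGE-N computation against a single reference summary: precision is the number of matching n-grams divided by the n-grams in the system summary, recall divides the same count by the n-grams in the reference, and F-score is their harmonic mean, $F = 2PR/(P+R)$. This is a minimal sketch, not the ROUGE toolkit itself; the n-gram order and any stemming or stop-word settings used in the paper are not stated, and the names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F-score against a single reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped matches: min count per n-gram
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For instance, `rouge_n("a b c d", "a b e")` matches two unigrams, giving P = 2/4 = 0.5, R = 2/3 and F = 4/7.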