Multi-Document Abstractive Text Summarization System by Incorporating Various Linguistic Features
Abstract
Summarization aims to represent source documents in a shortened passage. Because multi-document summarization draws its input from several documents that discuss the same topic, the information in those documents overlaps, and the generated summary therefore contains redundant content. The proposed framework applies POS tagging to obtain the syntactic arguments of each sentence and uses the Wu-Palmer semantic similarity measure to calculate a Semantic Similarity Score (SSS) between sentences. A threshold is defined on this score, and the overlapping sentences that arise when the sentences of the multiple documents are combined are eliminated. The summary sentences are then generated from the key sentences that remain.
Keywords: Multi-Document Set (MDS), Part of Speech (POS), Least Common Subsumer (LCS), Semantic Similarity Score (SSS), Wu-Palmer semantic similarity measure.
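To make the redundancy-elimination idea in the abstract concrete, the following is a minimal sketch rather than the system's actual implementation. It computes the standard Wu-Palmer score, 2 * depth(LCS) / (depth(a) + depth(b)), and uses it to drop sentences whose similarity to an already kept sentence reaches a threshold. Several things here are assumptions made only for illustration: the toy taxonomy `PARENT` stands in for a lexical database such as WordNet, each sentence is assumed to have already been reduced to its content words by POS tagging, the sentence-level score is taken as a symmetric average of best word matches, and the threshold of 0.8 is a hypothetical value.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a taxonomy (child -> parent), used only for illustration.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(concept: str) -> List[str]:
    """Return [concept, parent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def least_common_subsumer(a: str, b: str) -> str:
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_a:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = least_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Assumed sentence-level score: symmetric average of best word matches."""
    if not s1 or not s2:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        return sum(max(wu_palmer(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(s1, s2) + directed(s2, s1))


def remove_redundant(sentences: List[List[str]], threshold: float = 0.8) -> List[List[str]]:
    """Keep a sentence only if it scores below the threshold against every kept one."""
    kept: List[List[str]] = []
    for sentence in sentences:
        if all(sentence_similarity(sentence, k) < threshold for k in kept):
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    combined = [["dog", "car"], ["cat", "car"], ["vehicle"]]
    print(remove_redundant(combined))
```

In this sketch the second sentence is discarded because it overlaps heavily with the first, while the third falls below the threshold and is kept. Sentences are processed in order, so the earliest of a group of overlapping sentences is the one retained; a different policy, such as keeping the longest, would need an explicit ranking step.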