Extraction of Bag of Words from Cyber Crime Legal Documents Using Python
Abstract
This paper proposes a framework for retrieving a Bag-of-Words, or set of keywords, from legal documents. The analysis considers 2500 cyber crime judgment documents of the Supreme Court of India, which are available for download in Portable Document Format (PDF). The case note texts of these documents are used to retrieve the Bag-of-Words. An algorithm for this purpose is presented and implemented in Python. The preprocessing steps, also carried out in Python, include Count and Hashing Vectorization, TF-IDF, Lemmatization, Stemming, and Parsing. A threshold value is used when finding word frequencies. The resulting Bag-of-Words may be used to improve search engines for judgments in Data Mining applications such as Knowledge Discovery.
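As a rough illustration of the idea summarized in the abstract, the following minimal sketch counts words across a few texts and keeps those whose frequency reaches a threshold. It is not the paper's algorithm: it uses only the Python standard library as a stand-in for Count Vectorization and omits PDF parsing, stemming, lemmatization, and TF-IDF. The names `tokenize`, `bag_of_words`, and `STOP_WORDS`, the sample sentences, and the reading of the threshold as a minimum total count are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "is", "was", "for", "under", "by", "on"}


def tokenize(text):
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents, threshold=2):
    """Count non-stop-word tokens over all documents and keep the words
    whose total count is at least `threshold` (assumed meaning of the threshold)."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return {word: n for word, n in counts.items() if n >= threshold}


# Made-up sentences standing in for case note texts; not taken from real judgments.
case_notes = [
    "The accused was charged with hacking under the Information Technology Act.",
    "Hacking of a computer system is an offence under the Act.",
    "The appeal against conviction for the offence was dismissed.",
]

bag = bag_of_words(case_notes, threshold=2)
print(sorted(bag.items()))  # [('act', 2), ('hacking', 2), ('offence', 2)]
```

Raising the threshold shrinks the resulting Bag-of-Words, while lowering it admits rarer words, so its value would need to be tuned to the size of the actual corpus.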