Extraction of Bag of Words from Cyber Crime Legal Documents Using Python

  • Radha Mothukuri, Dr B Basaveswara Rao, Suneetha Bulla

Abstract

This paper proposes a framework to retrieve a Bag-of-Words, or set of keywords, from legal documents. The analysis considers 2500 cyber crime judgment documents of the Supreme Court of India, which are available for download in Portable Document Format (PDF). The case note texts of these documents are used to retrieve the Bag-of-Words. An algorithm for this purpose is presented and implemented in Python. The proposed preprocessing steps, carried out in Python, include Count and Hashing Vectorization, TF-IDF, Lemmatization, Stemming, and Parsing. A threshold value is used when finding word frequencies. The resulting Bag-of-Words may be utilized to improve search engines for judgments and in Data Mining applications such as Knowledge Discovery.
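As a rough illustration of the idea of thresholded word frequencies, the following minimal sketch counts tokens across a few texts and keeps only those that occur at least a given number of times. It is not the authors' algorithm: the stop-word list, the threshold value, the sample sentences, and the function names `tokenize` and `bag_of_words` are all assumptions made for this example, and it omits PDF parsing, stemming, lemmatization, and TF-IDF weighting.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given here.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "was", "by", "for", "under"}


def tokenize(text):
    """Lowercase the text and split it into runs of alphabetic characters."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents, threshold=2):
    """Count non-stop-word tokens over all documents and keep
    those whose total frequency is at or above the threshold."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return {word: n for word, n in counts.items() if n >= threshold}


# Invented sample sentences standing in for case note texts.
case_notes = [
    "The accused was charged with hacking under the Information Technology Act.",
    "Hacking of a computer system is an offence under the Act.",
]

bow = bag_of_words(case_notes, threshold=2)
print(bow)
```

Raising the threshold shrinks the vocabulary to the most frequent terms, while lowering it admits rarer words; a suitable value would depend on the size of the corpus.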

Published
2020-05-28
How to Cite
Radha Mothukuri, Dr B Basaveswara Rao, Suneetha Bulla. (2020). Extraction of Bag of Words from Cyber Crime Legal Documents Using Python. International Journal of Advanced Science and Technology, 29(05), 9303-9313. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/19025