DOCUMENT CLASSIFICATION USING MACHINE LEARNING

Shipra Trivedi, Komal Malawat, Namrata Yerawar, Shivani Chaudhary, Asmita Kamble

Abstract

Document classification is a problem in information science and computer science: the task of assigning documents to the correct categories. It is considered one of the key techniques for organizing data, automatically assigning a set of documents to predefined categories based on their content. Recent advances in computing technology have resulted in an ever-increasing volume of documents, which creates a need to classify documents according to their type. Classification is therefore widely used to sort text into different classes. This paper proposes a document classification system that identifies the domain of a document. Classification is performed using the Naive Bayes approach, which is one of the machine learning algorithms. The process consists of a set of phases, and each phase can be accomplished using various techniques. Selecting the proper technique for each phase affects the efficiency of text classification.
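
As a rough illustration of the Naive Bayes approach named above, the following minimal sketch trains a multinomial Naive Bayes classifier with add-one (Laplace) smoothing on a toy corpus. It is not the system proposed in this paper: the tokenizer, the class name NaiveBayesClassifier, the smoothing choice, and the example documents and domain labels are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.vocabulary = set()                   # all words seen in training

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior: fraction of training documents with this label.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Log likelihood with add-one smoothing.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus; the domains and sentences are invented.
train_docs = [
    "the team won the football match",
    "the player scored a goal in the match",
    "the election results were announced by the government",
    "the minister spoke about the new policy",
]
train_labels = ["sports", "sports", "politics", "politics"]

classifier = NaiveBayesClassifier()
classifier.fit(train_docs, train_labels)
print(classifier.predict("the goal decided the match"))
print(classifier.predict("the government announced a policy"))
```

Scores are summed in log space to avoid numerical underflow on long documents. A practical system would typically add further phases, such as stop-word removal and feature selection, before this step.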