Creation of Bag of Words Using Map Reduce Technique for Ham Email Classification
Email has become an integral part of business communication, and educational organizations have also started using it. In an educational organization, emails are sent to students, parents, faculty, and staff, and are also used to communicate with external people. These emails carry notices, alerts, or information on various administrative fronts. Users' inboxes are becoming cluttered with the number of emails they receive. Hence, it is necessary to classify emails into categories such as Admission, Finance, Examination, and Research. The classification is based on the text of the email body. The most significant words (features) of each category play an important role in email classification. In this paper, we describe a novel approach to feature selection that builds a bag of words using the map-reduce technique. These bags of words are then used for email classification.
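As an illustration of the general idea only, a per-category bag of words could be built with a map step, a shuffle step, and a reduce step along the following lines. This is a minimal in-memory sketch, not the method of this paper: the sample emails, the stop-word list, the function names, and the use of raw term frequency as the measure of word significance are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical labelled emails (category, body); not data from the paper.
EMAILS = [
    ("Admission", "Admission form deadline extended for new admission applicants"),
    ("Finance", "Fee payment receipt and fee refund details"),
    ("Admission", "Submit the admission form before the deadline"),
]

# Assumed stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "and", "for", "of", "to", "a", "in", "is", "before"}


def map_email(category, body):
    """Map step: emit ((category, word), 1) for every non-stop word."""
    tokens = re.findall(r"[a-z]+", body.lower())
    return [((category, t), 1) for t in tokens if t not in STOP_WORDS]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the counts for each (category, word) key."""
    return {key: sum(values) for key, values in grouped.items()}


def build_bags(emails, top_k=3):
    """Keep the top_k most frequent words per category (assumed criterion)."""
    mapped = [pair for category, body in emails
              for pair in map_email(category, body)]
    counts = reduce_counts(shuffle(mapped))
    per_category = defaultdict(list)
    for (category, word), n in counts.items():
        per_category[category].append((word, n))
    # Sort by descending count, then alphabetically to break ties.
    return {
        category: [w for w, _ in
                   sorted(words, key=lambda wn: (-wn[1], wn[0]))[:top_k]]
        for category, words in per_category.items()
    }


bags = build_bags(EMAILS)
```

In a real deployment the map and reduce functions would run on a distributed framework rather than in a single process, and the significance score could be something other than raw frequency; the sketch only shows how keying on the (category, word) pair lets one pass produce a separate bag for each category.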