Identification of Minimal Email Header Features for Discovery of Email Attachments

  • Priti Kulkarni, Jatinderkumar R. Saini

Abstract

Email has become an important part of day-to-day business and personal communication. Many emails contain attachments, and many of these are spam emails carrying viruses or links embedded by senders with malicious intent. It is therefore imperative to deploy a monitoring mechanism to curb the menace of non-ham emails in business and personal communication, specifically for emails containing attachments. The present paper aims to do so by identifying a minimal number of email header features. The results of the present work could be used by organisations to enhance their control mechanisms with minimum load on their servers. Technically, we used Chi-square (CHI2), Information Gain (IG) and Correlation-based feature selection (Cbfs) techniques to select minimal features from a set of 31 email header features. The experimental results show that only 3 features are sufficient for the task. To demonstrate the robustness of the results, we used Naïve Bayes, K-Nearest Neighbour (Lazy.IBK) and Decision Tree (J48) classifiers.
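The ranking step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names and the toy data are hypothetical, and only two of the three scoring techniques (Chi-square and Information Gain) are shown, computed from scratch on binary header features against a label meaning "has an attachment".

```python
import math
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between two categorical sequences."""
    n = len(labels)
    f_counts = Counter(feature)
    l_counts = Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n
            observed = joint.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def entropy(values):
    """Shannon entropy (in bits) of a categorical sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for f, fc in Counter(feature).items():
        subset = [l for x, l in zip(feature, labels) if x == f]
        conditional += (fc / n) * entropy(subset)
    return entropy(labels) - conditional


def top_k(features, labels, score, k):
    """Names of the k features with the highest score, best first."""
    ranked = sorted(features, key=lambda name: score(features[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: 1 = email has an attachment, 0 = it does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical binary header features (not the paper's 31 features).
features = {
    "content_type_is_multipart": [1, 1, 1, 1, 0, 0, 0, 0],
    "has_mime_version": [1, 1, 1, 1, 1, 1, 0, 1],
    "subject_has_re": [1, 0, 1, 0, 1, 0, 1, 0],
}

best_by_chi2 = top_k(features, labels, chi_square, 2)
best_by_ig = top_k(features, labels, information_gain, 2)
```

On this toy data both scores place the perfectly predictive feature first and the label-independent feature last, which is the behaviour a filter-style selector relies on when cutting a large header feature set down to a few.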

Keywords: Email, Email attachment, Email header, Spam email

Published
2020-05-20
How to Cite
Priti Kulkarni, Jatinderkumar R. Saini. (2020). Identification of Minimal Email Header Features for Discovery of Email Attachments. International Journal of Advanced Science and Technology, 29(06), 6242-6247. Retrieved from https://sersc.org/journals/index.php/IJAST/article/view/19909