Identification Of Malicious Websites With HTML And URL Based Features Using Machine Learning

Shaik Irfan Babu, Dr.M.V.P.Chandra Sekhara Rao

Abstract

In recent years, the rapid development of the Internet and the continuous growth of network services have increased the threat to users' privacy and security, mainly because of malicious web pages. Malicious web page detection, a core security technology for resisting network attacks, helps users avoid the security threats posed by malicious web pages and supports network security. This paper aims to assess and identify malicious websites by building a malicious site identification model using machine learning algorithms. The present work uses URL-based and HTML-based features to identify malicious websites. Both kinds of features are found to be effective in analyzing and classifying malicious URLs. Most of the samples in this study were taken from PhishTank and Alexa. The proposed approach yields a substantial improvement in classification precision, and the SVM (Support Vector Machine) proves to be the best classifier, with an accuracy of 91.8% and a false positive rate (FPR) and false negative rate (FNR) of 0.90 and 0.82, respectively.
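To make the idea of URL-based and HTML-based features concrete, the following is a minimal sketch of how such features might be extracted from a single sample. The feature names (`url_length`, `host_is_ip`, `password_input`, and so on) and the helper functions are illustrative assumptions, not the feature set used in this work. The resulting dictionary could then be converted to a numeric vector and passed to a classifier such as an SVM, which is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TagCounter(HTMLParser):
    """Counts a few HTML elements (hypothetical choice of indicators)."""

    def __init__(self):
        super().__init__()
        self.counts = {"form": 0, "iframe": 0, "script": 0, "password_input": 0}

    def handle_starttag(self, tag, attrs):
        if tag in ("form", "iframe", "script"):
            self.counts[tag] += 1
        elif tag == "input":
            # Attribute values may be None (e.g. <input type>), so guard them.
            input_type = (dict(attrs).get("type") or "").lower()
            if input_type == "password":
                self.counts["password_input"] += 1


def url_features(url):
    """Lexical features computed from the URL string alone (assumed set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "has_at_symbol": int("@" in url),
        "has_hyphen_in_host": int("-" in host),
        # Crude IPv4 check: the host consists only of digits and dots.
        "host_is_ip": int(host.replace(".", "").isdigit()),
        "uses_https": int(parsed.scheme == "https"),
    }


def html_features(html):
    """Structural features computed from the page source (assumed set)."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


def extract_features(url, html):
    """Merge both feature groups into one flat dictionary."""
    features = {f"url_{k}": v for k, v in url_features(url).items()}
    features.update({f"html_{k}": v for k, v in html_features(html).items()})
    return features
```

For example, `extract_features("http://192.168.0.1/login@secure", "<form><input type='password'></form>")` returns a dictionary in which `url_host_is_ip`, `url_has_at_symbol`, `html_form`, and `html_password_input` are all 1.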