Phishing Website URL Detection using Machine Learning

  • Dipayan Sinha, Dipayan Sinha, Prof. Anitha Sandeep

Abstract

Phishing is a breach of information security in which attackers gain access to sensitive user
credentials through counterfeit websites that closely resemble legitimate ones. Phishing attacks,
the most common form of cyber-attack, are carried out by cleverly disguising website URLs to trick
credulous users. With the increasing number of new phishing attacks, this paper proposes the use of
machine learning algorithms to classify websites as phishing or legitimate. The dataset for this
study comprises 96,018 URLs, covering both phishing and legitimate websites. The URLs were parsed
using Pandas and urllib to extract features useful for phishing detection. Several ML algorithms,
including Random Forest, Decision Tree, AdaBoost, and Fuzzy Pattern Trees, were implemented on the
data and compared. Random Forest proved to be the most accurate algorithm, with 95.82% accuracy.
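
The abstract does not list the features extracted from each URL, so the following is only a minimal sketch of what lexical feature extraction with urllib might look like. The function name extract_url_features and every feature it computes (lengths, dot and hyphen counts, IP-address hosts, and so on) are illustrative assumptions, not the feature set used in the paper.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few simple lexical features for one URL.

    Hypothetical feature set chosen for illustration only; the paper's
    actual features are not given in the abstract.
    """
    # urlparse only recognises the host when a scheme is present,
    # so assume plain HTTP for URLs written without one.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments, e.g. /login/verify has 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }
```

Calling the function on each URL yields one dict per URL. A list of such dicts can be passed to pandas.DataFrame to build a feature table, which, together with the phishing/legitimate labels, could then be given to a classifier such as a random forest.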

Published
2020-05-20
How to Cite
Dipayan Sinha, Dipayan Sinha, Prof. Anitha Sandeep. (2020). Phishing Website URL Detection using Machine Learning. International Journal of Advanced Science and Technology, 29(7), 2495-2504. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/18011