Ensemble Based Classification for Class Imbalanced Credit Card Fraudulent Data

S.Priya, Siddharth Agarwal, Pankaj Thakur, Annie Uthra

Abstract

Knowledge extraction from imbalanced data has received increasing interest in recent years. Many real-world problems, including credit card fraud detection and disease prediction, involve very large datasets in which positive instances are far fewer than negative instances. Suppose 99% of the data tuples belong to the majority class and only 1% to the minority class. A classifier that labels every tuple as a majority instance achieves 99% accuracy, yet it is of no use because it detects no minority instances, which is the main purpose of the classifier. For this reason, the data must be balanced before it is used to train a classification model. Many classification algorithms exist, each with its own pros and cons. To achieve higher accuracy and better results, several individual classification algorithms, including Naive Bayes, Decision Trees and Random Forests, are used. After obtaining the results of these classifiers, ensemble models are applied, including Voting Classifier, XGBoost and AdaBoost. The efficiency of the ensemble classifiers in data classification is observed to be much higher than that of the individual algorithms.
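
The 99%/1% argument above can be checked with a few lines of arithmetic. The following is a minimal illustrative sketch, not the authors' implementation: it scores a classifier that always predicts the majority class and then balances the labels by random oversampling. The abstract does not say which sampling method is used, so random oversampling is an assumption here, and the helper names (`accuracy`, `recall`, `random_oversample`) and the synthetic 990/10 split are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class instances that were detected."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    hits = sum(p == positive for p in preds_for_positives)
    return hits / len(preds_for_positives)


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal.

    Assumes binary labels; random oversampling is an illustrative choice,
    not necessarily the balancing method used in the paper.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if not minority_idx:
        return list(X), list(y)
    majority_count = len(y) - len(minority_idx)
    shortfall = max(0, majority_count - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(shortfall)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Synthetic labels: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10
X = [[float(i)] for i in range(len(y_true))]  # placeholder features

# A "classifier" that labels every transaction as legitimate.
y_pred = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)  # 99% accurate
baseline_recall = recall(y_true, y_pred)      # but detects no fraud

# Balance the classes before training any model.
X_bal, y_bal = random_oversample(X, y_true)
balanced_counts = (y_bal.count(0), y_bal.count(1))
```

The gap between accuracy and recall on the minority class is why accuracy alone is a poor yardstick for fraud detection. In practice, resampling would be applied to the training split only, so that duplicated minority rows do not leak into the evaluation data.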

Keywords: Class Imbalance, Ensemble Classifier, Sampling, Machine Learning.