Predicting Possible Fraud in India using Machine Learning: An Empirical Comparison between Models for Better Prediction
Abstract
Fraud is a global phenomenon, and no continent or sector of the economy is untouched by fraud and scandals. This paper explores the application of machine learning to fraud prediction, in particular the gradient boosting technique, which belongs to the ensemble family of algorithms. In this study, the researcher implemented a gradient boosting algorithm in Python 3.7. For the experiment, the researcher collected the financial data of 85 manufacturing firms with fraudulent financial statements (FFS) and 85 manufacturing firms with non-fraudulent financial statements (non-FFS), all listed on the Bombay Stock Exchange. The data were pre-processed to handle missing values, encode categorical labels, and scale features. The model was trained on the training data set and evaluated on the testing data set. Performance was assessed with several measures: confusion matrix, accuracy, error rate, precision, recall, F1-score and AUROC. To improve the model's accuracy, feature importance was assessed using backward elimination, a wrapper method of feature reduction. The most important features were then selected and used to retrain the gradient boosting learner. The model trained on only the important features gave better results than the model trained on the full feature set.
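The performance measures named in the abstract can all be derived from a classifier's predictions on the testing data set. The following is a minimal sketch in plain Python, not the study's actual code: the function names (`confusion_counts`, `classification_metrics`, `auroc`) are hypothetical, and the labels and scores are toy values invented for illustration, with 1 assumed to denote an FFS firm and 0 a non-FFS firm. They are not results from the study.

```python
# Illustrative sketch only: toy labels and scores, NOT data from the study.
# Assumption: 1 = FFS (fraudulent) firm, 0 = non-FFS firm.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fn, fp, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, error rate, precision, recall and F1-score."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auroc(y_true, scores, positive=1):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes both classes are present in y_true.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test set of eight firms (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]                    # hard class predictions
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]    # predicted P(FFS)

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics)
print("AUROC:", auc)
```

Comparing these measures for the model trained on all features and the model retrained on the reduced feature set is one way to express the comparison the abstract describes. In practice a library such as scikit-learn provides equivalent metric functions.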