Analysis of Credit Card Dataset for Predicting Bankruptcy using Apache Spark
Abstract
Fraud detection is receiving growing attention as new types of attacks emerge to steal data in the banking domain. It is now a critical problem for large e-commerce applications, where transactions are performed using credit cards or net banking. Growth in credit card transactions is pushing data volumes in the banking sector from terabytes to petabytes. In the finance sector, fraud is a growing, complex, and dangerous problem, and many researchers have proposed techniques to detect and resolve unlawful activities. Data mining is a suitable and trusted approach for analyzing large, complex volumes of financial data, and various data mining techniques can be applied to online credit card transaction data to detect fraud. Applying these techniques is not easy, however, because the data is highly sensitive and fraudulent behavior changes frequently. We implemented various machine learning techniques to detect fraud in credit card data and to deliver results that support customer satisfaction. The performance of the techniques is evaluated based on accuracy, sensitivity, specificity, and precision. The proposed model is implemented and tested in a Hadoop environment, using open-source Hadoop ecosystem tools to read and analyze the data and to produce fraud detection results.
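The four evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming that the positive class denotes a fraudulent transaction; the function name `classification_metrics` and the counts passed to it are hypothetical and serve only as an illustration, not as results from this work.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute evaluation measures from confusion-matrix counts.

    Assumption: the positive class is "fraud".
    tp: fraud correctly flagged        fp: legitimate flagged as fraud
    tn: legitimate correctly passed    fn: fraud that was missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": safe_ratio(tp + tn, total),     # share of correct decisions
        "sensitivity": safe_ratio(tp, tp + fn),     # share of fraud that is caught
        "specificity": safe_ratio(tn, tn + fp),     # share of legitimate that is passed
        "precision": safe_ratio(tp, tp + fp),       # share of flags that are real fraud
    }


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=80, fp=10, tn=900, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because fraudulent transactions are usually a small minority, accuracy alone can look high even when most fraud is missed, which is one reason to report sensitivity and precision alongside it.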