Comparative Study of Resampling Techniques of Imbalanced Dataset

  • Chitra Mehra, Dr. Rashmi Agrawal


Data mining is used to extract meaningful and important information from data: information that is implicit, previously unknown, and potentially useful. The versatile field of Data Mining (DM) originated from the combination of machine learning and statistics. With the help of Data Mining, the information contained in a database can be analyzed and understood. Descriptive Data Mining and Predictive Data Mining are the two major categories of Data Mining. Classification is a predictive, supervised learning technique used in Data Mining; classification algorithms predict the class labels of instances. This study discusses the problem of imbalanced classification and approaches for handling it. A dataset is imbalanced when its classes are not represented in equal proportion, i.e., the number of samples of one class is much smaller than that of the other class. In such situations, predictions are biased toward the majority class at the expense of the minority class, so the resulting model is not reliable for prediction. WEKA, an open-source machine learning software, is used to assess various classification algorithms. The algorithms used in this study are Decision Tree (J48), Naïve Bayes, and Multilayer Perceptron. The Pima Indian Diabetes Dataset, which is an imbalanced dataset, is used for evaluation. To handle the class imbalance problem, we use a sampling-based approach. All of these algorithms are evaluated with respect to the minority class using evaluation metrics such as Precision, Recall, F1-Score, and the Receiver Operating Characteristic (ROC) curve. In WEKA, the sampling filters in the Preprocess tab can be used to artificially balance an imbalanced dataset.
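
As an illustration of the sampling-based idea and of the Precision, Recall, and F1-Score metrics named in the abstract, the following is a minimal Python sketch. It is not the WEKA workflow used in the study: the function names `random_oversample` and `precision_recall_f1`, the toy data, and the choice of random oversampling with replacement are assumptions made only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows of this class; duplicates are drawn from here only.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (made up): 8 majority samples (label 0), 2 minority samples (label 1).
toy_rows = [[i] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)

# A model that always predicts the majority class never finds the minority class.
always_majority = [0] * len(toy_labels)
minority_scores = precision_recall_f1(toy_labels, always_majority, positive=1)
```

On the toy data, a model that always predicts the majority class is correct on 8 of 10 samples, yet its recall on the minority class is zero. This is why per-class metrics are more informative than overall accuracy on imbalanced data. In practice, resampling is normally applied only to the training portion of the data, so that duplicated samples do not leak into the evaluation set.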

How to Cite
Chitra Mehra, Dr. Rashmi Agrawal. (2020). Comparative Study of Resampling Techniques of Imbalanced Dataset. International Journal of Advanced Science and Technology, 29(3), 12699 - 12710. Retrieved from