Comparison of SMOTE and Resample for Imbalanced Classification Missing Data using Random Forest for Database Migration Modeling

  • Sumitra Nuanmeesri, Shutchapol Chopvitayakun

Abstract

The goal of this study is to compare data recovery techniques for predicting the data that go missing when large amounts of data are transferred from an old database system to a database on a new system that differs in processing logic and data structure. The results show that the resample technique is more effective at improving the missing data than the synthetic minority oversampling technique (SMOTE). The resample technique was able to enlarge the dataset from 3,278 to 32,780 records. A model for predicting the missing data in database transfer was then built using the random forest technique. When the model was tested on the dataset with 10-fold cross-validation, the accuracy of the SMOTE approach was higher than that of the Resample approach for all data ranges. The findings indicate that the simulation model can be used as a prototype of a missing-data suggestion model for database transfer.
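As a rough illustration of the two oversampling ideas compared in the abstract, the following minimal sketch contrasts random resampling with replacement against SMOTE-style interpolation between neighbouring minority records. It is not the authors' implementation: the function names, the toy data, the neighbour count `k`, and the reading of "resample" as uniform sampling with replacement are all assumptions made for illustration, and the random forest model and 10-fold cross-validation are left out to keep the example dependency-free.

```python
import random


def resample_with_replacement(rows, n_samples, seed=0):
    """Draw n_samples rows uniformly at random, with replacement.

    Every returned row is an exact copy of an original row.
    """
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(n_samples)]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points in the spirit of SMOTE.

    Each synthetic point lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` by squared Euclidean distance,
        # excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]

# Tenfold enlargement, mirroring the 3,278 -> 32,780 ratio in the abstract.
resampled = resample_with_replacement(minority, 10 * len(minority))
synthetic = smote_like(minority, n_new=6)
```

The practical difference is that resampling only duplicates existing records, whereas the SMOTE-style step produces new records that lie between existing ones.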

Published
2020-03-30
How to Cite
Sumitra Nuanmeesri, Shutchapol Chopvitayakun. (2020). Comparison of SMOTE and Resample for Imbalanced Classification Missing Data using Random Forest for Database Migration Modeling. International Journal of Advanced Science and Technology, 29(3), 13646 - 13660. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/31701
Section
Articles