Missing Value Analysis Techniques in Data Mining: A Review

Mohammed Sharik U Zama et al.

Abstract

Missing data is a prevalent problem in data analytics. Research studies and surveys often have missing values in their observations, and missing data can degrade the quality of a dataset dramatically. In real-world databases and data warehouses, data is often inaccurate, incomplete, and inconsistent. There are numerous reasons for this, such as human or computer errors during data entry, deliberately incorrect answers, and faulty measurements. Missing data can harm the knowledge discovery process in several ways, for example by biasing results and leading to invalid conclusions. Analyzing a dataset becomes an arduous task when it contains missing values, mainly because data mining algorithms perform best on datasets that are consistent and complete. Fortunately, this problem can be addressed with several techniques applied in the data preprocessing stage. The purpose of this research paper is to compare and classify methods for handling missing data. The results of this study are a comparison, classification, and contrast of these methods, along with the advantages and disadvantages of each.
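To make the idea of preprocessing-stage techniques concrete, the following is a minimal sketch of two widely used approaches, listwise deletion and mean imputation. It assumes records stored as Python dictionaries, with `None` marking a missing value. The data and the function names `listwise_deletion` and `mean_imputation` are hypothetical and used only for illustration.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 31, "income": None},
    {"age": 40, "income": 61000},
]


def listwise_deletion(rows):
    """Keep only the rows in which every attribute is observed."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    # Column means are computed from observed (non-missing) values only.
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


complete = listwise_deletion(records)
imputed = mean_imputation(records)
print(len(complete))       # → 2
print(imputed[1]["age"])   # → 32 (mean of 25, 31 and 40)
```

The sketch also hints at the trade-offs: deletion discards half of the toy records and shrinks the sample, while mean imputation keeps every record but fills gaps with a single central value, which reduces the variance of the imputed attribute.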