Missing Data Imputation by Principal Component Analysis (PCA) and Fuzzy C Means (FCM)

Vishal Goyal et al.

Abstract

Extracting hidden knowledge from a medical dataset requires computing its missing attribute values. For breast cancer datasets this is an interesting but difficult task. Data mining researchers have proposed various ways to compute and impute missing values, and previous work introduced a novel imputation method based on k-means clustering. In that method, the dimensionality of the records becomes a major issue, because high feature dimensionality increases time complexity and in turn reduces accuracy. To address this issue, this work applies principal component analysis (PCA) for dimensionality reduction when classifying the medical records in a breast cancer dataset. The proposed work consists of three major steps: dimensionality reduction, clustering, and classification. First, PCA reduces the dimensionality of the records. Second, the records are clustered using Fuzzy C Means (FCM). Finally, a K Nearest Neighbour (KNN) classifier is used to impute the missing data. The results of KNN and of a traditional classifier are compared using accuracy, F-measure, recall, and precision. The methods are implemented and evaluated on the Wisconsin Breast Cancer Dataset, obtained from the University of California, Irvine (UCI) repository.
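
The three steps named in the abstract can be illustrated with a minimal sketch. This is one possible reading of the pipeline under stated assumptions, not the authors' implementation: missing entries are temporarily mean-filled so that PCA can be computed, records are clustered by FCM in the reduced space, and each missing entry is then replaced by the average of that attribute over the k nearest complete records, preferring records from the same cluster. The function names (`pca_project`, `fuzzy_c_means`, `impute_pca_fcm_knn`), the temporary mean fill, the fuzzifier m = 2, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Project mean-centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(Z, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM updates; returns (centers, U) with U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def impute_pca_fcm_knn(X, n_components=2, n_clusters=2, k=3):
    """Hypothetical helper: PCA -> FCM -> KNN-based imputation of NaN entries."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumption: temporary column-mean fill so PCA can be computed at all.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    Z = pca_project(filled, n_components)
    _, U = fuzzy_c_means(Z, n_clusters)
    labels = U.argmax(axis=1)                  # harden the fuzzy memberships
    complete = ~missing.any(axis=1)
    out = X.copy()
    for i in np.flatnonzero(~complete):
        donors = np.flatnonzero(complete & (labels == labels[i]))
        if donors.size == 0:                   # no complete record in this cluster
            donors = np.flatnonzero(complete)
        dist = np.linalg.norm(Z[donors] - Z[i], axis=1)
        nearest = donors[np.argsort(dist)[:k]]
        out[i, missing[i]] = X[nearest][:, missing[i]].mean(axis=0)
    return out

# Toy data (made up for illustration): two well-separated groups, two missing values.
X_demo = np.array([
    [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1], [1.1, np.nan, 1.0],
    [9.0, 9.1, 8.9], [9.2, 8.9, 9.0], [8.9, 9.0, 9.1], [9.1, 9.0, np.nan],
])
X_imputed = impute_pca_fcm_knn(X_demo)
```

The sketch assumes at least one complete record exists; with none, there are no donor records and the missing entries cannot be filled. Restricting donors to the same cluster is a design choice intended to keep the neighbour search local, which is the role the abstract appears to assign to the clustering step.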