Multiobjective Optimization Based Adaptive Genetic Algorithm K-Means Clustering in Big Data: A Case Study of Suicide Rate Analysis
Abstract
In this work, a multi-objective optimization (MOO) based adaptive genetic algorithm (AGA) K-means clustering model is developed. The model is implemented on Hadoop MapReduce and applied to cluster suicide rate data. Because the data attributes are heterogeneous, each data element was mapped to a numerical value, which enabled fast computation over large-scale datasets. Because the data are unbalanced, principal component analysis (PCA) followed by a genetic algorithm (GA) was used for feature selection, which made clustering computationally efficient. As a MOO solution, the proposed AGA-K-Means model uses the Silhouette coefficient as its objective function to obtain both the optimal centroids and the best number of clusters. It achieved higher accuracy and computational efficiency than classical K-Means MapReduce approaches. Simulation results showed that, with PCA- and GA-selected features, the proposed model achieved its best performance in terms of accuracy (92.93%), precision (92.81%), recall (96.70%) and F-Measure (91.17%). Additionally, the proposed MOO-AGA-K-Means clustering model had a lower average execution time (148 s) than GA-K-Means (159 s) and the classical K-Means algorithm (389.7 s). The overall performance affirms the suitability of the proposed MOO-AGA-K-Means clustering model for big data analytics.
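To illustrate how a single Silhouette-based objective can score both the centroid positions and the number of clusters, the following is a minimal sketch, not the implementation used in this work. It assumes that a chromosome is simply a list of centroids (so its length encodes the number of clusters) and that Euclidean distance is used; the names `assign`, `silhouette`, and `fitness` and the toy data are hypothetical. Crossover, mutation, adaptive rates, and the MapReduce distribution are omitted.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
        for p in points
    ]

def silhouette(points, labels):
    """Mean Silhouette coefficient; -1.0 if fewer than two clusters are non-empty."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def fitness(points, chromosome):
    """Assumed fitness: Silhouette of the partition induced by the chromosome's centroids."""
    return silhouette(points, assign(points, chromosome))

# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

# Two candidate chromosomes with different numbers of clusters.
candidates = [
    [(0.5, 0.5), (10.5, 10.5)],              # k = 2
    [(0.0, 0.0), (1.0, 1.0), (10.5, 10.5)],  # k = 3
]

# Selection step: keep the candidate with the highest Silhouette fitness.
best = max(candidates, key=lambda c: fitness(points, c))
print(len(best), round(fitness(points, best), 3))
```

On this toy data the two-centroid candidate scores higher than the three-centroid one, so the same objective that ranks centroid placements also favors the more natural number of clusters. A full GA would generate and evolve such candidates rather than enumerate them by hand.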