HADOOP-BASED PARADIGM USING A HYBRID FEATURE SELECTION APPROACH TO ENHANCE PERFORMANCE IN SUPERVISED LEARNING CLASSIFICATION
Many disciplines involve big data sets and suffer from the curse of high dimensionality. Feature selection methods focus on eliminating noisy and redundant features that may affect the accuracy of the classifier. This research work presents a hybrid feature selection approach that uses the MapReduce paradigm to obtain a subset of features from big data sets. The algorithm splits the original data set into blocks of instances and applies the hybrid feature subset selection approach to each block in the map phase; the reduce phase then merges the obtained results into a final vector of feature weights. The proposed hybrid approach uses modified conditional mutual information maximization as the filter method and a genetic algorithm with a novel fitness function as the wrapper method, both to enhance overall classification performance and to speed up the search for the essential features. Each data set is separated into training and testing subsets by tenfold cross-validation, and supervised learning classification algorithms such as SVM, KNN, and Decision Tree are then applied. The experiments use five benchmark data sets, all of which are high-dimensional. The experimental results show that the Hadoop-based paradigm with the proposed hybrid feature selection approach improves classification accuracy and, as measured by run time, efficiency.
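The split, map, and reduce steps described in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the implementation used in this work. Two pieces are assumptions made only for illustration: the per-block scorer is plain mutual information between each feature and the class label (a simple stand-in for the hybrid of modified conditional mutual information maximization and the genetic algorithm), and the reduce step merges the per-block weight vectors by element-wise averaging. All function names are hypothetical, and no Hadoop API is used.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in nats, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def split_blocks(rows, labels, n_blocks):
    """Split the instances into at most n_blocks contiguous, non-empty blocks."""
    size = -(-len(rows) // n_blocks)  # ceiling division
    return [
        (rows[i:i + size], labels[i:i + size])
        for i in range(0, len(rows), size)
    ]


def map_phase(block):
    """Map: compute one weight per feature on a single block of instances.

    Assumption: the weight is the mutual information between the feature
    and the class label, standing in for the paper's hybrid selector.
    """
    rows, labels = block
    n_features = len(rows[0])
    return [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]


def reduce_phase(partial_weights):
    """Reduce: merge per-block weight vectors into one final vector.

    Assumption: the merge is an element-wise mean over blocks.
    """
    n = len(partial_weights)
    return [sum(column) / n for column in zip(*partial_weights)]


def select_features(rows, labels, n_blocks=2, k=1):
    """Run split -> map -> reduce and return (weights, indices of the top-k features)."""
    blocks = split_blocks(rows, labels, n_blocks)
    weights = reduce_phase([map_phase(block) for block in blocks])
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return weights, ranked[:k]


if __name__ == "__main__":
    # Toy data: feature 0 equals the class label, feature 1 is independent of it.
    rows = [(0, 1), (1, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (1, 0)]
    labels = [0, 1, 0, 1, 0, 1, 0, 1]
    weights, selected = select_features(rows, labels, n_blocks=2, k=1)
    print(weights, selected)
```

In an actual Hadoop job, `map_phase` would run in parallel on each input split and `reduce_phase` would receive the emitted weight vectors. The sketch calls them sequentially so that the data flow is easy to follow.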