Big Data Classification Using Hybrid Parallel LFR-CM at Cloud Environment

Bhallamudi Ravikrishna, Dr. Harsh Pratap Singh

Abstract

Because big data grows at scale, big data applications are commonly processed with the MapReduce programming model. A MapReduce-based framework is designed to integrate multiple partial solutions in order to improve classification accuracy. A Random Forest classifier is developed to reduce processing time while the information is transferred. However, the data are often imbalanced, which complicates the learning process and gives rise to the class imbalance problem. Applying rule-based models to such datasets is not straightforward. Hence, a new class of linguistic model that exploits the large storage and processing capacity of cloud environments is required. A framework named parallel LFR-CM is proposed for big data classification and information distribution in a cloud environment. Its parallel processing mechanism reduces runtime by using linguistic fuzzy rules built on the MapReduce parallel programming model. A canopy shuffle algorithm is then applied to the resulting linguistic fuzzy rules to train on a different sample set, which improves classification accuracy. The convergence rate of the canopy fuzzy MapReduce algorithm is likewise accelerated. Finally, a hybrid classification model based on the fuzzy knowledge base and the canopy fuzzy MapReduce algorithm is developed to improve classification time and classification accuracy in parallel.
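
To make the idea of learning linguistic fuzzy rules under MapReduce concrete, the following is a minimal single-machine sketch, not the LFR-CM implementation itself. It assumes features scaled to [0, 1], three triangular linguistic labels, one candidate rule per training example in the map phase, and a reduce phase that resolves conflicting rules by summed matching degree. The abstract specifies none of these choices, so the names `memberships`, `map_partition`, `reduce_rules`, and `classify`, as well as the toy data, are hypothetical. The canopy shuffle step is not modelled here.

```python
from collections import defaultdict

# Hypothetical linguistic labels; the abstract does not fix a label set.
LABELS = ("low", "medium", "high")


def memberships(x):
    """Triangular membership degrees of x (clamped to [0, 1]) in each label."""
    x = min(max(x, 0.0), 1.0)
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": max(0.0, 1.0 - abs(2.0 * x - 1.0)),
        "high": max(0.0, 2.0 * x - 1.0),
    }


def map_partition(rows):
    """Map phase (assumed): emit one (antecedent, (class, degree)) per example."""
    out = []
    for features, cls in rows:
        antecedent = []
        degree = 1.0
        for x in features:
            mu = memberships(x)
            best = max(LABELS, key=lambda label: mu[label])
            antecedent.append(best)
            degree *= mu[best]  # product of the winning membership degrees
        out.append((tuple(antecedent), (cls, degree)))
    return out


def reduce_rules(mapped):
    """Reduce phase (assumed): per antecedent, keep the class with most support."""
    support = defaultdict(lambda: defaultdict(float))
    for antecedent, (cls, degree) in mapped:
        support[antecedent][cls] += degree
    return {a: max(c, key=c.get) for a, c in support.items()}


def classify(rule_base, features):
    """Return the class of the rule that matches the example most strongly."""
    best_cls, best_score = None, 0.0
    for antecedent, cls in rule_base.items():
        score = 1.0
        for label, x in zip(antecedent, features):
            score *= memberships(x)[label]
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Toy data split into two partitions, standing in for two map tasks.
partitions = [
    [((0.10, 0.20), "neg"), ((0.90, 0.80), "pos")],
    [((0.15, 0.10), "neg"), ((0.85, 0.95), "pos")],
]
mapped = [pair for part in partitions for pair in map_partition(part)]
rule_base = reduce_rules(mapped)
prediction = classify(rule_base, (0.05, 0.10))
```

In this sketch each partition is processed independently, so the map step could run on separate workers, and only the compact (antecedent, class, degree) tuples need to be shuffled to the reducer.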