Optimizing MapReduce Model for Big Data Analytics Using Subtractive Clustering Algorithm

  • Nabeel Zanoon, Khalid Alkharabsheh, Mohammad Hashem Ryalat

Abstract

Big data is an important concept for researchers, academics, and industry practitioners in software engineering as well as other domains. Analyzing and processing big data demands considerable effort, tools, and equipment. The Hadoop software framework uses the MapReduce model to perform large-scale data analysis through parallel processing so that outputs are obtained as quickly as possible. The whole process goes through three phases, namely mapping, shuffling, and reducing. Transferring data from the mapping phase to the reducing phase through shuffling takes a long time, which makes it an indicator of MapReduce performance. In this paper, the subtractive clustering algorithm is used to improve MapReduce performance in two directions: performing a number of procedures that reduce the amount of data transferred between phases, and shortening the recurring time periods through a simultaneous mechanism. The results, evaluated by the Wilcoxon rank-sum test, show that better performance is obtained with the subtractive clustering algorithm. In addition, the amount of transferred data and the execution time between the MapReduce phases have been reduced.
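
As an illustration of the three phases named in the abstract, the following is a minimal single-process sketch in Python using a word-count example. It is not the authors' method and does not use Hadoop or subtractive clustering; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this illustration. It is meant to show where the shuffle step sits, since that is the step in which intermediate data moves between the mapping and reducing phases.

```python
from collections import defaultdict


def map_phase(records):
    """Mapping: emit one (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Shuffling: group all values by key.

    On a real cluster this grouping moves intermediate pairs across the
    network; here it is simulated in memory (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reducing: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three phases in order on a list of text records."""
    return reduce_phase(shuffle_phase(map_phase(records)))
```

In this sketch, every pair emitted by map_phase passes through shuffle_phase, so the number of intermediate pairs is a rough stand-in for the amount of data transferred between phases.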

Published
2020-06-06
How to Cite
Nabeel Zanoon, Khalid Alkharabsheh, Mohammad Hashem Ryalat. (2020). Optimizing MapReduce Model for Big Data Analytics Using Subtractive Clustering Algorithm. International Journal of Advanced Science and Technology, 29(04), 4106 - 4119. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/24789