Factors Influencing the Efficiency of Big Data Management on Hadoop Framework: An Implementation Analysis

Jahangir Kamal, Dr. Meenu Dave

Abstract

Advances in data-generating technologies have led to the production of huge volumes of data with high variety and velocity, popularly known as Big Data. For the efficient management, processing, and analysis of this data, Apache has developed an open-source framework known as Hadoop. Hadoop mainly deals with the reliable management of large volumes of data through HDFS (Hadoop Distributed File System) on a cluster of nodes. The MapReduce programming framework is one of the methods Hadoop uses to process Big Data distributed over a cluster of nodes. Being an open-source project, Hadoop encourages the addition of new features or changes for the sake of efficient Big Data management and processing. This paper focuses on the parameters related to Hadoop and MapReduce that have a significant impact regardless of the type of job and dataset, and that therefore give any processing job and cluster configuration an opportunity to improve performance. The main aim of this study is to increase the performance of Hadoop clusters in different Big Data handling scenarios. The paper first summarizes the problem statement and the importance and relevance of the study, then explains the architecture of the Hadoop framework, and finally describes the experimentation methodology and the analysis of the results.