Tuning Hadoop Parameters for Heterogeneous Multi-node Cluster

  • Gurwinder Singh, Dr. Anil Sharma


Hadoop, an open-source implementation of MapReduce, has become the de facto platform for storing data on distributed as well as local machines and for analyzing and processing huge amounts of information on commodity hardware. It provides a wide range of parameters, with default and common configuration settings for single-node as well as multi-node clusters and applications. It allows the user to alter the configuration according to requirements by modifying XML files. Tuning the parameters of Hadoop is a challenging task, as executing even a simple program requires altering several parameters. Optimal parameter tuning can therefore improve data locality and the amount of data processed, as well as enhance the utilization of network, processor and input/output resources. This paper sheds light on the literature associated with customizing parameters for better tuning and optimal utilization of resources, and proposes a framework to suggest and modify parameters to enhance Hadoop performance in a heterogeneous multi-node cluster.
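
As a rough illustration of what altering the configuration through XML files can look like, the sketch below generates the body of a Hadoop site file from a dictionary of overrides. It is not the framework proposed in the paper. The helper name build_site_xml, the two node profiles and all numeric values are hypothetical and chosen only for illustration; the property names are standard yarn-site.xml settings, and the configuration/property/name/value layout is the usual Hadoop site-file structure.

```python
import xml.etree.ElementTree as ET


def build_site_xml(overrides):
    """Return the text of a Hadoop *-site.xml document for name -> value overrides.

    Hypothetical helper for illustration; not part of Hadoop itself.
    """
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumed per-node profiles for a heterogeneous cluster: a larger and a
# smaller worker. The numbers are placeholders, not recommendations.
node_profiles = {
    "large-node": {
        "yarn.nodemanager.resource.memory-mb": 16384,
        "yarn.nodemanager.resource.cpu-vcores": 8,
    },
    "small-node": {
        "yarn.nodemanager.resource.memory-mb": 4096,
        "yarn.nodemanager.resource.cpu-vcores": 2,
    },
}

if __name__ == "__main__":
    for node, overrides in node_profiles.items():
        print(node)
        print(build_site_xml(overrides))
```

In a heterogeneous cluster, each worker would receive the file generated from its own profile, so that the resources advertised to YARN match the hardware of that node rather than a single cluster-wide default.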