Resource Aware Execution of Speculated Tasks in Hadoop with SDN
Abstract
Software-defined networking (SDN) is a networking paradigm that separates the data plane from the control plane and is intended to meet the requirements of networks in which large volumes of data are transferred. With the growth in the volume of generated data, Hadoop has become a de facto standard for handling it. Hadoop processes this data with a computation engine called MapReduce. One important issue in MapReduce is how to identify and address the performance deterioration caused by slow-running tasks. This deterioration is handled automatically by scheduling a backup copy of the slow task on another node that has an empty slot. However, this might not improve performance, because backup tasks are launched on nodes without any knowledge of their computational characteristics. In this paper we discuss an approach that handles speculated tasks by profiling the nodes in the cluster and scheduling each such task on a node that has performed better, while improving the network resources available to it by using SDN to prioritize the corresponding flow entries of those nodes. Further care is taken in scheduling tasks that have data skew. The experiments conducted in the paper demonstrate an improvement of about 10-15% in job completion time.