Sentimental Analysis Using Various Analytical tools from Hadoop Eco System

Dr. Amanpreet Kaur, Dr. Gurpreet Singh, Dr. Rohan Gupta

Abstract

Apache Spark is an engine for processing large-scale data in parallel across hundreds or thousands of compute nodes. Distributing the work in this way extends processing capacity well beyond that of a single machine. Spark can take on a range of data processing jobs involving complex analytics, including batch data, streaming, and graph processing. The data involved may be massive, on the order of terabytes, zettabytes, or more. Spark DataFrame operations can be written programmatically through APIs in languages such as Scala or Java. The Hadoop ecosystem contains various tools, including Pig, Hive, MapReduce, and Apache Spark. Hive provides a query language similar to SQL that works on large data sets. Pig is a scripting language for exploring large data sets. MapReduce is a programming model that can be coupled with HDFS to process and manage unstructured big data using parallel, distributed algorithms. This paper compares these analytical tools (Spark, Pig, MapReduce, and Hive) for sentiment analysis on large data sets.
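
To make the MapReduce programming model mentioned above concrete, the following is a minimal single-machine sketch in plain Python of how a lexicon-based sentiment job could be split into map, shuffle, and reduce steps. It is an illustration only and not the implementation evaluated in this paper: the lexicon, the function names (map_phase, shuffle, reduce_phase, run_job), and the sample records are all hypothetical, and no Hadoop or Spark API is used.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for illustration);
# a real study would use a curated sentiment dictionary.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}


def map_phase(record_id, text):
    """Emit one (record_id, score) pair per lexicon word found in the text."""
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if word in LEXICON:
            yield record_id, LEXICON[word]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the scores for one record and attach a sentiment label."""
    total = sum(values)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return key, total, label


def run_job(records):
    """Run map, shuffle, and reduce sequentially over (id, text) records."""
    pairs = [pair for rid, text in records for pair in map_phase(rid, text)]
    grouped = shuffle(pairs)
    # Records with no lexicon hits never reach the reducer; default them to neutral.
    results = {rid: (0, "neutral") for rid, _ in records}
    for key, values in grouped.items():
        _, total, label = reduce_phase(key, values)
        results[key] = (total, label)
    return results


# Hypothetical sample input.
records = [
    (1, "I love this phone, great battery!"),
    (2, "Poor screen and bad support."),
    (3, "It arrived on Tuesday."),
]

if __name__ == "__main__":
    for rid, (score, label) in sorted(run_job(records).items()):
        print(rid, score, label)
```

In a real Hadoop deployment the map and reduce functions would run on many nodes over HDFS blocks, and the framework itself would perform the grouping step that shuffle imitates here.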