A Study of Big Data Analytics on Two Fatal Diseases Using the Apache Spark Framework
Abstract
Data is growing at a very fast rate. Every day, new data is generated by sources such as Facebook, Instagram, Twitter, e-commerce sites, and countless others, and it arrives in structured, semi-structured, and unstructured formats. Traditional database systems cannot handle data at this volume, which has led to newer technologies such as Hadoop, Spark, and Cassandra. This paper focuses on Spark. Because MapReduce was not interactive and was poorly suited to iterative jobs, Spark was developed to address these problems. Spark became open source in March 2010 and moved to the Apache Software Foundation in 2013. In this paper, we take a dataset of tuberculosis (TB) and AIDS/HIV cases and analyze it to evaluate the reported cases of both diseases.
Keywords: AIDS/HIV, Big Data, Spark, Tuberculosis.