Spark Framework for Streaming and Generating Predictive Business Intelligence

Madadi Vijayakamal, D. Vasumati


Apache Spark is one of the stream processing frameworks that can be used in cloud computing environments. Real-time streaming data is processed with machine learning and natural language processing. Apache Spark is also used to explore process mining. Process mining discovers business processes and diagnoses the differences between the real processes and the processes discovered from event logs. This kind of business intelligence can help improve business processes in real-world applications. When the number of business processes is very large, using Spark provides scalability and real-time performance. The input data is divided into batches according to a time window for processing, and the framework discovers business intelligence according to the defined algorithms. An empirical study was conducted with Spark, and its performance was compared with that of Apache Flink. The empirical results revealed that Apache Spark performs better than Flink.
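The abstract combines two ideas: input events are cut into time-window batches, and business processes are discovered from event logs and compared with a reference. The sketch below illustrates both in plain Python rather than Spark's API, and it is not the authors' algorithm. It assumes each event is a hypothetical `(case_id, activity, timestamp)` tuple, uses fixed-length tumbling windows, and swaps in a directly-follows graph as a deliberately simple discovery technique. All names (`micro_batches`, `StreamingDFG`, `deviations`) are illustrative only. In Spark itself, the batching would be done by the streaming engine.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event layout (an assumption, not taken from the paper):
# (case_id, activity, timestamp_in_seconds)
Event = Tuple[str, str, float]
Edge = Tuple[str, str]


def micro_batches(events: Iterable[Event], window_seconds: float) -> List[List[Event]]:
    """Split events into consecutive, non-overlapping (tumbling) time windows."""
    windows: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        windows[int(event[2] // window_seconds)].append(event)
    # Empty windows are skipped; non-empty batches are returned in time order.
    return [windows[key] for key in sorted(windows)]


class StreamingDFG:
    """Incrementally maintained directly-follows graph (one simple discovery technique)."""

    def __init__(self) -> None:
        self.last_activity: Dict[str, str] = {}  # last activity seen for each case
        self.edge_counts: Counter = Counter()    # (a, b) -> times b directly followed a

    def update(self, batch: List[Event]) -> None:
        # Order events inside the batch by timestamp. State is kept between
        # batches, so a case that spans two windows still yields its edge.
        for case_id, activity, _ in sorted(batch, key=lambda e: e[2]):
            previous = self.last_activity.get(case_id)
            if previous is not None:
                self.edge_counts[(previous, activity)] += 1
            self.last_activity[case_id] = activity


def deviations(observed: Counter, reference_edges: Set[Edge]) -> List[Edge]:
    """Edges seen in the log that a reference model does not allow (a crude conformance check)."""
    return sorted(edge for edge in observed if edge not in reference_edges)


if __name__ == "__main__":
    log: List[Event] = [
        ("c1", "register", 0.0),
        ("c1", "check", 3.0),
        ("c2", "register", 4.0),
        ("c1", "approve", 7.0),
        ("c2", "check", 11.0),
        ("c2", "reject", 12.0),
    ]
    dfg = StreamingDFG()
    for batch in micro_batches(log, window_seconds=5.0):
        dfg.update(batch)
    print(dict(dfg.edge_counts))
    # -> {('register', 'check'): 2, ('check', 'approve'): 1, ('check', 'reject'): 1}
    print(deviations(dfg.edge_counts, {("register", "check"), ("check", "approve")}))
    # -> [('check', 'reject')]
```

Keeping the last activity per case across batches means window boundaries do not drop directly-follows relations. A production pipeline would also have to handle late or out-of-order events, for example with event-time watermarks in Spark Structured Streaming.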