Big Data Stream Processing: Latency and Throughput
In recent years, continuously arriving big data streams have needed to be processed and responded to almost instantaneously. Several crucial applications are expected to investigate and evaluate such streaming data in real time. One of the basic tasks of any streaming application is to process data arriving from scattered sources and generate output promptly. The key considerations for this task are latency and throughput. Hence, dealing with stream imperfections such as late data, lost data, and out-of-order data has become a significant research topic in big data stream processing. We have performed prediction experiments on stock market data, considering the prices of the US dollar, oil, and gold as essential dependent parameters. Since the sources of these dependent parameters are distributed, a delay in any parameter introduces different types of latency and hence lowers the throughput of the stream processing system. In this paper, we present a way to deal with latency and throughput through the use of an appropriate pipeline and watermarks in big data stream processing.
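To make the role of a watermark concrete, the following is a minimal sketch in plain Python. It is not the pipeline used in the experiments. It assumes tumbling event-time windows and a watermark defined as the largest event time seen so far minus a fixed allowed delay. The class name `WatermarkWindower` and the parameters `window_size` and `max_delay` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class WatermarkWindower:
    """Tumbling event-time windows closed by a simple watermark.

    Assumption: watermark = max event time seen so far - max_delay.
    """

    def __init__(self, window_size, max_delay):
        self.window_size = window_size
        self.max_delay = max_delay
        self.max_event_time = None
        self.open_windows = defaultdict(list)  # window start -> buffered values
        self.late_events = []                  # events that arrived too late

    @property
    def watermark(self):
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_delay

    def process(self, event_time, value):
        """Ingest one event; return the windows closed by it, oldest first."""
        start = (event_time // self.window_size) * self.window_size
        # The event's window ended at or before the watermark: treat as late.
        if start + self.window_size <= self.watermark:
            self.late_events.append((event_time, value))
            return []
        self.open_windows[start].append(value)
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        # Emit every open window whose end the watermark has now passed.
        closed = []
        for s in sorted(self.open_windows):
            if s + self.window_size <= self.watermark:
                closed.append((s, self.open_windows.pop(s)))
        return closed


if __name__ == "__main__":
    windower = WatermarkWindower(window_size=10, max_delay=5)
    # (event_time, value) pairs arriving out of order from distributed sources.
    for t, v in [(1, "a"), (12, "b"), (8, "c"), (16, "d"), (3, "e"), (27, "f")]:
        for window_start, values in windower.process(t, v):
            print("closed window", window_start, values)
    print("late events:", windower.late_events)
```

The trade-off discussed above shows up in `max_delay`: a larger value tolerates more out-of-order data and drops fewer late events, but every window is held open longer, which raises latency. A smaller value emits results sooner at the cost of discarding more late data.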