Model for the Multi-Source Data Analysis of Big Data Set Tools and Methods for a Customizable Healthcare System
Abstract
The challenge of "big data" is to change how we collect, store, analyze, and learn from data. How to mine data from multiple possible sources effectively and persuasively, and how to obtain useful information from it, is a critical question. A growing body of research has focused on mining medical data with the aim of improving the quality of care. The human body is complex, and the data collected during its treatment are correspondingly complex. Noise, frequently introduced during data collection, makes the construction of data mining models a difficult task. The objective of this survey is to study the big data domain, to provide an overview of the freely available biomedical databases, and to apply big data technology to selected databases. Patient- and hospital-generated data can be collected from a high-performance computing (HPC) system, while cloud synchronization collects both medical history and genetic data. We propose a probabilistic data acquisition scheme to analyze the data and to apply MapReduce algorithms on the HPC system in order to build a structured database. The system contains an interactive information collection warehouse that provides two-way interaction between the HPC system and the cloud. We also present a forecasting algorithm for predicting illness, which runs on cloud servers. For predictive analysis, we use Random Forest, support vector machine (SVM), C5.0, Naive Bayes, and artificial neural network models.