Towards Conceptual Predictive Modeling for Big Data Framework
Predictive modeling is the process of building a statistical model from data in order to predict future behavior. In recent years, the amount of available data has grown dramatically, and “Big Data Analysis” is expected to be at the core of most future innovations. Because the field of data analysis has developed so quickly, there is still no consensus on how predictive modeling problems should be approached in general. Another innovation in the field is the use of data analysis competitions for model selection. This competitive approach is interesting and seems fruitful, but one could ask whether the setting provided by, for example, the Gane Project, which is based on a big data framework, faithfully resembles real-world predictive modeling problems. In this thesis, we state and test a set of hypotheses about predictive modeling, both in general and in the scope of data analysis competitions. We then describe a conceptual big data framework for approaching predictive modeling problems. To test the validity and usefulness of this framework, we participate in a series of predictive modeling competitions on the platform provided by Gane and describe our approach to these competitions.