A Comparative Study on Decision Tree and Random Forest using Konstanz Information Miner (KNIME)
Abstract
With vast amounts of data now available everywhere, it is imperative to comprehend that data and draw meaningful insights from it. With the proliferation of the Internet and information technology, data has been increasing exponentially. The five Vs of data (value, volume, velocity, variety, and veracity) are meaningful only if we can examine the data and uncover its hidden insights. As large datasets have become the norm, many data mining algorithms have become available to support this task. This study compares two classification algorithms, Decision Tree and Random Forest. A total of 10 datasets were taken from the UCI Repository and Kaggle, and Konstanz Information Miner (KNIME) workflows were used to compare the accuracy statistics of the two algorithms. The results show that Random Forest gives more accurate results for a dataset than Decision Tree.
Key Words: Data Mining, Classification, Decision Tree, Random Forest, KNIME, Accuracy Statistics, Confusion Matrix.