Ensemble-Based Machine Learning Model for Customer Lead Prediction
Abstract
This paper proposes an ensemble-based machine learning prediction algorithm using the boosting technique, comprising the AdaBoost, GradBoost and XGBoost algorithms. Lead prediction is a challenging problem because of the many parameters associated with it. Manual lead conversion prediction is time-consuming and may not yield accurate results. We carried out experiments on 6 different machine learning classification algorithms: Random Forest, Logistic Regression, K-Nearest Neighbor, Decision Tree, Support Vector Machine and Naïve Bayes. The algorithms were trained on the 11 best features, selected from the dataset using feature importance in Python. We then implemented a comprehensive additive ensemble model on all the classification algorithms, using the 3 ensemble-based boosting techniques on the same dataset to merge predictions from the different algorithms, and compared its performance with that of the individual classification algorithms. The proposed ensemble-based prediction algorithm outperformed all 6 machine learning classification algorithms used in this research. The testing results showed that it achieved an average accuracy of 92%, exceeding the accuracies of Random Forest, Logistic Regression, K-Nearest Neighbor, Decision Tree, Support Vector Machine and Naïve Bayes, which are 91%, 90%, 90%, 89%, 89% and 83% respectively. Lastly, the proposed ensemble-based machine learning prediction algorithms adopted in this paper are reliable and useful for customer retention, which is the ultimate goal of any financial institution.
Keywords: Customer, Ensemble model, Lead prediction, AdaBoost, Random Forest, Logistic Regression.
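To make the additive boosting idea named in the abstract concrete, the following is a minimal sketch of AdaBoost with one-feature decision stumps, written in plain Python. It is not the implementation used in this paper. The function names (fit_stump, adaboost_fit and so on), the single "engagement score" feature and the ten toy labels are all hypothetical and chosen only for illustration. Each round fits a weak learner on reweighted samples, and the final prediction is the sign of the weighted sum of the weak learners' votes.

```python
import math

def fit_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] <= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    """Predict +1 or -1 for one row with a fitted stump."""
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol

def adaboost_fit(X, y, rounds=3):
    """Fit an additive model: a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)     # weight of this learner's vote
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the weighted sum of all stump votes."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

def accuracy(model, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(adaboost_predict(model, row) == yi for row, yi in zip(X, y))
    return hits / len(X)

# Hypothetical toy data: one "engagement score" per lead,
# label +1 = converted, -1 = not converted.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

if __name__ == "__main__":
    for rounds in (1, 3):
        model = adaboost_fit(X, y, rounds=rounds)
        print(rounds, "round(s): training accuracy =", accuracy(model, X, y))
```

No single threshold separates these toy labels, so one stump necessarily misclassifies some leads, while the weighted combination of several stumps can fit the pattern. This is the sense in which a boosted ensemble is additive. In practice a library implementation of AdaBoost, gradient boosting or XGBoost would be used instead of this sketch.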