eXtreme Gradient Boosting in Application to Bankruptcy Prediction

Ravtej Singh Sandhu


Abstract

Bankruptcy prediction is an important issue in management and decision making for policy makers and economists, so improving the accuracy of its prediction is a long-standing goal. Researchers around the world increasingly use machine learning techniques for this task. This paper explores the application of various classification methodologies, specifically gradient boosted decision trees, to bankruptcy prediction using a partially annotated data set. The data sets were derived from the Emerging Markets Information Service. Additionally, clustering techniques were used to predict labels for the unannotated part of the data set. A final bankruptcy prediction is then performed on the data set that combines the original partial labels with the predicted labels. The results were evaluated on the test data set using the Matthews Correlation Coefficient (MCC). The results show superior performance in predicting bankruptcy. Further, taking as a baseline the MCC scores obtained using only the partially labelled data set, our results suggest that clustering does not further improve the MCC score.
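Bankrupt firms are usually a small minority in data sets of this kind, so plain accuracy can be misleading. MCC uses all four cells of the binary confusion matrix, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and ranges from −1 to 1. The following minimal sketch uses only the Python standard library to show that formula. The function name, the label convention (1 = bankrupt, 0 = not bankrupt), and the toy labels are illustrative assumptions, not the paper's code or data.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary Matthews Correlation Coefficient.

    Assumed (hypothetical) label convention: 1 = bankrupt, 0 = not bankrupt.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (also followed by scikit-learn): if any row or
    # column of the confusion matrix is empty, report MCC = 0.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy, imbalanced labels: 2 bankrupt firms out of 10.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    y_pred_all_healthy = [0] * 10                 # 80% accuracy, uninformative
    y_pred_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # one false positive
    print(matthews_corrcoef(y_true, y_pred_all_healthy))
    print(matthews_corrcoef(y_true, y_pred_model))
```

In this toy example, predicting "not bankrupt" for every firm reaches 80% accuracy but an MCC of 0, while the second prediction, which finds both bankrupt firms at the cost of one false positive, scores about 0.76.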