Analysis of Techniques to Handle Class Imbalance in Road Traffic Prediction
Class imbalance refers to a disparity in the number of instances per class in a given dataset, i.e. a few classes contain a relatively large number of instances compared to the others. Training predictive models on such data often produces biased outputs. The problem is more likely to go unnoticed when accuracy is the only performance metric. In this paper, we survey research papers published in the last few years, focusing on studies in which handling class imbalance is a major component of the research. The chosen domain is road traffic: a pertinent real-life application of class imbalance that needs further exploration and improvement. We also test different algorithm-level and data-level methods for handling class imbalance on our own dataset for Road Traffic Type prediction. Our results are evaluated using F1-score, precision, recall and the Receiver Operating Characteristic (ROC) curve. Our findings indicate that the optimal approach depends on the data used as well as on the metrics employed for evaluation.
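The point about accuracy can be illustrated with a minimal sketch. The class counts below (95 majority, 5 minority) and the always-majority predictor are invented for illustration and are not drawn from the Road Traffic Type dataset; `binary_metrics` is a hypothetical helper, not a function from the paper.

```python
# Illustrative sketch: the labels and predictions are made up, not the paper's data.

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 with respect to `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Assumed toy data: 95 majority-class (0) and 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred, positive=1)
print(metrics)
```

In this toy case the accuracy is 0.95, yet the recall and F1-score for the minority class are both 0, since the model never identifies a single minority instance. This is the kind of failure that accuracy alone would hide.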