Regression Model Based Prediction on COVID-19 Confirmed Cases, Recoveries and Deaths in India

Narayana Darapaneni, Anwesh Reddy Paduri, Sourabh Verma, Ankit Namdeo, Siddhartha Khushu, S. Sainath, Tarun Kr. Mittal


This study applies machine learning techniques to develop regression model-based predictions of COVID-19 confirmed cases, recoveries and deaths in India. Such predictions can help authorities prepare for upcoming cases and thereby mitigate their impact. The raw input data on COVID-19 in India, used as the starting point for data cleaning, were taken from the website and application programming interface (API) provided by [2]. The data cover the period from 30th Jan 2020 to 30th Jul 2020 and include cumulative confirmed cases, deaths and recoveries in India. Given the nature and scope of the problem, two supervised learning methods were used: Polynomial Regression and Support Vector Regression (SVR). On test data, the regression accuracies of the Polynomial Regression models L_Poly_reg_C, L_Poly_reg_R and L_Poly_reg_D are 98.97%, 98.83% and 97.21% respectively, whereas those of the SVR models SVM_reg_C, SVM_reg_R and SVM_reg_D are 94.79%, 97.01% and 94.64% respectively. The Root Mean Square Error (RMSE) values of the Polynomial Regression models L_Poly_reg_C, L_Poly_reg_R and L_Poly_reg_D are 39170, 28263 and 1178 respectively, whereas the RMSE values of the SVR models SVM_reg_C, SVM_reg_R and SVM_reg_D are 82790, 42354 and 1634 respectively. Both Polynomial Regression and SVR can be used to predict COVID-19 confirmed cases, recoveries and deaths for India. However, based on the training and test accuracies and the RMSE values obtained, the Polynomial Regression model appears to perform slightly better than the SVR model on this data.
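The evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: it fits a polynomial to a day index with a plain least-squares solve (standing in for a library implementation such as scikit-learn), omits SVR entirely, and uses synthetic toy numbers rather than the COVID-19 data. It also assumes that "regression accuracy" means the coefficient of determination R², which the abstract does not state, and that the test set is the most recent days, which is likewise an assumption. All function and variable names are hypothetical.

```python
import math

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[k] multiplies x**k.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def rmse(y_true, y_pred):
    """Root Mean Square Error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination (assumed meaning of 'regression accuracy')."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic toy series (NOT real case counts): an exact quadratic in the day index.
days = list(range(20))
cases = [3 * d * d + 2 * d + 5 for d in days]

# Assumed split: earliest 15 days for training, latest 5 days for testing.
train_days, test_days = days[:15], days[15:]
train_cases, test_cases = cases[:15], cases[15:]

coeffs = fit_polynomial(train_days, train_cases, degree=2)
test_pred = [predict(coeffs, d) for d in test_days]

print("coefficients:", coeffs)
print("test RMSE:", rmse(test_cases, test_pred))
print("test R^2:", r2_score(test_cases, test_pred))
```

Because RMSE is expressed in the units of the target, values for confirmed cases and for deaths are not directly comparable across series of very different magnitude, whereas R² is scale-free. That is one reason to report both measures, as the abstract does.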