Quora Question Pairs Similarity Using Logistic Regression and Support Vector Machine

  • Dr. P V Rama Raju, G. Naga Raju, N. Nikhil, M. Hemanth Gupta, Chandan Akella, S. Kumara Siddarth


Quora is an application on which questions on a wide range of topics are posted. These questions are answered by people who have good knowledge of the corresponding topic. Millions of questions are posted on Quora every day, and they are not all necessarily identical. This paper explores the task of Natural Language Understanding by examining the questions in the Quora dataset. We explored the dataset and applied various machine learning models (linear and tree-based models). XGBoost, Support Vector Machine (SVM), and Logistic Regression models are used with the TF-IDF and Word2Vec algorithms to identify the similarity between questions posted on Quora. Our finding was that the TF-IDF neural network along with XGBoost stood out by giving the best performance, outperforming the other, more complicated models.
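
As a rough illustration of how TF-IDF can be used to score the similarity of a question pair, the following is a minimal sketch in plain Python and is not the implementation used in this paper. The sample questions, the tokenizer, the smoothed IDF formula, and all function names are assumptions made for the example. A score of this kind could serve as one input feature to a classifier such as Logistic Regression, SVM, or XGBoost.

```python
import math
from collections import Counter


def tokenize(text):
    # Very simple tokenizer (an assumption): lowercase, drop "?", split on spaces.
    return text.lower().replace("?", " ").split()


def fit_idf(corpus):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(text, idf):
    # Sparse TF-IDF vector as a dict: raw term count times IDF weight.
    counts = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine_similarity(u, v):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical questions, not taken from the Quora dataset.
questions = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "Where can I buy cheap flights?",
]

idf = fit_idf(questions)
vectors = [tfidf_vector(q, idf) for q in questions]

sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"related pair:   {sim_related:.3f}")
print(f"unrelated pair: {sim_unrelated:.3f}")
```

In this sketch the pair that shares the content words "learn" and "Python" scores higher than the pair that shares only "I". On a real dataset, the IDF weights would be fitted on the full question corpus rather than on three sentences.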