A Review of Different Approaches for Visual Question-Answering Systems

Prof. K. P. Moholkar, Ajay Pisharody, Noorul Hasan Sayyed, Rakesh Samanta, Aadarsh Valsange


Enabling a computer system to understand its surroundings and to process information the way a human being does has long been a major focus of Computer Science. Visual Question Answering (VQA) is one way to pursue this form of artificial intelligence. A VQA system is an AI-based system that, given an image and a question about it, attempts to produce a relevant answer. This usually combines image processing with question understanding, matching the two to deliver the output. VQA is useful in scenarios where images must be identified and classified. It can take different forms: a run-time system that continuously identifies its surroundings, or a static application in medical science that detects important features such as the presence of cancerous cells or irregularities of an organ. VQA can also be used in surveillance systems and crime scene investigation. In this work, we attempt to build a model that draws on knowledge or additional information, using fine-tuned neural networks. In this paper, we compare the results obtained on popular image datasets.