Learnings from the Limitations of Visual Question Answering Models: A Critical Analysis

Himanshu Sharma

Abstract

Visual Question Answering (VQA) is the task of answering a natural-language question about a given image. VQA algorithms must understand both the image content and the semantics of the question, so the task draws on both computer vision (CV) and natural language processing (NLP). Visual and question features are merged to generate an answer. In this paper, we discuss the limitations of a few state-of-the-art VQA models and examine their failure cases. This analysis gives researchers future directions for addressing these limitations and improving the accuracy of VQA models.

Published

2020-05-01

How to Cite

Himanshu Sharma. (2020). Learnings from the Limitations of Visual Question Answering Models: A Critical Analysis. International Journal of Advanced Science and Technology, 29(06), 4443 - 4450. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/19318

Issue

Vol. 29 No. 06 (2020)

Section

Articles