Maximizing Joint Probability in Visual Question Answering Models

  • S. Vidya Sagar Appaji, P. V. Lakshmi, P. Srinivasa Rao

Abstract

Visual Question Answering (VQA) is an important task for human-like comprehension, in which a computer must produce an effective answer to a question about a given picture. In this article, a novel VQA model is proposed that derives both low-level and high-level semantics for VQA by exploiting intermediate CNN layers. The model, the Hierarchical Feature Network (HF-Net), builds each hierarchical feature by combining an attention map with a multimodal pooled feature. This combination takes place at the answer-reasoning stage so that both high-level and low-level semantics are captured. Qualitative experiments demonstrate that HF-Net is superior in the attention regions it selects. It also addresses the anomaly that arises in a state-of-the-art model when questions are rephrased.
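The idea of one attended, pooled feature per CNN level can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes dot-product attention, an element-wise product as a stand-in for multimodal pooling, and region features that already share the question vector's dimensionality. The function names (attend, fuse, hierarchical_features) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, question):
    """Assumed dot-product attention: score each region against the question,
    then return the attention map and the attention-weighted region feature."""
    scores = [sum(r * q for r, q in zip(region, question)) for region in regions]
    weights = softmax(scores)
    attended = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(len(question))
    ]
    return weights, attended


def fuse(attended, question):
    """Element-wise product, used here as a simple stand-in for multimodal pooling."""
    return [a * q for a, q in zip(attended, question)]


def hierarchical_features(levels, question):
    """Build one attended, pooled feature per CNN level and concatenate them,
    so that low-level and high-level semantics both reach the answer stage."""
    attention_maps, joint = [], []
    for regions in levels:
        weights, attended = attend(regions, question)
        attention_maps.append(weights)
        joint.extend(fuse(attended, question))
    return attention_maps, joint


# Toy inputs (hypothetical): two CNN levels with 2-dimensional region features.
question = [1.0, 0.0]
low_level = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
high_level = [[2.0, 0.0], [0.0, 2.0]]
attention_maps, joint = hierarchical_features([low_level, high_level], question)
```

Here joint holds one fused vector per level, concatenated in order, and each entry of attention_maps is a distribution over that level's regions. A real model would use learned projections and a learned pooling operator in place of these fixed operations.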

Published
2020-02-27
How to Cite
S. Vidya Sagar Appaji, P. V. Lakshmi, & P. Srinivasa Rao. (2020). Maximizing Joint Probability in Visual Question Answering Models. International Journal of Advanced Science and Technology, 29(3), 3914-3923. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/5146
Section
Articles