Maximizing Joint Probability in Visual Question Answering Models

  • S. Vidya Sagar Appaji, P. V. Lakshmi, P. Srinivasa Rao


Visual Question Answering (VQA) is an important task toward human-like comprehension of images, with the goal of enabling a computer to answer questions about a given picture effectively. This article proposes a novel VQA model that derives both low-level and high-level semantics by exploiting intermediate CNN layers. In this model, the Hierarchical Feature Network (HF-Net), an attention map and multimodal pooling are applied to each hierarchical feature. The resulting features are combined at the answer-reasoning stage so that both high-level and low-level semantics contribute to the answer. Qualitative experiments demonstrate that HF-Net produces superior attention regions. HF-Net also addresses an anomaly that a state-of-the-art model exhibits when questions are rephrased.
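The abstract describes HF-Net only at a high level. As a rough illustration of the idea (per-level attention and multimodal pooling over several CNN feature maps, then fusion at the answer stage), the NumPy sketch below runs a forward pass with random parameters. The layer sizes, the tanh/Hadamard-product form of attention and pooling, and fusion by concatenation followed by one linear classifier are all assumptions chosen for illustration, not the authors' exact design; the functions `make_params`, `attend_and_pool` and `hierarchical_vqa_forward` are hypothetical names.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def make_params(rng, layer_shapes, d_q, h, n_answers):
    """Random weights for each hierarchical level plus one answer classifier (assumed layout)."""
    params = []
    for _, d_v in layer_shapes:
        params.append({
            "Wv": rng.standard_normal((d_v, h)) * 0.1,  # region projection for attention
            "Wq": rng.standard_normal((d_q, h)) * 0.1,  # question projection for attention
            "w_att": rng.standard_normal(h) * 0.1,      # attention scoring vector
            "Pv": rng.standard_normal((d_v, h)) * 0.1,  # visual projection for pooling
            "Pq": rng.standard_normal((d_q, h)) * 0.1,  # question projection for pooling
        })
    W_out = rng.standard_normal((h * len(layer_shapes), n_answers)) * 0.1
    return params, W_out

def attend_and_pool(V, q, p):
    """Question-guided attention over one feature map's regions, then multimodal
    pooling. Hadamard-product pooling is an assumed stand-in for the paper's method."""
    joint = np.tanh(V @ p["Wv"]) * np.tanh(q @ p["Wq"])  # (regions, h)
    alpha = softmax(joint @ p["w_att"])                   # (regions,) attention map
    v_att = alpha @ V                                     # (d_v,) attended visual feature
    z = np.tanh(v_att @ p["Pv"]) * np.tanh(q @ p["Pq"])   # (h,) pooled multimodal vector
    return z, alpha

def hierarchical_vqa_forward(features, q, params, W_out):
    """Attend and pool at every level, then fuse all levels at the answer stage."""
    pooled, attention_maps = [], []
    for V, p in zip(features, params):
        z, alpha = attend_and_pool(V, q, p)
        pooled.append(z)
        attention_maps.append(alpha)
    # Concatenating low- and high-level vectors before classification (assumed fusion).
    logits = np.concatenate(pooled) @ W_out
    return softmax(logits), attention_maps

rng = np.random.default_rng(0)
layer_shapes = [(16, 32), (9, 64), (4, 128)]  # hypothetical (regions, channels), low to high level
features = [rng.standard_normal(s) for s in layer_shapes]
q = rng.standard_normal(20)                   # stand-in for an encoded question vector
params, W_out = make_params(rng, layer_shapes, d_q=20, h=24, n_answers=10)
probs, maps = hierarchical_vqa_forward(features, q, params, W_out)
print(probs.shape, [m.shape for m in maps])   # prints (10,) [(16,), (9,), (4,)]
```

Each entry of `maps` is one level's attention distribution over its regions, the kind of per-level output that qualitative attention comparisons inspect. In a trained model these weights would be learned from data rather than drawn at random.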

How to Cite
Appaji, S. V. S., Lakshmi, P. V., & Srinivasa Rao, P. (2020). Maximizing Joint Probability in Visual Question Answering Models. International Journal of Advanced Science and Technology, 29(3), 3914–3923. Retrieved from