Image Captioning using IndRNN

  • Sukriti Rampal, Sparsh Gupta, Shubhang Verma, Dinesh K. Vishwakarma

Abstract

Deep Learning has made a significant impact in the field of Artificial Intelligence by achieving state-of-the-art results on various subjects important from the perspective of applied computing, one of them being Image Captioning. Describing an image is a task that combines aspects of both Natural Language Processing and Computer Vision and has been studied extensively over the years. It requires evaluating visual-semantic correlations among objects in the image, as well as language understanding and the ability to comprehend visual-language interactions, in order to produce a sensible representation of the image as sentences. Various models have been proposed and used in this field, producing state-of-the-art results comparable to the human BLEU-1 score. We introduce an optimized CNN-based encoder, RNN-based decoder model in our image caption generator that replaces the previously used RNN and LSTM architectures with the Independently Recurrent Neural Network (IndRNN), which learns longer-term dependencies more efficiently than standard LSTMs. The objective of our model is to maximize the probability of occurrence of the output sentence, given the input image and the previously generated words. We trained our model on the Flickr8k and Flickr30k datasets and evaluated it using the BLEU metric.
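For context, the IndRNN recurrence (Li et al., 2018) replaces the full recurrent weight matrix of a vanilla RNN with a per-neuron recurrent weight vector applied element-wise, which is what makes gradients over long sequences easier to control. The following is a minimal sketch of one IndRNN step, assuming the commonly used ReLU activation; variable names are illustrative and this is not the authors' implementation.

```python
import numpy as np

def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN time step: h_t = relu(W @ x_t + u * h_prev + b).

    Unlike a vanilla RNN, the recurrent term `u * h_prev` is
    element-wise (u is a vector), so each neuron only sees its own
    past state; neurons interact through W across stacked layers.
    """
    return np.maximum(0.0, W @ x_t + u * h_prev + b)

# Illustrative usage: unroll over a short sequence of feature vectors.
rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 16, 32, 10
W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
u = rng.uniform(-1.0, 1.0, size=hidden_dim)  # per-neuron recurrent weights
b = np.zeros(hidden_dim)
h = np.zeros(hidden_dim)
for t in range(steps):
    x_t = rng.normal(size=input_dim)
    h = indrnn_step(x_t, h, W, u, b)
```

In a caption-generation decoder of the kind the abstract describes, a hidden state like `h` would be projected to vocabulary logits at each step, and training would maximize the log-probability of each ground-truth word given the image features and the words generated so far.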

Published
2020-06-01
How to Cite
Sukriti Rampal, Sparsh Gupta, Shubhang Verma, Dinesh K. Vishwakarma. (2020). Image Captioning using IndRNN. International Journal of Advanced Science and Technology, 29(08), 2211–2217. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/23362
Section
Articles