PHOCs and Fisher Vectors based Image Captioning
Text present in an image conveys high-level semantics and can be used to improve image captioning, since this textual information can strongly guide the captioning task. In this work, we address fine-grained image captioning by using the text contained in an image as additional information, in combination with the image's visual features. We exploit Fisher Vector encoding, which makes use of the morphology of the text. We demonstrate the usefulness of the method on two publicly available datasets, MSCOCO and Flickr30k. The results show that the proposed model is comparable to state-of-the-art approaches for generating image captions. Finally, we discuss possible future directions in image captioning.