LSTM Neural Network-Based Video Captioning Model
Abstract
Recent advances in image captioning using long short-term memory (LSTM) networks have inspired their exploration in video captioning. Various real-time event recognition systems have been proposed and studied for analyzing the activities occurring in video frames. However, these systems provide only a general idea of an event happening in the video; they omit information that, if added, would benefit broader applications such as assistance for visually impaired persons or general-purpose autonomous robots. In this paper, we present a real-time event recognition system that uses LSTM networks to caption images in a video scene, identifying events with more precise data within a particular range of the camera. The system reports all information within that range, such as a known person's name and distance from the camera, and the names and distances of nearby objects, using neural network techniques, while maintaining good real-time performance in crowded traffic scenes.
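The core recurrence of the LSTM used in such captioning decoders can be sketched as below. This is a minimal illustrative implementation in NumPy, not the paper's actual model: the weights are random and untrained, the dimensions `D` and `H` are arbitrary, and `frame_feature` stands in for a CNN-extracted visual feature vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: update cell state c and hidden state h
    from the current input x and the previous states."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b              # stacked gate pre-activations, shape (4H,)
    i = sigmoid(z[0:H])                     # input gate
    f = sigmoid(z[H:2 * H])                 # forget gate
    o = sigmoid(z[2 * H:3 * H])             # output gate
    g = np.tanh(z[3 * H:4 * H])             # candidate cell update
    c = f * c_prev + i * g                  # new cell (long-term) state
    h = o * np.tanh(c)                      # new hidden (short-term) state
    return h, c

# Illustrative run: feed a frame feature and unroll the recurrence a few steps.
rng = np.random.default_rng(0)
D, H = 16, 8                                # feature and hidden sizes (arbitrary)
W = rng.standard_normal((4 * H, D)) * 0.1   # input-to-gate weights (random, untrained)
U = rng.standard_normal((4 * H, H)) * 0.1   # hidden-to-gate weights
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
frame_feature = rng.standard_normal(D)      # stand-in for a CNN visual feature
for _ in range(5):
    h, c = lstm_step(frame_feature, h, c, W, U, b)

print(h.shape)  # (8,)
```

In a full captioning model, each hidden state `h` would be projected through a vocabulary-sized softmax layer to emit one caption word per step; the gating structure above is what lets the network retain scene context across the sequence.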