LSTM Neural Network-Based Video Captioning Model
Abstract
Recent advances in image captioning using long short-term memory (LSTM) networks have inspired their exploration in video captioning. Various real-time event recognition systems have been proposed and studied for analyzing the activities occurring in video frames. However, these systems provide only a general idea of an event happening in the video; they omit information that, if added, would benefit broader applications such as assistance for visually impaired persons or general-purpose autonomous robots. In this paper, we present a real-time event recognition system that uses LSTM networks to caption images in a video scene, identifying events with more precise data within a particular range of the camera. The system reports all information within that range, such as a known person's name and distance from the camera, and the names and distances of nearby objects, using neural network techniques, while maintaining good real-time performance in crowded traffic scenes.
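The core recurrence of the LSTM used in such captioning decoders can be sketched as below. This is a minimal illustrative implementation in NumPy, not the paper's actual model: the weights are random and untrained, the dimensions `D` and `H` are arbitrary, and `frame_feature` stands in for a CNN-extracted visual feature vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: update cell state c and hidden state h
    from the current input x and the previous states."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b              # stacked gate pre-activations, shape (4H,)
    i = sigmoid(z[0:H])                     # input gate
    f = sigmoid(z[H:2 * H])                 # forget gate
    o = sigmoid(z[2 * H:3 * H])             # output gate
    g = np.tanh(z[3 * H:4 * H])             # candidate cell update
    c = f * c_prev + i * g                  # new cell (long-term) state
    h = o * np.tanh(c)                      # new hidden (short-term) state
    return h, c

# Illustrative run: feed a frame feature and unroll the recurrence a few steps.
rng = np.random.default_rng(0)
D, H = 16, 8                                # feature and hidden sizes (arbitrary)
W = rng.standard_normal((4 * H, D)) * 0.1   # input-to-gate weights (random, untrained)
U = rng.standard_normal((4 * H, H)) * 0.1   # hidden-to-gate weights
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
frame_feature = rng.standard_normal(D)      # stand-in for a CNN visual feature
for _ in range(5):
    h, c = lstm_step(frame_feature, h, c, W, U, b)

print(h.shape)  # (8,)
```

In a full captioning model, each hidden state `h` would be projected through a vocabulary-sized softmax layer to emit one caption word per step; the gating structure above is what lets the network retain scene context across the sequence.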