<strong>Paper Title</strong><br>
Smart Gallery and Video Captioning Using Deep Learning<br>
<br>

<strong>Abstract</strong><br>
Video understanding has become crucial, since a large share of the data generated today is in the form of
videos. Surveillance footage, social media clips, and informational videos are now part of our everyday
lives. Video captioning offers a convenient way to summarize this data and reuse it for other purposes such as indexing and
search. In deep learning, video captioning aims to automatically generate descriptions of the events in a video from
the video's visual content. We propose a method that combines existing deep learning models as an ensemble for
captioning, yielding more accurate results in the structuring and retrieval of video data. We address this task by
automatically generating scene-based video captions that summarize the content for later reference. The outcome of
this work is an end-to-end framework for the envisioned smart gallery system, in which users can upload as many
videos as they wish and then retrieve any event from those videos by entering a text description of that event as a
query.<br>
Keywords - Video Captioning, Object Recognition, Image Processing, Understanding and Structuring Video Data,
Shot Boundary Detection, Smart Video Gallery
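Retrieving an event by typing it as text can, in its simplest form, be pictured as a search over the generated scene captions. The sketch below is a minimal illustration only, assuming each detected scene is stored with its video ID, time span, and caption. The `SceneCaption` schema, the `search_gallery` function, and the word-overlap scoring are hypothetical choices made for this example and are not the retrieval method of this paper.

```python
import re
from dataclasses import dataclass


@dataclass
class SceneCaption:
    """One caption generated for a detected scene (hypothetical schema)."""
    video_id: str
    start_s: float  # scene start time in seconds
    end_s: float    # scene end time in seconds
    caption: str


def tokenize(text: str) -> set:
    """Lower-case the text and return the set of its alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_gallery(query: str, index: list, top_k: int = 5) -> list:
    """Rank scenes by how many query words occur in their caption.

    Scenes sharing no words with the query are dropped; ties keep index order.
    """
    query_words = tokenize(query)
    scored = []
    for pos, scene in enumerate(index):
        overlap = len(query_words & tokenize(scene.caption))
        if overlap > 0:
            # Negate the overlap so an ascending sort puts the best match first;
            # the unique position breaks ties and keeps the original order.
            scored.append((-overlap, pos, scene))
    scored.sort()
    return [scene for _, _, scene in scored[:top_k]]


# Usage with a small, made-up caption index (hypothetical data).
index = [
    SceneCaption("clip1.mp4", 0.0, 4.2, "a man is riding a bicycle on the road"),
    SceneCaption("clip1.mp4", 4.2, 9.8, "a dog is playing with a ball in the park"),
    SceneCaption("clip2.mp4", 0.0, 6.5, "a man is playing a guitar"),
]
results = search_gallery("man playing guitar", index)
for scene in results:
    print(scene.video_id, scene.start_s, scene.end_s, scene.caption)
```

Plain word overlap ignores synonyms and word order; a practical gallery would more likely compare text embeddings of the query and captions, but the overall flow (caption each scene, index the captions, rank them against the query) stays the same.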