Authors: Usha Dhankar, Komal Khatak, Dr. Sweety
Abstract: The exponential growth of video data across domains such as surveillance, aerospace, and digital media has made efficient content retrieval a significant challenge. Traditional approaches based on manual tagging and low-level visual features fail to capture the contextual semantics of video content. This paper proposes a semantic video discovery framework that integrates deep feature fusion with automated metadata generation. Visual features are extracted using deep learning models such as YOLO and the Segment Anything Model (SAM), while textual features are derived using Natural Language Processing (NLP) techniques, including FastText and Named Entity Recognition (NER). Fusing the visual and textual embeddings enables context-aware retrieval and improves semantic understanding of video content. Experimental results demonstrate improved accuracy, precision, and retrieval efficiency compared with traditional methods.
International Journal of Science, Engineering and Technology
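The fusion of visual and textual embeddings described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion operator is used, so the weighted concatenation below is an assumption, as are the function names (`fuse`, `rank`), the `visual_weight` parameter, and the toy two-dimensional vectors, which stand in for features that would come from models such as YOLO/SAM and FastText/NER.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse(visual, textual, visual_weight=0.5):
    """Fuse two modality embeddings by weighted concatenation.

    Assumption: the paper does not specify the fusion operator; weighted
    concatenation of unit-length vectors is one common choice.
    """
    v = [visual_weight * x for x in l2_normalize(visual)]
    t = [(1.0 - visual_weight) * x for x in l2_normalize(textual)]
    return l2_normalize(v + t)  # list concatenation, then renormalize


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank(query, index):
    """Return video ids ordered from most to least similar to the query."""
    scored = [(cosine(query, emb), vid) for vid, emb in index.items()]
    return [vid for _, vid in sorted(scored, reverse=True)]


# Hypothetical toy index: each clip has a visual and a textual embedding.
index = {
    "clip_a": fuse([1.0, 0.0], [1.0, 0.0]),
    "clip_b": fuse([0.0, 1.0], [0.0, 1.0]),
    "clip_c": fuse([1.0, 0.0], [0.0, 1.0]),
}

# The query matches clip_a in both modalities and clip_c in the visual one only.
query = fuse([1.0, 0.0], [1.0, 0.0])
ranking = rank(query, index)
print(ranking)
```

In this toy setup a clip that agrees with the query in both modalities ranks above one that agrees in only one, which is the behavior the abstract attributes to context-aware retrieval. A real system would replace the hand-written vectors with model outputs and would likely use an approximate nearest-neighbor index rather than a full scan.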