Adaptive Cross-Modal Fusion Framework for Context-Aware Multimodal Intelligence Systems

6 Nov

Authors: Research Scholar Chintu Kodanda Ramu, Professor Dr. Pankaj Khairnar

Abstract: The growing volume of multimedia data calls for intelligent systems that can process several types of data at the same time. Traditional AI models typically work with only one type of input, which limits their ability to understand complex real-world situations and to handle the mixed nature of everyday problems. This paper presents a system that brings together text, image, and speech data in a unified framework. The proposed approach uses transformer-based encoders to extract features and an attention-driven fusion mechanism to combine the multimodal features dynamically. By design, the system captures contextual relationships across modalities and improves prediction accuracy. The experimental results show that the proposed model performs better than single
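
The attention-driven fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`softmax`, `attention_fuse`), the scaled dot-product scoring, the fixed `query` vector standing in for a learned context vector, and the toy embeddings are all assumptions made for illustration. The transformer-based encoders are not modelled; each modality is assumed to have already been encoded into a vector of the same dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality feature vectors with attention weights.

    features: dict mapping a modality name to its feature vector
              (hypothetical encoder outputs, all the same length as query).
    query:    context vector used to score each modality (an assumption;
              in a trained model this would be learned).
    Returns the fused vector and the per-modality attention weights.
    """
    names = list(features)
    dim = len(query)
    # Scaled dot-product score between the query and each modality vector.
    scores = [
        sum(q * x for q, x in zip(query, features[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality vectors, computed per dimension.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Toy embeddings standing in for text, image, and speech encoder outputs.
    toy_features = {
        "text": [0.9, 0.1, 0.0],
        "image": [0.2, 0.7, 0.1],
        "speech": [0.1, 0.2, 0.6],
    }
    fused_vector, modality_weights = attention_fuse(toy_features, [1.0, 0.0, 0.0])
    print(modality_weights)
    print(fused_vector)
```

Because the weights are recomputed for every input, a modality that aligns more closely with the current context contributes more to the fused vector, which is one way to read "dynamic" fusion.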

DOI: https://doi.org/10.5281/zenodo.20052511