Authors: Assistant Professor Sunil Yadav, Mohan Kadambande, Ashwini Kangane, Nitesh Lad, Urvashi Nahate
Abstract: Deepfake technology has advanced rapidly, enabling the creation of highly realistic manipulated images, videos, and audio. Although current research has made considerable progress in detection, most approaches concentrate on a single modality. This paper analyses more than 20 state-of-the-art studies on deepfake detection and identifies significant research gaps, including the absence of multi-modal frameworks, dataset limitations, lack of robustness, and insufficient interpretability. To address these issues, we built a prototype detection system operating on single-modality images that employs two models: a custom Convolutional Neural Network (CNN) and the Xception CNN. Our findings underscore the need for solutions that incorporate multiple modalities. We propose an integrated multi-modal detection framework encompassing images, videos, and audio, which represents the next step toward reliable and effective detection systems.
International Journal of Science, Engineering and Technology