Advanced Deepfake Detection Using Machine Learning

27 Apr

Authors: Prof. Pradnya Patange, Atharv Pate, Harsh Lonari, Mayuresh Kshirsagar, Manish Patil

Abstract: The rapid advancement of deepfake technology has introduced significant challenges to digital media authenticity, enabling the creation of highly convincing synthetic images and videos that are difficult to distinguish from genuine content. This paper proposes an advanced deepfake detection framework based on the Temporal Vision-Language Transformer (TVLT), a multimodal deep learning architecture that jointly learns visual, temporal, and semantic representations. Unlike traditional convolutional or recurrent models that focus solely on the spatial or temporal domain, the proposed TVLT-based system integrates cross-modal attention to capture complex correlations among video frames, motion patterns, and audio-text alignment cues. The model identifies inconsistencies in facial movement, speech synchronization, lighting, and micro-expressions, features that deepfake generation methods struggle to replicate authentically. Experimental evaluation on benchmark datasets including FaceForensics++, Celeb-DF, and DFDC demonstrates that the proposed system achieves accuracy exceeding 94%, with high precision and recall, significantly outperforming single-modality detection approaches.
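The cross-modal attention mechanism the abstract refers to can be sketched as scaled dot-product attention in which queries come from one modality (e.g. video-frame embeddings) and keys/values from another (e.g. audio-patch embeddings), so each visual token is re-expressed as a mixture of the other modality's tokens. The sketch below is a minimal, hedged illustration of that general idea in numpy, not the authors' actual model; the token counts, embedding dimension, and function names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    # Scaled dot-product attention across modalities: `queries` from
    # one stream (e.g. video frames), `keys`/`values` from another
    # (e.g. audio patches). Shapes: (T_q, d), (T_kv, d), (T_kv, d).
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # (T_q, T_kv) similarity
    weights = softmax(scores, axis=-1)         # each query attends over the other modality
    return weights @ values                    # (T_q, d) fused representation

# Toy inputs: 8 video-frame embeddings attending over 12 audio-patch
# embeddings, all of dimension 16 (hypothetical sizes).
rng = np.random.default_rng(0)
video_tokens = rng.standard_normal((8, 16))
audio_tokens = rng.standard_normal((12, 16))

fused = cross_modal_attention(video_tokens, audio_tokens, audio_tokens)
print(fused.shape)  # → (8, 16)
```

A real TVLT-style model would stack many such layers with learned projection matrices for queries, keys, and values, and train the whole network end to end; a detection head on top of the fused representation would then flag, for example, audio-visual desynchronization that the attention weights fail to align.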