Hybrid Deep Learning For Deepfake Detection: A Systematic Survey

31 May

Authors: Komal Khatak, Sonal Beniwal

Abstract: Advances in generative AI, especially generative adversarial networks (GANs) and diffusion models, have made it easy for users to produce hyper-realistic synthetic media. As a result, deepfake detectors that rely on a single modality are increasingly vulnerable in adverse real-world conditions. To address this, we present a structured survey of hybrid deep learning frameworks that integrate different network types, data modalities and domain representations. We propose a six-category taxonomy covering: (i) CNN architectural hybrids, (ii) CNN-CNN temporal models, (iii) cross-modal audio-visual systems, (iv) spatial-frequency hybrid systems, (v) forensics-oriented hybrid systems, and (vi) adversarial hybrid systems that integrate explainability and adversarial robustness. For each category, we analyze the design rationale, the fusion strategies, and performance on current benchmarks, as well as the challenges that persist. Across the 45 studies examined, we find that hybrid models consistently outperform single-stream models, especially under compression, domain shift, and adversarial attack. Finally, we identify open challenges, including the generalization gap, the absence of a standard benchmarking framework, and poor interpretability, and we outline directions to guide future research on these challenges.
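The abstract does not describe the fusion mechanisms used by the surveyed systems. As a minimal, non-authoritative illustration of two commonly used strategies, the sketch below contrasts feature-level (early) fusion, which concatenates per-stream feature vectors before a single classifier head, with decision-level (late) fusion, which averages per-stream fake probabilities. It assumes a hypothetical spatial stream and a hypothetical frequency stream. All function names, weights and feature values are illustrative placeholders; a real hybrid detector would learn them.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def early_fusion(spatial: Sequence[float], frequency: Sequence[float]) -> List[float]:
    """Feature-level fusion (hypothetical helper): concatenate the two streams' features."""
    return list(spatial) + list(frequency)


def linear_head(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Toy classifier head (hypothetical): returns P(fake) from a linear score."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(score)


def late_fusion(stream_probs: Sequence[float], stream_weights: Sequence[float]) -> float:
    """Decision-level fusion (hypothetical helper): weighted average of per-stream P(fake)."""
    if len(stream_probs) != len(stream_weights):
        raise ValueError("probability/weight length mismatch")
    total = sum(stream_weights)
    if total <= 0:
        raise ValueError("stream weights must sum to a positive value")
    return sum(p * w for p, w in zip(stream_probs, stream_weights)) / total


if __name__ == "__main__":
    spatial_feats = [0.2, 0.9]   # placeholder spatial-stream features
    freq_feats = [1.5]           # placeholder frequency-stream feature
    fused = early_fusion(spatial_feats, freq_feats)
    p_early = linear_head(fused, weights=[0.5, 1.0, 2.0], bias=-1.0)
    p_late = late_fusion([0.6, 0.9], [1.0, 1.0])  # per-stream P(fake), equal weights
    print(f"early fusion P(fake) = {p_early:.4f}")
    print(f"late fusion  P(fake) = {p_late:.4f}")
```

Early fusion lets one head model interactions between spatial and frequency cues, whereas late fusion keeps the streams independent and only combines their decisions, which makes it easier to add or drop a stream (for example, an audio stream) without retraining the others.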

DOI: https://doi.org/10.5281/zenodo.20507777