Deep Fake Audio Detection Using MFCC And NLP

24 Apr

Authors: Mrs. M. Devika, UG Student

Abstract: Recent developments in AI have made it possible to create near-realistic deepfake audio content, creating new threats to online safety, security, and privacy, including identity fraud, misinformation, and impersonation. Synthetically produced audio differs from genuine human recordings only in subtle voice characteristics, so it is becoming increasingly difficult for traditional methods to accurately classify audio as ‘real’ or ‘fake.’ This work therefore proposes an AI-powered deepfake audio detection method that uses Mel Frequency Cepstral Coefficients (MFCCs) to extract features from voice recordings and Natural Language Processing (NLP) to analyze content at the level of speech, judging whether a sample is legitimate from the similarities and differences between audio files and how they relate within the context of their use. The spectral and linguistic features contained in the audio are extracted and classified by two separate models, one based on Machine Learning (ML) and one based on Deep Learning (DL). The solution has been demonstrated to improve detection rates across the multiple types of media that can be employed to produce deepfake audio attacks, hence making it resilient against attacks of many kinds. The technology therefore has the potential to strengthen the safety, reliability, and confidence that individuals have when using voice-based digital communications.
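The MFCC feature extraction referred to in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is a standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT-II) written in plain NumPy, with all frame sizes and filter counts chosen as illustrative defaults.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        if center > left:
            fb[i - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fb[i - 1, center:right] = (right - np.arange(center, right)) / (right - center)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    # Slice the signal into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, then log compression
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T  # shape: (n_frames, n_coeffs)
```

In practice, a library such as `librosa` is typically used instead of hand-rolled code; the resulting per-frame coefficient matrix would then be fed to the ML/DL classifiers described in the abstract.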