AI-Enhanced Real-Time Speech-to-Speech Translation

18 Mar

Authors: Barath Raaj S A, D.Parameswari, Varnikhasri S, Udaya Kumar S

Abstract: Globalization continues to increase the demand for practical, effective communication across languages. In areas such as education, health care, tourism and multinational cooperation, people regularly face language barriers, and these settings depend on a communication system that lets individuals converse while each speaks in their own native language. This paper presents a real-time speech-to-speech translation (S2ST) system that allows speakers of different languages to interact with each other in real time. It consists of three major components: automatic speech recognition (ASR), neural machine translation (NMT) and text-to-speech (TTS) synthesis. The ASR component converts spoken input into text as soon as the speaker has finished speaking. It uses streaming recognition to perform this conversion in near real time, applying noise reduction to suppress background noise and voice activity detection to identify actual speech, so that its output remains accurate and robust under varying conditions. The NMT component employs a transformer-based neural translation model to translate the text produced by the ASR component, preserving semantic meaning and contextual correctness rather than translating word for word. The TTS component produces high-quality, natural-sounding speech output to enable conversational interaction. The experimental study confirms that the system achieves high translation accuracy, helping to reduce language barriers and promote inclusive global communication.
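
The cascade described in the abstract (voice activity detection, then ASR, NMT and TTS) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `segment_utterances` and `CascadeS2ST`, the energy threshold, and the silence count are all assumptions made for illustration, and a simple energy threshold stands in for whatever voice activity detector the paper actually uses. The three stages are passed in as plain callables so that any real ASR, NMT or TTS model could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# One short chunk of audio samples, assumed to be floats in [-1.0, 1.0].
Frame = List[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def segment_utterances(
    frames: Iterable[Frame],
    threshold: float = 0.01,  # assumed energy threshold for "voiced"
    max_silence: int = 2,     # assumed number of quiet frames that ends an utterance
) -> Iterator[List[Frame]]:
    """Energy-based voice activity detection (an illustrative stand-in).

    Collects voiced frames and yields them as one utterance once
    `max_silence` consecutive quiet frames have been seen.
    """
    current: List[Frame] = []
    silence = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= max_silence:
                yield current
                current = []
                silence = 0
    if current:  # flush a trailing utterance at end of stream
        yield current


@dataclass
class CascadeS2ST:
    """Hypothetical ASR -> NMT -> TTS cascade with pluggable stages."""
    asr: Callable[[List[Frame]], str]  # utterance audio -> source-language text
    nmt: Callable[[str], str]          # source text -> target-language text
    tts: Callable[[str], Frame]        # target text -> synthesized audio

    def run(self, frames: Iterable[Frame]) -> Iterator[Frame]:
        """Translate a stream of frames one detected utterance at a time."""
        for utterance in segment_utterances(frames):
            text = self.asr(utterance)
            if not text:  # skip utterances the recognizer could not transcribe
                continue
            yield self.tts(self.nmt(text))
```

Because `run` is a generator, translated audio for the first utterance becomes available before later utterances have been captured, which is the property a near-real-time system relies on.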

DOI: https://doi.org/10.5281/zenodo.19087958