Google has announced Translatotron, a speech-to-speech translation system that translates spoken language directly into speech in another language while maintaining the speaker’s voice and confidence.
“Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language,” Google AI wrote in a blog post.
According to Google, conventional speech-to-speech translation systems are built as a cascade of three parts. The first is automatic speech recognition, which converts the source speech to text. The second is machine translation, which translates the transcribed text into the target language. The last is text-to-speech synthesis (TTS), which generates speech in the target language from the translated text. Translatotron replaces this cascade with a single model that does not rely on an intermediate text representation.
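The three-stage pipeline described above can be pictured as a chain of functions, each consuming the previous stage's output. The Python below is only an illustrative sketch: the function names, the two-word lexicon, and the fake audio samples are all hypothetical stand-ins, not anything published by Google, and a real system would put a trained model behind each stage.

```python
from typing import Dict, List

Audio = List[float]  # toy stand-in for a sequence of audio samples

# Hypothetical two-word lexicon used only for this illustration.
TOY_LEXICON: Dict[str, str] = {"hola": "hello", "mundo": "world"}


def recognize_speech(audio: Audio) -> str:
    """Stage 1 (speech recognition): source audio -> source-language text.

    Placeholder: ignores the audio and returns a fixed transcript.
    """
    return "hola mundo"


def translate_text(text: str) -> str:
    """Stage 2 (machine translation): source text -> target-language text.

    Placeholder: word-by-word lookup; unknown words pass through unchanged.
    """
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def synthesize_speech(text: str) -> Audio:
    """Stage 3 (TTS): target text -> target-language audio.

    Placeholder: emits one fake sample per character of the text.
    """
    return [ord(ch) / 255.0 for ch in text]


def cascade_translate(audio: Audio) -> Audio:
    """Run the three stages in order; text is the hand-off between them."""
    transcript = recognize_speech(audio)
    translated = translate_text(transcript)
    return synthesize_speech(translated)
```

Because each stage hands text to the next, anything that text cannot carry, such as the speaker's voice, is lost after the first step.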
Translatotron is based on a sequence-to-sequence network that takes source spectrograms as input and generates spectrograms of the translated content in the target language. It also uses two separately trained components: a neural vocoder, which converts the output spectrograms into audio waveforms, and an optional speaker encoder, which is used to maintain the speaker’s voice.
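To make the data flow concrete, here is a rough Python sketch of a spectrogram-in, spectrogram-out pipeline with a speaker encoder and a vocoder. Every function is a hypothetical toy: the "translation" step just reverses the frames, the speaker encoder averages frequency bins, and the vocoder emits constant samples. None of this reflects Google's actual model; it only shows which component feeds which.

```python
from typing import List, Optional

Spectrogram = List[List[float]]  # time frames x frequency bins
Embedding = List[float]
Waveform = List[float]


def encode_speaker(reference: Spectrogram) -> Embedding:
    """Toy speaker encoder: average each frequency bin over time."""
    n_frames = len(reference)
    n_bins = len(reference[0])
    return [sum(frame[b] for frame in reference) / n_frames for b in range(n_bins)]


def translate_spectrogram(
    source: Spectrogram, speaker: Optional[Embedding] = None
) -> Spectrogram:
    """Placeholder for the sequence-to-sequence network.

    Reverses the frame order and adds the speaker embedding as a bias.
    A trained model could also emit a different number of frames.
    """
    bias = speaker if speaker is not None else [0.0] * len(source[0])
    return [[v + b for v, b in zip(frame, bias)] for frame in reversed(source)]


def vocode(spec: Spectrogram, samples_per_frame: int = 4) -> Waveform:
    """Toy vocoder: each frame becomes a few samples at its mean level."""
    wave: Waveform = []
    for frame in spec:
        level = sum(frame) / len(frame)
        wave.extend([level] * samples_per_frame)
    return wave


def direct_translate(source: Spectrogram, keep_voice: bool = True) -> Waveform:
    """Source spectrogram -> target spectrogram -> waveform, with no text step."""
    speaker = encode_speaker(source) if keep_voice else None
    target = translate_spectrogram(source, speaker)
    return vocode(target)
```

For example, calling `direct_translate` on a three-frame spectrogram returns twelve samples with the default `samples_per_frame=4`, and no transcript is produced at any point along the way.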
“We hope that this work can serve as a starting point for future research on end-to-end speech-to-speech translation systems,” the blog post noted.