An innovative technology, aimed at improving video communication, was tested at the Titanic wreck site
The COVID-19 pandemic has greatly increased the use of video communication, but participants in conferences and meetings sometimes have to put up with poor transmission quality, dropouts, and connection problems.
Researchers from Carnegie Mellon University (CMU) and Karlsruhe Institute of Technology (KIT) have developed a technique for carrying video conferences over extremely low-bandwidth connections, making such transmissions possible even under difficult conditions. It was tested at the Titanic wreck, which lies in the North Atlantic at a depth of around 4,000 metres.
“Transmitting data from a depth of four kilometres through salt water without any loss is extremely difficult,” explained Professor Alex Waibel, in charge of research on speech translation at CMU and KIT.
Because radio communication is impossible in salt water, the submersible and the mother ship can communicate only by sonar. The researchers have developed artificial intelligence algorithms that convert video data into text. The submersible first converts the recorded speech into text, which is sent to the surface as sonar sound pulses, and the text is then used there to recreate the video.
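The appeal of sending text rather than sound can be illustrated with a rough size comparison. The Python sketch below is not the researchers' system: the audio format (16 kHz, 16-bit mono), the 32-byte packet size, and all function names are assumptions chosen for illustration only. It compares the size of uncompressed speech with the size of its transcript, then splits the transcript into small packets of the kind a very slow link might carry and reassembles them on the receiving side.

```python
# Illustrative sketch only; all parameters and names here are assumptions,
# not details of the CMU/KIT system.

def raw_audio_bytes(seconds: float,
                    sample_rate_hz: int = 16_000,
                    bytes_per_sample: int = 2) -> int:
    """Size of uncompressed mono audio of the given duration."""
    return int(seconds * sample_rate_hz * bytes_per_sample)


def transcript_bytes(text: str) -> int:
    """Size of the same utterance when sent as UTF-8 text."""
    return len(text.encode("utf-8"))


def packetize(payload: bytes, packet_size: int = 32) -> list[bytes]:
    """Split a payload into fixed-size packets (the last one may be shorter)."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [payload[i:i + packet_size]
            for i in range(0, len(payload), packet_size)]


def reassemble(packets: list[bytes]) -> bytes:
    """Join packets back together in the order they were received."""
    return b"".join(packets)


if __name__ == "__main__":
    utterance = "We have reached the bow of the wreck."
    spoken_seconds = 3.0  # assumed duration of the spoken sentence

    audio_size = raw_audio_bytes(spoken_seconds)
    text_size = transcript_bytes(utterance)
    print(f"raw audio: {audio_size} bytes, transcript: {text_size} bytes")

    packets = packetize(utterance.encode("utf-8"))
    received = reassemble(packets).decode("utf-8")
    print(f"{len(packets)} packets sent, received: {received!r}")
```

Under these assumptions the transcript is several orders of magnitude smaller than the raw audio, which is why a text-only channel can suffice as long as the receiving side is able to synthesise voice and video from it.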
“The video then features a synthetic voice that is mapped to the voice of the person who is speaking, so that it sounds like the voice of that person. In addition, the video synthesis is controlled in such a way that the lips of the speaker move in sync with the sound,” said Professor Waibel, whose expertise includes speech translation, speech recognition and speech processing. “In the future, this will facilitate remote communication in spoken language.” The technique may also be used to lip-sync videos or to synthesise videos in other languages.
Prof Waibel’s technique, which was put to the test at the Titanic wreck, builds on decades of innovative speech translation research. One of Waibel’s innovations is the Lecture Translator, which is used at KIT to automatically record the lecturer’s speech and simultaneously convert it into written English text. Students can follow the lecture this way on their laptops, smartphones, or tablets.
Sources: TechXplore, Karlsruhe Institute of Technology.