Way back in 1997, AltaVista’s Babel Fish was the go-to destination for online language translation. Named after a creature in Douglas Adams’s The Hitchhiker’s Guide to the Galaxy, the translation system is still around today, embedded within the Bing search engine.
Today, Google Translate dominates online translation. In 2016, Google estimated that it translates over 100 billion words a day and has over 500 million users. The Google Translate smartphone app can recognize words in images and translate on the spot – ideal if you can’t read a menu on your overseas vacation.
Now real-time language translation is breaking into speech.
How does it work?
Older iterations of this technology would listen to speech, convert it into text, and then translate it into the destination language. That’s changed with artificial intelligence (AI).
Now, when a translation engine listens to speech, it attempts to identify both the language and what is being said. It analyzes the sound waveforms to work out which segments correspond to words it can translate, building up the translation as it goes. The engine then renders what it thinks it heard as natural speech in the destination language.
To achieve this, a combination of different machine intelligence technologies is used: pattern matching software to identify sounds, neural networks and deep learning to identify “long-term dependencies” and predict what is being said, and encoders to process all this information. The task is supported by databases of common words, meanings and information learned from previous analyses of millions of documents.
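The two-stage pipeline described above can be sketched in a few lines. This is a toy illustration only: the `recognize` and `translate` functions below are hypothetical stand-ins for trained neural models, and the phrase table is invented for the example. The point is the structure of the cascade, not the components.

```python
def recognize(waveform):
    """Hypothetical ASR stage: map audio samples to source-language text."""
    # A real system would run the waveform through an acoustic model and
    # decoder; here we pretend the audio decodes to a fixed phrase.
    return "where is the station"

def translate(text, target="fr"):
    """Hypothetical MT stage: map source text to the target language."""
    # Real engines use neural networks that track long-term dependencies;
    # this stand-in uses a toy phrase table.
    phrase_table = {("where is the station", "fr"): "où est la gare"}
    return phrase_table.get((text, target), text)

def speech_to_speech(waveform, target="fr"):
    # Older systems chained these stages explicitly (speech -> text ->
    # translation); newer end-to-end models learn the whole mapping at once.
    return translate(recognize(waveform), target)

print(speech_to_speech([0.0] * 16000))  # a toy one-second "waveform"
```

Newer end-to-end systems collapse the two stages into a single model, which is part of why latency and accuracy are improving together.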
This complex interplay of technologies already achieves around 85 percent accuracy, with translation taking between two and five seconds. The hope is that as AI evolves, both fluency and speed will improve.
Most existing translation systems rely on cloud-based analysis to work, meaning you may experience a short lag between utterance and translation. This is likely to improve as networks become faster.
Battle of the giants
Siri, Cortana, Alexa and other assistants in our mobile devices have already learned to understand what we say and execute commands in one language. Now they are learning to translate those words into other tongues.
Hardware for near real-time translation already exists:
- Apple’s Siri will soon translate between U.S. English and French, German, Italian, Mandarin Chinese and Spanish
- iTranslate and Bragi already offer the Dash Pro, which translates 40 languages
- Google’s Pixel Buds earbuds work with Android devices to translate between up to 40 languages (Google also offers Google Translate and Word Lens, which translates words in photographs)
- Waverly Labs’ Pilot system consists of two earpieces shared between two people who don’t speak the same language. Similar devices, including Translate One2One, have also been announced
Real-time speech translation
Microsoft is weaving real-time language translation into many of its products, including Skype for Business, Skype Meeting Broadcast, live PowerPoint translation and more. Microsoft also provides developers with APIs to foster development of these technologies; partner applications span contact centers, business intelligence and media subtitling.
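As an illustration of what those developer APIs look like, here is a sketch of assembling a call to the Microsoft Translator Text REST API (v3). The endpoint, headers and payload shape follow Microsoft’s public documentation at the time of writing; the subscription key is a placeholder, and no network request is actually sent here.

```python
import json
from urllib.parse import urlencode

# Public endpoint for the Translator Text API v3.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text, to_lang="fr", key="YOUR_SUBSCRIPTION_KEY"):
    """Assemble the URL, headers and JSON body for one translation call."""
    url = ENDPOINT + "?" + urlencode({"api-version": "3.0", "to": to_lang})
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # issued via an Azure subscription
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text}])  # the API accepts a list of texts
    return url, headers, body

url, headers, body = build_translate_request("Hello, world")
print(url)
print(body)
```

Sending this request with any HTTP client returns a JSON list of translations; wiring it into a live audio stream is what products like Skype Translator add on top.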
Skype Translator saw use expand 400 percent in its first year of availability, with French to English revealed as the most popular language pair.
There’s certainly scope to use these technologies to enable collaboration with developing markets – Germany to Ghana was the top international Skype Translator calling corridor in 2016. Microsoft claims its use of deep learning technologies has already made its translations at least 20 percent better than before.
These improvements are driving real-world gains in collaboration. The head teacher at Chinook Middle School in Washington state, U.S., uses Microsoft Translator online to speak with parents in their own language: he speaks in English while the parents follow along in their own tongue.
Microsoft intends to deploy its translation tools inside of its Microsoft Teams environment later this year. This is likely to spark much more use of these technologies in the enterprise, as the company already claims 200,000 organizations use that service.
As machine learning and networking technologies improve, it seems likely that most enterprises will make more use of these real-time language translation tools, which may help them unlock fresh opportunities and build new revenue channels. Connectivity will need to be robust, particularly in the corporate conferencing center, to support demand.
The Anywhere365 Contact Center and Enterprise Dialogue Management platform from Workstreampeople already offers real-time speech translation, integrating natively with the Lync/Skype for Business platform.
“Translate your World” already offers a collaborative, real-time language translation-enabled web conferencing system. This provides all the tools you might expect to find in such a system, supplemented with real-time translation of what presenters say.
Lingmo International CEO Danny May believes machine intelligence provides the missing link. “Traditional translation apps don’t actually help much in real world situations. They miss too much of the context and nuance, and they’re generally hopeless at dealing with dialects,” he wrote in a blog for IBM. “By harnessing Watson’s AI capabilities, we realized we could achieve our goals much more efficiently,” he said.
Voice translation has uses beyond business. During the Iraq war, U.S. forces were equipped with translation devices, suggesting potential military applications for such technologies. Emergency response and public assistance services, NGOs and humanitarian relief organizations will inevitably find uses for these systems in their work. That’s why Lingmo International’s IBM Watson-powered One2One earpiece system was introduced at the UN AI for Good Global Summit.
Google Brain researchers are developing neural networks that may become capable of translating languages without using written reference data. This may enable machines to translate little-known languages when required – there was no translation software available for Haitian Creole when disaster struck there in 2010.
Of course, words aren’t everything; communication is complex. Studies by Dr. Albert Mehrabian, UCLA Professor Emeritus of Psychology and a global expert in verbal and non-verbal communication, suggest that only around 45 percent of the meaning in face-to-face communication is carried by words and tone of voice, while 55 percent is non-verbal. The International Committee of the Red Cross stresses that its interpreters must be able to connect and build trust with people and integrate with a country’s culture.
Real-time language translation solutions may help break language barriers, but the task of building trust and understanding requires a much more complex set of behaviors and a respect for cultural differences.
International Social and Media Manager at Orange Business Services. I'm in charge of international social media and the English-language blogs at Orange Business Services. In my spare time I'm literally the captain of my own ship, spending my time on the wonderful rivers and canals of England.