The Great Natural Language Challenge

February 25, 2015 Giorgio Heiman, Digital Transformation

Natural Language Processing (NLP) is defined as a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. Put more simply, it is about giving computers the ability to understand what a human is saying to them.

In 2013, the Hollywood movie ‘Her’ focused on the relationship between a human character and his virtual companion, a Siri-esque voice that inhabited and controlled his smartphone. This kind of individualized interaction between people and their operating systems is likely to keep increasing as technology evolves, and it will be interesting to see how NLP progresses alongside it. It is a complicated area, and one that appeals greatly to the amateur linguist in me while also connecting to my professional work in applying technology.

The challenges of non-native speakers

Those of us who are non-native English speakers regularly encounter challenges similar to those that technologists probably face in teaching sophisticated cognitive systems how to speak English: the language is full of idiomatic expressions, small and sometimes completely illogical idiosyncrasies, and subtle nuances. Take a few English expressions as examples. Why can we either fill in a form or fill it out and mean exactly the same thing? Why, when discussing the probability of something succeeding, can we speak of a ‘fat chance’ or a ‘slim chance’ and mean the same thing, while calling someone a ‘wise man’ or a ‘wise guy’ means almost the opposite, the first being positive and the second negative? Sometimes the same words can also mean very different things depending on the listener. If you propose to “table a topic for discussion” to a speaker of British English, they will understand “let’s discuss this now”, while a speaker of American English will think you want to postpone the discussion indefinitely.

We can add to this conundrum the classic ‘false friends’ trap. Author and journalist Bill Bryson offers an almost perfect example in his book ‘The Mother Tongue’. In 1905, the draft treaty between Japan and Russia was written in both French and English, the languages of diplomacy at the time, even though neither was the native language of either country. The English word ‘control’ was rendered in French as ‘contrôler’, but the meanings are drastically different: the English word means ‘to dominate or hold power over’, while the French word simply means ‘to inspect’. As a result, the treaty nearly fell apart. The episode shows how much nuance matters in translation and how catastrophic the consequences of miscommunication between languages can be.

It is ultimately in English that this complicated process of teaching computers nuance and ambiguity in language begins. English has become the de facto lingua franca of the world today. It is also the most common language on the internet and in computing in general. But which “English” is being used? Native British English? American English? Australian English? Singaporean or Indian English? Or the non-native English of, for example, an Italian or a Chinese speaker?

How NLP is moving forward

The number of non-native English speakers is growing. The most recent estimates put the total number of native and non-native English speakers at somewhere around 1.8 billion, which means the number of linguistic traps and double meanings multiplies rapidly.

Addressing NLP in the technology sector means designing and building software that can analyse and understand many different languages, with all their nuances and ambiguities. That software must then produce language and vocabulary that is not only coherent and understandable but also natural to human speakers. The long-term objective is to be able to talk to a computer exactly as you would talk to a colleague, a friend or a loved one, and have the computer understand you in the same way.

But teaching a computer to speak English is not the same as teaching language to a young child. To achieve this technological feat, technologists combine knowledge-based design and engineering tools with statistical and machine-learning techniques that help the computer distinguish between languages, phrases, idioms, and so on. This involves parsing language in detail and making its linguistic structure, the relevant knowledge frameworks and the key concepts explicit, which allows computers to ‘learn’ language more and more quickly.
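To give a concrete flavour of the statistical side, here is a minimal sketch of how a computer might distinguish between languages. It counts three-letter sequences (character trigrams) in sample text for each language and then picks the language whose counts best explain a new sentence. This is a toy naive Bayes scorer with add-one smoothing, chosen purely for illustration. It is not how Siri, Google Now or any particular product works. The training sentences and the names `trigrams` and `TrigramLanguageGuesser` are hypothetical.

```python
import math
from collections import Counter


def trigrams(text):
    """Split lowercased text, padded with spaces, into overlapping 3-character chunks."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramLanguageGuesser:
    """Toy naive Bayes language identifier over character trigrams (illustrative only)."""

    def __init__(self, samples):
        # samples: dict mapping a language name to some training text in that language
        self.counts = {lang: Counter(trigrams(text)) for lang, text in samples.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        # Shared vocabulary of every trigram seen in any language, used for smoothing
        self.vocab_size = len(set().union(*self.counts.values()))

    def score(self, text, lang):
        """Log-probability of the text's trigrams under one language, with add-one smoothing."""
        counts, total = self.counts[lang], self.totals[lang]
        return sum(
            math.log((counts[t] + 1) / (total + self.vocab_size)) for t in trigrams(text)
        )

    def guess(self, text):
        """Return the language whose trigram statistics best explain the text."""
        return max(self.counts, key=lambda lang: self.score(text, lang))


# Hypothetical, deliberately tiny training data; real systems use far larger corpora.
guesser = TrigramLanguageGuesser({
    "english": "the cat sat on the mat and the dog ate the food that was on the table",
    "french": "le chat est sur le tapis et le chien mange la nourriture qui est sur la table",
    "italian": "il gatto è sul tappeto e il cane mangia il cibo che è sul tavolo",
})

for sentence in ["the dog and the cat", "le chien et le chat", "il cane e il gatto"]:
    print(sentence, "->", guesser.guess(sentence))
```

Even this crude approach separates the three languages, because trigrams such as "the", " le" and " il" are strong statistical fingerprints. It says nothing about idioms or meaning, however. A ‘fat chance’ and a ‘slim chance’ would look equally English to it, which is exactly why the knowledge-based tools described above are needed as well.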

Will all technology use NLP?

Natural language technology is already present in Apple’s Siri and Google Now, but mobile devices are not yet designed and built around NLP as a core user interface. As the technology evolves, however, NLP is likely to go increasingly mainstream and become embedded in more and more devices. That means any application where a human user can benefit from communicating with a machine in a natural way, whether it is their computer, their refrigerator, or the control system of a nuclear power plant.

Underlying all this is the question of artificial intelligence (AI). Teaching computers to use natural language requires, in part, enabling them to handle compositional semantics and narrative understanding, both of which are steps on the path to full AI. We enjoy the benefits, convenience and fun of interacting with our devices on a more natural, ‘human’ level. Yet trusting AI still carries concerns and inherent risks. The question is not whether computers can perform complex tasks, but whether they can truly understand the breadth, depth and nuance of human language.

The development of NLP will be fascinating to observe.

Giorgio Heiman

Giorgio Heiman has more than twenty years’ experience in the development, implementation, sales and marketing of global multimedia networks and distributed computing information technology solutions. He has served in a variety of roles, including Business Unit Director, Business Development Manager, Outsourcing Practice Manager, Program Manager, Internet Services Product Manager, Principal Consultant, and Network Applications Service Manager.

Giorgio heads our Global Solutions and Services business unit and organization for Emerging Markets and Indirect, covering the Middle East and Africa. He is also responsible globally for SITA, our channel to the air transport industry. Since his appointment a few years ago, Giorgio has steered the organization towards developing innovative services beyond the core Enterprise portfolio, with a particular focus on Smart Cities and Healthcare in the Middle East.

Prior to joining Equant in 2004, Giorgio held a variety of positions with SITA, AT&T International, Digital Equipment Corporation and CERN, the birthplace of the World Wide Web.

Giorgio’s expertise includes his consulting skills and his ability to harness technology to deliver innovative business solutions, as well as the leadership and management of multicultural and geographically dispersed teams.

Giorgio graduated with Honors from the Department of Experimental Physics, University of Bologna, Italy, and has co-authored more than fifteen publications in scientific journals.