Artificial intelligence: How to turn Siri into Samantha

13 February 2014

Can virtual assistants do better?

Leo Kelion

Technology writer

"Siri, why do you struggle with conversations?"

"I don't know what you mean - how about a web search for it?"

If you want to check the latest football scores, add meetings to your calendar or launch an app, today's virtual assistants are relatively good at understanding your voice and doing what's asked.

But try to have the type of natural conversation seen in sci-fi movies featuring artificial intelligence systems - from HAL in 2001 to the sultry-voiced operating system Samantha in Spike Jonze's Her - and you'll find your device about as smart as a waterproof teabag.

"Google and Apple are painfully aware that their systems are not getting better fast enough because right now Siri and Google Now and the other personal assistant type applications are all programmed by hand," says Steve Young, professor of information engineering at the University of Cambridge.

"If you speak to Siri about baseball it seems relatively intelligent, but if you ask it something much less common it doesn't really do anything except for a web search.

"That's an indication that the programmers have been busy trying to anticipate what people want to ask about baseball but haven't thought about people who ask about, for example, GPU chips because you don't get many queries about that."

Dancing a tango

So what's the alternative?

Microsoft doesn't yet have a virtual assistant on its Windows Phone platform, but the company is experimenting with AI in lifts and reception desks at its headquarters.

Eric Horvitz, managing director of Microsoft's research unit, believes part of the solution involves allowing computers to look beyond questions posed.

"The ability of a system to understand more broadly what the overall context of a communication is turns out to be very important," he told the BBC.

Image caption (still from Her): The movie Her is about a man who falls in love with his smart device's operating system

"There are some critical signals in context. These include location, time of day, day of week, user patterns of behaviour, current modality - are you driving, are you walking, are you sitting, are you in your office. Are you in a place you are familiar with versus one you are not?

"A person's calendar can be a very rich source of context, as is their email."

He adds that for a more natural interaction, software also needs to learn how to simulate the rhythm and beat of the way humans talk to each other.

To do this, he says, computers should be working out their response while the person is still speaking, rather than waiting for them to finish.

"It turns out that conversation is more or less like a very, very complex tango - a dance between two people," he explains.

"[It] involves not just a simple turn-taking, like you might see in today's assistants on cellphones, for example.

Microsoft's Eric Horvitz speaks to the BBC's Leo Kelion

"It's actually a very complicated, fluid operation where people are breaking in and starting over again and reflecting and listening, all at the same time sometimes."

Mr Horvitz wouldn't reveal when Microsoft might start offering such capabilities to the public.

But reports suggest that the company could unveil Cortana - an app named after the AI system in its Halo video game - in April.

Coughs and tuts

Apple is notoriously secretive. But research by a company whose tech helps power Siri provides clues about how the facility could be improved.

Voice-recognition specialist Nuance says its researchers are currently studying paralinguistics - how users speak rather than what they talk about.

"We're looking at the acoustic elements to be able to detect emotions in speech," reveals John West, a principal solutions architect at the firm.

"The intonation, what's termed the prosody - the tune you use to speak - if you are happy it rolls along quite nicely. If you are sad it's more abrupt - and the language used."

As well as helping clients' AIs work out the best response, he says this can also help them sound more natural.

Image caption: The forthcoming Microsoft AI app is reported to be based on Cortana - a character in its video game series Halo

"Although I've yet to see it deployed, we do have the capability to put hesitations and other non-verbal audio into an output engine," he says.

"However, they need to be very carefully programmed because you need to understand where to put the pauses, tuts, breaths and possibly a cough."

DIY database

But Prof Young believes a more fundamental change is needed: rather than telling an AI how to respond we should make it learn through a process of trial and error.

This is the basis of a system he is developing called Parlance.

An example of a conversation it might have would be:

Human: I want to eat a pizza
AI: Sorry, I don't know what a pizza is
Human: OK, well do you know where there's a nice Italian restaurant?
AI: Yes, there's one 20m down the road to your right
Human: Thank you

If the user appears satisfied, Prof Young says, the computer adds an association to its knowledge database.

"It stores this away, not as a rule, but it changes the probabilities in its statistical maps," he explains.

"So, the next time someone asks for a pizza it knows that you get them from an Italian restaurant. And it's not been told that except through the users themselves."
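To make the idea concrete, here is a toy sketch in Python of how a system might shift probabilities after a satisfied user rather than store a fixed rule. It is not Parlance's actual method, which the article does not describe in detail. The class name, the smoothing value and the venue labels are all assumptions made up for illustration.

```python
from collections import defaultdict


class AssociationModel:
    """Toy count-based model linking a requested term to a venue type.

    Illustrative only: names and parameters here are hypothetical.
    """

    def __init__(self, smoothing=1.0):
        # Assumed prior count given to every venue the model has seen.
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(float))
        self.venues = set()

    def reinforce(self, term, venue, reward=1.0):
        """Strengthen the term -> venue link after a satisfied user."""
        self.venues.add(venue)
        self.counts[term][venue] += reward

    def probability(self, term, venue):
        """Estimate P(venue | term) using additive smoothing."""
        if not self.venues:
            return 0.0
        seen = self.counts[term]
        total = sum(seen.values()) + self.smoothing * len(self.venues)
        return (seen.get(venue, 0.0) + self.smoothing) / total

    def best_guess(self, term):
        """Return the most probable venue, or None if nothing is known."""
        if not self.venues:
            return None
        return max(sorted(self.venues), key=lambda v: self.probability(term, v))


model = AssociationModel()
# Two users who asked for pizza were happy with an Italian restaurant.
model.reinforce("pizza", "Italian restaurant")
model.reinforce("pizza", "Italian restaurant")
# An unrelated request gives the model a second venue to choose between.
model.reinforce("curry", "Indian restaurant")
print(model.best_guess("pizza"))
```

Nothing in the sketch says "pizza means Italian restaurant" outright. The link simply becomes more likely each time a user seems satisfied, and it could weaken relative to other options if later conversations point elsewhere.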

Image caption: Google Now accepts voice commands but tries to anticipate its users' needs

Google's £400m takeover of British AI developer DeepMind could hasten the rollout of such self-taught systems, improving the quality and breadth of knowledge offered, Prof Young believes.

But both he and Microsoft warn that such systems still won't deliver the kind of sentient presence Hollywood loves to depict.

"When I come in the morning my [AI] assistant on my door recognises me and in a very nice British voice says: 'Good morning Eric' - and I enjoy it even though I know it's artificial," says Mr Horvitz.

"So, I do think that we will be able to come up with very compelling personalities.

"However, unlike the kinds of things we see in the movies, for many years to come there probably won't be anybody home in the way people would expect or desire."
