What They Said
With the increasing availability of, and competition between, voice-controlled smart home assistants (see the October 18, 2016 LRSJ; client registration required), Lux recently interviewed Dawn Brun, Senior Manager of Public Relations at Amazon, about its Alexa platform and its future direction. Dawn said that Alexa, like many other voice-based assistants, relies on four key components to drive its conversational interface – Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialogue Management, and Text-to-Speech (TTS):
- The first step to answering correctly is speech recognition – hearing correctly. ASR is how we “hear” the user’s speech and convert it to text that we can then process. This is the challenge we had to overcome for Amazon Echo and Alexa – how do you get the machine to understand you from a distance (i.e., in the far-field environment)?
- Second, we need to make sure we understand the user correctly. NLU helps us parse the user’s request into their true intent. This enables us to find the meaning behind the speech. NLU is a particularly interesting problem, as we want to clearly understand what you are saying. A human being is very good at disambiguating multiple responses, but with a voice interface you want to try to make the one right choice for the user from the very beginning.
- Third, we need to decide how to respond to the user and take an action to address the request. We call this dialogue management. There’s also a personalization element here. We need to give the user the right response based on past behavior and preferences. So when a user asks to skip a song, we have to quickly deliver a new song that they will like.
- Finally, TTS – we convert text back to speech to respond to the customer’s request. And of course, the TTS needs to be very natural.
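The four components above form a pipeline: audio goes in, an intent is extracted, a personalized response is chosen, and audio comes out. The following is a minimal sketch of that flow; the stage implementations are illustrative stand-ins (a real system would use trained ASR, NLU, and TTS models, and the intent and profile fields shown are hypothetical), but the structure mirrors the stages Dawn describes.

```python
def asr(audio: bytes) -> str:
    """Automatic Speech Recognition: audio in, text out (stubbed here)."""
    return audio.decode("utf-8")  # stand-in for a real far-field recognizer

def nlu(text: str) -> dict:
    """Natural Language Understanding: map the text to a single intent (stubbed)."""
    if "skip" in text.lower():
        return {"intent": "SkipSong"}
    return {"intent": "Unknown", "utterance": text}

def dialogue_manager(intent: dict, user_profile: dict) -> str:
    """Decide how to respond, personalized by past behavior and preferences."""
    if intent["intent"] == "SkipSong":
        return f"Playing the next song from your {user_profile['favorite_genre']} station."
    return "Sorry, I didn't catch that."

def tts(text: str) -> bytes:
    """Text-to-Speech: text in, audio out (stubbed here)."""
    return text.encode("utf-8")  # stand-in for a speech synthesizer

def handle_utterance(audio: bytes, user_profile: dict) -> bytes:
    """ASR -> NLU -> dialogue management -> TTS, in order."""
    return tts(dialogue_manager(nlu(asr(audio)), user_profile))

# Example: the "skip a song" case from the interview.
response = handle_utterance(b"Alexa, skip this song", {"favorite_genre": "jazz"})
```

Note that the dialogue manager, not the NLU stage, is where personalization enters: the same intent can yield different responses for different users.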
When asked about the initial vision for Alexa’s implementation and its vision going forward, Dawn said, “We wanted to create a computer in the cloud that’s controlled entirely by your voice – you could ask it things, ask it to do things for you, find things for you, and it’s easy to converse with in a natural way. We’re always inventing and looking at ways to make customers’ lives easier. We believe voice is the most natural user interface and can really improve the way people interact with technology.”
When asked how Alexa compared to other voice-based assistants, such as Google Now, Microsoft’s Cortana, Apple’s Siri, or Facebook M, Dawn said, “Alexa is different from a voice assistant on a phone or tablet, which is designed to accompany a screen. Alexa was designed with the assumption that the user is not looking at a screen; therefore, the interactions become very different than with other voice assistants. Alexa isn’t a search engine giving you a list of choices on a screen; she’s making a decision on the best choice and delivering that back to the customer. We also leverage AWS, which is a huge advantage – things like massive processing power, Lambda, IoT.”
What We Think
While the primary use of voice-based assistants initially appeared to be smart home control, they are beginning to see newer use cases, including implementations in devices outside of the smart home. With the Alexa Skills Kit, Amazon introduced a platform that allows third-party developers to create skills for its Alexa ecosystem – essentially apps that you talk to instead of touch. As it stands, there are 10,588 skills available for Alexa, up from 1,000 in June 2016; these fall into multiple categories, as shown below.
Based on the analysis above, although smart home control is a core use of the Alexa platform (see the September 5, 2016 LRSJ; client registration required), it is not one of the top five categories. The top five categories – News; Games, Trivia & Accessories; Education & Reference; Lifestyle; and Novelty & Humor – are heavily weighted toward providing information and entertainment. This indicates that users of voice-based assistants are using them not only to control smart home devices, but also as a way to interact with apps and devices like never before. As more devices continue to integrate voice-based assistants, such as smartphones and connected cars, we can expect to see more use cases outside of controlling the smart home.
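To make concrete what a "skill" is, the sketch below shows a minimal handler in the shape of an AWS Lambda function responding to Alexa Skills Kit request JSON. The request/response structure follows the ASK custom-skill JSON interface as we understand it; the intent name `HelloIntent` and the spoken strings are hypothetical placeholders, not part of any real skill.

```python
def lambda_handler(event, context):
    """Hypothetical entry point for a trivial 'hello' skill.

    Alexa's cloud service sends a JSON request describing what the user
    said; the skill returns JSON telling Alexa what to speak back.
    """
    request_type = event["request"]["type"]
    if request_type == "LaunchRequest":
        # User opened the skill without asking for anything specific.
        speech = "Welcome! Ask me to say hello."
    elif request_type == "IntentRequest":
        # NLU has already mapped the utterance to a named intent.
        intent = event["request"]["intent"]["name"]
        if intent == "HelloIntent":
            speech = "Hello from your first skill."
        else:
            speech = "Sorry, I don't know that one."
    else:
        speech = "Goodbye."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }

# Simulate the kind of payload Alexa would POST to the skill:
sample_event = {"request": {"type": "IntentRequest",
                            "intent": {"name": "HelloIntent"}}}
result = lambda_handler(sample_event, None)
```

This division of labor is why skills are lightweight to build: Amazon's cloud handles the ASR and NLU stages, and the developer only writes the intent-to-response logic.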
By: Reginald Parris