Amazon Upgrades Alexa for the ChatGPT Era

SEP 20, 2023

When Amazon launched the Alexa virtual assistant nine years ago, its ability to decode voice commands to set a timer or play a song seemed almost magical. Today, the bar for impressive language skills is much higher, thanks to OpenAI’s ChatGPT. Amazon is giving its voice assistant a reboot that takes advantage of the technology behind the new wave of chatbots that can engage in remarkably lifelike conversation.

Amazon announced the upgrade to Alexa at an event held at its second headquarters in Arlington, Virginia. The assistant will answer much more complex questions and engage in more flowing, open-ended conversation, dropping the need for users to say “Alexa …” at each turn.

In a few weeks, users who say, “Alexa, let’s chat,” will get access to the new, more capable voice assistant. Amazon calls it an “early preview” because the new capabilities remain a work in progress.

Demos given onstage on Wednesday showed Alexa exhibiting more simulated personality with its intonation and efforts at humor. Videos showed people asking Alexa to write poems on a theme, brainstorm ideas for a date night, and generate a story about Jell-O. Devices equipped with cameras, such as the Echo Show, will try to detect when a person is expecting Alexa to continue the conversation and when the conversation is over.

The new Alexa will also modulate its own voice to create a more natural-seeming back-and-forth. “If I ask Alexa how the Red Sox are doing, and they have just lost, it will come back with an empathetic tone,” says Rohit Prasad, who leads AI development at Amazon and is based in Cambridge, Massachusetts.

Prasad says that upgrading Alexa’s language skills required extensive engineering, because the large language models that power services like ChatGPT can make up facts, blurt out nonsense, and be downright inappropriate. “Especially given certain limitations of language models, this is a huge leap,” Prasad says.

Justine Cassell, a professor at Carnegie Mellon University who studies the way humans interact with AI agents, says it will be fascinating to see how people respond to a voice-enabled chatbot capable of richer responses. “The goals are great, and I'm excited to see what they do,” she says.

However, Cassell says some of the things Amazon is promising, like responding to body language, remain extremely challenging. “There is no grammar of body language, the way there is for spoken and written language,” she says. If Alexa misreads someone’s posture or movements and responds incorrectly, things could get awkward.

Cassell says that even if Alexa gains more ChatGPT-like fluency, its efforts to mimic human personality and feeling through characteristics like intonation are unlikely to match human capabilities for some time yet. Expect the new Alexa to sometimes feel stilted in its responses.

Amazon says users will be able to apply for access to an additional test of its new technology, in which Alexa’s new capabilities can be used to control other devices, including some not made by Amazon. Over time, the company plans to add new features to Alexa, potentially including the ability to discuss and recommend products from its vast inventory.

If Alexa can respond to more complex queries while avoiding embarrassing errors, it could herald a wider—and much needed—upgrade in the capabilities of voice assistants.

When Amazon launched Alexa in 2014, it helped create a new category in personal computing built around voice interaction, spurring predictions that voice interfaces would soon dominate. Alexa and Apple’s Siri benefited from advances in machine learning that finally made it feasible for devices to reliably recognize and respond to a user’s voice. But the complexity of language has limited these devices to only simple commands and left them unable to engage in anything resembling a real conversation. Even so, Amazon says that over half a billion devices featuring Alexa have been sold worldwide.

The advent of large language models trained on vast amounts of text has at last created algorithms that can handle more complex dialog. ChatGPT and other chatbots have startled both experts and the public with their flexibility and garrulousness, even though they are prone to spitting out statements that may be false, biased, or even offensive.

Prasad says Amazon developed a new cutting-edge large language model to invigorate Alexa. He says that the company fine-tuned this model toward phrasings appropriate for vocal conversation, and it uses additional algorithms to help with recognition of body language and intonation.

One of the big challenges for Amazon may prove to be handling the surprising errors that come with using large language models. When Microsoft added an advanced AI chatbot to its search engine Bing, users quickly discovered some odd behavior. “Is it 100 percent perfect? No,” Prasad says. “This is why it's an early preview, because there will be occasional errors.”

Prasad says Amazon has already developed guardrails to prevent Alexa from straying off course. He adds that some of them will remind people they are talking to a machine and try to keep the assistant from presenting too much like a person. Some chatbot users form strong emotional and even romantic bonds with the simulated personalities they interact with. Prasad adds that Amazon is doing research on the long-term risks that may come from further advances in AI.
