Amazon has announced a new AI model designed to power voice-to-voice interactions in natural, conversational language. The company says that the new model, named Amazon Nova Sonic, uses the same technology present in its next-gen reboot of Alexa. The model is expected to be used by software developers to create advanced AI agents that can interact with humans and take actions on their behalf.

Amazon said in a blog post that Nova Sonic combines speech understanding and speech generation into a single model, making it particularly useful for building AI-powered vocal assistants, especially in customer support.

Many other vocal AI assistants work by first transcribing a user’s words into text, processing the text, generating a text response, and then converting that text back into audio. Each step adds delay, which can slow down conversations. But by using speech-to-speech models like Amazon’s Nova Sonic, developers can cut out most of the steps, greatly shortening the time between a user’s request and the model’s response.

Not only is this method faster, but it also enables the AI model to pick up on details that would otherwise be lost, like a user’s tone of voice or speaking cadence. These kinds of details wouldn’t be captured in a transcript, but the voice-to-voice system can understand the nuances in speech and adapt accordingly. To show how this works, Amazon created a fake AI-powered travel agent and had a user ask it to buy plane tickets. In a recording, the user expressed apprehension about the price of a ticket, and in response the AI agent’s tone became reassuring. It told the user not to worry while it searched for low-cost options.