
Additionally, the model provider said that the updated gpt-realtime model has shown improvements in following complex instructions, calling tools with precision, and producing speech that “sounds more natural and expressive.”
These improvements, according to Dai, would help enterprises use the API to enable low-latency, natural voice interactions across a range of use cases, including real-time medical transcription, conversational booking assistants, customer service in banking, insurance, and telco, and employee enablement across major verticals.
Enterprises accessing the model through the API can use two new voices, Cedar and Marin, the model provider said.