OpenAI has introduced a new language model, GPT-4o, that works with audio, images, and text in real time. The company announced this on its blog, UNN reports.
Details
Prior to GPT-4o, voice conversations with ChatGPT had an average delay of 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4. The new model cuts this to an average of 320 milliseconds, comparable to human response time in a conversation.
OpenAI hopes this product will be a step toward more natural interaction between users and computers. GPT-4o can also act as a fast voice translator between speakers of different languages, as sketched below.
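As a rough, text-only illustration of that translator use case, the following sketch drives the translation with a system prompt; the gpt-4o model name and the prompt wording are assumptions for demonstration, not OpenAI's own setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def interpret(utterance: str) -> str:
    """Relay one utterance between English and Spanish (hypothetical example)."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a live interpreter between two people. "
                        "If the input is in English, say it in Spanish; "
                        "if it is in Spanish, say it in English."},
            {"role": "user", "content": utterance},
        ],
    )
    return completion.choices[0].message.content

print(interpret("¿Dónde está la estación de tren?"))
```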
Addendum
Voice mode has so far worked as a pipeline of three separate models: a simple model transcribes the audio to text, GPT-3.5 or GPT-4 takes that text and outputs a text reply, and a third simple model converts the reply back to audio (a sketch of this pipeline follows below). In addition, compared to existing language models, GPT-4o is better at understanding images and audio.
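For illustration, here is a minimal sketch of such a three-stage pipeline built on OpenAI's public API, assuming the openai Python package (v1.x); the file names and the whisper-1 / gpt-4 / tts-1 model choices are assumptions for demonstration, not the internals of ChatGPT's voice mode:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stage 1: transcribe the user's speech to text.
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Stage 2: a text-in, text-out language model writes the reply.
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply = completion.choices[0].message.content

# Stage 3: synthesize the reply back to speech.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply,
)
with open("reply.mp3", "wb") as f:
    f.write(speech.content)
```

Each hop adds latency, and the middle model never sees tone, multiple speakers, or background sounds; that lost information and overhead is what GPT-4o's single end-to-end model is meant to eliminate.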
The new technology will be rolled out gradually over the coming weeks. Separately, the company will release a desktop application with new features.
Unlike GPT-4 Turbo, the new model will be free to use, though paying users will have access to additional features.