OpenAI has introduced a new generative artificial intelligence model, which it calls GPT-4o, with the letter "o" standing for "omni" in reference to its ability to handle text, voice and video in real time. The model offers improved functionality, greater speed and better performance, and the company sees it as a step towards much more natural human-computer interaction.
As the company led by Sam Altman explains, GPT-4o accepts as input any combination of text, audio and image, and generates any combination of the same formats as output. The model can respond to audio inputs in as little as 232 milliseconds, similar to a person's reaction time in conversation, and it is notably better at vision and audio understanding than existing models.
GPT-4o promises to significantly improve the experience of ChatGPT, OpenAI's chatbot, which until now offered interaction through voice and text responses. The new model adds video, turning the ChatGPT application into a virtual assistant. The company showcased this in a series of videos showing its executives interacting with the model on mobile phones in different situations.
The videos show what the model is capable of: identifying its surroundings, singing, whispering, translating in real time, solving math problems, and being sarcastic or expressing other emotions through intonation or song, among other things. GPT-4o is also multilingual, able to handle 50 different languages.
In addition to improving the model's capabilities, the company says it is focusing on the interaction experience, making it simpler and more natural so that users can concentrate on working with the tool rather than on the interface. That is why it sees the new model as an important step in terms of usability.
What's more, Sam Altman, the company's CEO, posted a cryptic tweet that said only "her." Many users interpreted this as a nod to the Spike Jonze film starring Joaquin Phoenix, in which his character interacts with and falls in love with a digital assistant, something the company is now approaching with GPT-4o.
However, the company indicates that it is still refining the model's capabilities. "With GPT-4o, we train a single new model end-to-end across text, image, and audio, meaning all input and output are processed by the same neural network," OpenAI explains. "Since GPT-4o is our first model to combine all of these capabilities, we are still just scratching the surface of the model's capabilities and limitations."
The company also explains that the model was built with safety in mind: among other things, its training data was filtered, its behavior was refined through post-training, and safety guardrails were established for voice output.
The model was also subjected to human and automated evaluations throughout the training process, and was reviewed by outside experts in social psychology, bias and fairness, and misinformation in order to identify risks.
For now, the company is rolling out GPT-4o's text and image capabilities in ChatGPT for users on the free tier and for Plus subscribers, who get an extended message limit. Over the coming weeks it will work on the technical infrastructure, usability and safety needed to launch the other modalities, making them available to select users first. The same applies to developers: they can already access GPT-4o's text and image capabilities through the API, but will have to wait a few weeks for audio and video.
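For illustration, here is a minimal sketch of how a developer might send GPT-4o a text prompt together with an image through the OpenAI Python SDK. The image URL is a placeholder, and it assumes an API key is available in the OPENAI_API_KEY environment variable; exact parameters may vary with SDK versions.

```python
from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# Text and image can be combined in a single user message;
# the image URL below is only a placeholder for this sketch.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

# Print the model's text reply.
print(response.choices[0].message.content)
```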