OpenAI released a new flagship model, GPT-4o, and the demo it released yesterday shows a model that is much closer to a human-like assistant. It can reason across text, vision, and audio, and it is far faster than previous models. Although OpenAI was expected to release something else this week, a competitor to Google's search engine, this new model's release proved far more impressive.
GPT-4o will have GPT-4-level intelligence, and it will be accessible to everyone, even free users, though its capabilities will be "rolled out iteratively over the coming weeks," as OpenAI says in its blog post. While the model will be free for all users, paid users will get up to five times the capacity limits of free users.
"Natively multimodal" is how OpenAI CEO Sam Altman describes the model, meaning it can generate and understand content in text, voice, or images. The live demo showed just how true that is, and how much lower GPT-4o's latency is in conversation, making interactions with it more natural and coherent and enabling real-time conversations.
Voice capabilities have been around for a while in some of OpenAI's multimodal models, but GPT-4o offers a more intuitive and natural-sounding voice that you can even interrupt mid-speech, something normal in human-to-human conversation, and still carry on a coherent exchange. That wasn't possible with previous models: you had to wait for the model to finish before you could feed it another input or respond.
GPT-4o does impressively well at understanding the emotions it can hear in a person's tone of voice. In the live demo, the model was asked to calm a person down and to interpret the person's breathing, and it responded as an actual person would.
The vision capabilities of the model are equally astounding. GPT-4o is also capable of reading and understanding facial expressions, as well as describing what it sees around it. It is rather intelligent and more descriptive than you would expect a machine to be.
A linear equation was written on a piece of paper. Rather than being asked to solve it, the model was asked to walk the person through the steps, just as a human teacher would. Someone listening in, unaware, might think there was another person on the other end of a video call guiding the person through the math problem. It's yet another reason this model feels so close to a human-like assistant.
Watching all 26 minutes of the live demo, I was in awe of how close to a human this AI reasons and responds. When it was asked to read a bedtime story, there were abrupt interruptions and changes to what the model was asked to sound like. From a casual bedtime-reading voice to a more dramatic one and then to a robotic one, GPT-4o handled each request intuitively and naturally.
Another demo showed GPT-4o guessing at its own May 13th announcement. An OpenAI staff member, dressed in a company hoodie, sat in a studio-like room designed for recording. Shown all of that, the AI guessed that the whole setup was for an announcement about it.
A conversation between two GPT-4o models is even possible, as was also demonstrated. One model was asked to inquire about the surroundings of the other, which was allowed to look around. The entire conversation between the two models and the human user ended with all three singing a song together.
With all the demos we have seen, the list of things this new flagship model can do is endless, as are the doors to new innovative ideas it has opened. The 'o' in GPT-4o stands for "omni," which alludes to the fact that it reasons with text, voice, and images.
This is magical indeed, but it really is only the dawn of AI, and we expect even more before the year runs out. Until then, what are your thoughts on GPT-4o? You can watch the live demo below.
https://www.youtube.com/live/DQacCB9tDaw?si=INzLkXOo1kpza87i
By the way, you can earn from your content on Hive via InLeo while truly owning your account. If you're new, sign up in a few minutes by clicking here! And here's a guide on navigating.
Posted Using InLeo Alpha