The Near Human-like Assistant Capabilities of GPT-4o

in #hive-167922, 6 months ago

OpenAI has released a new flagship model, GPT-4o, and the demo it released yesterday shows a model that is much closer to a human-like assistant. It can reason across text, vision, and audio, and it is much faster than previous models. Although OpenAI had been expected to release something else this week, a competitor to Google's search engine, this new model proved far more impressive.

GPT-4o brings GPT-4-level intelligence and will be accessible to everyone, including free users, although its capabilities will be "rolled out iteratively over the coming weeks," as OpenAI says in its blog post. While it will be free for all users, paid users will get five times the capacity limits of free users.

"Natively multimodal" is how OpenAI CEO Sam Altman describes the model, meaning it can understand and generate content in text, voice, and images. The live demo showed just how true that is, and how much lower the latency is in conversations with GPT-4o, which makes interactions more natural and coherent and enables real-time conversation.

[Screenshot from the GPT-4o live demo]

Voice capabilities have been around for a while in some of OpenAI's multimodal models, but GPT-4o offers a more intuitive, natural-sounding voice that you can even interrupt mid-sentence, as people often do in human-to-human conversation, and still carry on a coherent exchange. That wasn't possible with previous models: you had to wait for the model to finish before you could respond or give it more input.

GPT-4o does impressively well at picking up the emotions in a person's tone of voice. In the live demo, the model was asked to help calm a person down and to interpret their breathing, and it responded as an actual person would.

The model's vision capabilities are equally astounding. GPT-4o can read and understand facial expressions and describe what it sees around it. It is more intelligent and descriptive than you would expect a machine to be.

[Screenshot from the GPT-4o live demo]

In another demo, a linear equation was written on a piece of paper. Rather than being asked to solve it, the model was asked to walk the person through the steps, just as a human teacher would. Anyone unknowingly listening in could think there was another person on the other end of a video call guiding them through the math problem. It is yet another reason this model comes so close to being a human-like assistant.

Watching all 26 minutes of the live demo, I was in awe of how close to a human this AI reasons and responds. When it was asked to tell a bedtime story, it was interrupted several times with changes to how it should sound: from a casual bedtime-reading voice to a more dramatic one and then to a robotic one. GPT-4o handled each request intuitively and naturally.

Another demo showed GPT-4o guessing its own May 13 announcement. An OpenAI staff member, dressed in a company hoodie, sat in a studio-like room set up for recording. The AI was shown all of that and guessed that the whole setup was for its announcement.

[Screenshot from the GPT-4o live demo]

A conversation between two GPT-4o models is also possible, as another demo showed. One model was not allowed to see but was asked to inquire about the surroundings of the other GPT-4o model, which was allowed to look around. The conversation between the two models and the human user ended with all three singing a song together.

With everything the demos showed, the list of things this new flagship model can do seems endless, as do the doors it has opened to new, innovative ideas. The "o" in GPT-4o stands for "omni," which alludes to the fact that it reasons across text, voice, and images.

This is magical indeed, but it really is only the dawn of AI, and we expect even more before the year runs out. Until then, what are your thoughts on GPT-4o? You can watch the live demo below.

https://www.youtube.com/live/DQacCB9tDaw?si=INzLkXOo1kpza87i

By the way, earn from your content on Hive via InLeo while truly owning your account. If you're new, you can sign up in a few minutes by clicking here! And here's a guide to navigating it.

Posted Using InLeo Alpha


I watched a couple of TikTok videos on that... I was shocked by how much it has advanced... we are gradually entering the future.

Shocking indeed! Aren't we in the future already, though? It's the dawn, I think.

Absolutely, you're right on that. We are practically already in the future.

You see it now.

Absolutely, good buddy. I see it clear as water.

Omo! This is impressive. One can only imagine how far AI will go, because it has only just started.

The part about singing a song together had me laughing as I imagined it happening 😅

Merit said, "Omo." 🤓

The singing part actually shocked me. I never imagined AI would ever be able to do that. Will you use it when you get access? I'm curious.

If it's easy to navigate, I would love to.

Yeah, of course. It's easy to navigate.

This development in GPT is big. AI developers just keep surprising us.

The surprises just keep getting bigger every day, man. I'm excited for what comes next.

Hmm, should we be getting scared of this almost human-like AI?

Hmm

I think we should rather be excited. Life experiences just got more fascinating and easier with AI.

😂😂 alright Jay, if you say so