This week, Google and OpenAI announced super-powerful artificial intelligence assistants: tools that can converse with you in real time and pick the conversation back up when you interrupt them, analyze your surroundings through live video, and translate conversations on the fly.
OpenAI got a head start on Monday with the unveiling of its new flagship model, GPT-4o. The live demo showed it reading bedtime stories and helping solve math problems, all in a voice that sounds a lot like the AI girlfriend of Joaquin Phoenix's character in Her (a resemblance that didn't go unnoticed by CEO Sam Altman).
Google announced its own tools on Tuesday, including the Gemini Live conversational assistant, which can do many of the same things as GPT-4o. The company also said it is building a kind of do-everything AI agent, which is currently in development but won't see the light of day until later this year.
You’ll soon be able to see for yourself whether these tools will be as useful in your daily life as their creators hope, or whether they’re more of a sci-fi gimmick that will eventually lose its charm. Here’s what you need to know about how to access these new tools, what you can use them for, and how much they’ll cost.
What it can do: According to OpenAI, the model can converse with you in real time with a response latency of about 320 milliseconds, comparable to natural human conversation. You can ask it to interpret whatever you point your smartphone camera at, and it can help with tasks such as coding or translating text. It can also summarize information and generate images, fonts, and 3D renderings.
How to access: OpenAI says it will begin rolling out GPT-4o's text and image capabilities in the web interface as well as the ChatGPT app, and will add the voice features in the coming weeks, though it hasn't given an exact date. Developers can already access the text and vision features through the API, but the speech mode will initially be available only to a "small group" of developers.
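For developers curious about that API access, here is a minimal sketch of what a combined text-and-image request to GPT-4o could look like, assuming the official openai Python package (version 1 or later), an OPENAI_API_KEY set in the environment, and a placeholder image URL:

```python
from openai import OpenAI

# Minimal sketch of a text-plus-image request to GPT-4o via the developer API.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment;
# the image URL below is a placeholder, not a real resource.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this photo in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```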
How much does it cost?: GPT-4o will be free to use, but OpenAI will place limits on how much you can use the model before you have to upgrade to a paid plan. Subscribers to one of OpenAI's paid plans, which start at $20 per month (€18.50), get five times the capacity on GPT-4o.
What is Gemini Live?: This is Google’s closest product to GPT-4o: a version of the company’s artificial intelligence model that you can converse with in real time. Google says that "this year" you’ll also be able to use the tool to communicate via video. The company promises it will be a useful assistant for things like preparing for a job interview or rehearsing a speech.
How to access: Gemini Live will launch in the “coming months” as part of Google’s premium AI plan Gemini Advanced.
How much does it cost?: Gemini Advanced offers a two-month free trial, after which it costs $20 per month (€18.50).
What is the Astra project?: Astra is a project to build a do-everything AI agent. It was previewed at the Google I/O conference, but it won’t be released until later this year.
People will be able to use Astra through their smartphones and possibly their desktop computers, but the company is also exploring other options, such as embedding it in smart glasses or other devices, Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review.
Which is better?
It’s hard to know without having the full versions of these models in hand. Google showed off Project Astra in a slick video, while OpenAI chose to present GPT-4o in a seemingly more authentic live demo, but both demos asked the models to do things their designers had likely already rehearsed. The real test will come when millions of users confront them with new demands.
Still, if you compare the videos published by OpenAI and Google, the two flagship tools look very similar, at least in terms of ease of use. Broadly, GPT-4o seems slightly ahead on audio, demonstrating realistic voices, fluent conversation and even singing, while Project Astra shows off more advanced visual capabilities, such as the ability to "remember" where you left your glasses. OpenAI’s decision to roll out the new features faster could mean its product gets used more than Google’s at first, since Google’s won’t be fully available until later this year. It’s too early to say which model is less likely to "hallucinate" false information or provide more useful answers.
Both OpenAI and Google say their models have been thoroughly tested: OpenAI says GPT-4o was evaluated by more than 70 experts in fields including disinformation and social psychology, and Google said Gemini has undergone "the most comprehensive safety assessment of any Google AI model to date, including bias and toxicity."
But these companies are building a future in which AI models search, learn, and evaluate the world’s information to give us concise answers to our questions. Even more than with simpler chatbots, it’s wise to be skeptical of what they tell us.
Additional reporting by Melissa Heikkilä.