GPT-4o and Gemini are the new artificial intelligences that speak, see and even imitate human emotions.
OpenAI introduced GPT-4o, a multimodal artificial intelligence model that can see, hear, speak and generate images in real time, with enhanced capabilities for interacting with humans. For its part, Google launched Project Astra, which integrates artificial intelligence into its services, and introduced smart glasses powered by Gemini. These innovations promise to change the way we interact with technology.
“Her”. In just three letters and a single word, Sam Altman summed up on the social network X the launch with which his company revolutionized the world of artificial intelligence once again this Monday. With this brief message, OpenAI’s CEO was undoubtedly referring to the humanization of something that until now we considered a mere artificial object. But above all, Altman was referring to Her, the Spike Jonze film in which the protagonist, masterfully played by Joaquin Phoenix, falls in love with his virtual assistant: a sensitive artificial intelligence voiced by Scarlett Johansson. What was science fiction just ten years ago became reality this Monday.
In the presentation, on a stage converted into a Friends-style living room, Mira Murati, Mark Chen and Barret Zoph, three prominent OpenAI figures, spoke with GPT-4o, the new AI model, as if it were another member of the team or a friend. The demo was not perfect: there were mistakes in the conversation, interruptions and misunderstandings. Such are the hazards of a live demo. But what most amazed those present was that the AI was able to recognize its mistakes, correct them and even laugh at them.
Embodied in a voice capable of intonation, of dramatizing its delivery or even of speaking as if “imitating” a robot, GPT-4o showed latency close to zero. Murati, Chen and Zoph smoothly interrupted its flowing conversation while the AI solved math problems, told stories or translated into Italian.
After the conversation came other examples: an AI giving a math lesson to a teenager who showed it the screen of his iPad, another assistant that was asked to be sarcastic, and even a video in which two artificial intelligences talked to each other and sang. All of these videos can be found at the end of the article.
To achieve this anthropomimetic feat, OpenAI trained a new multimodal model that generates text like ChatGPT but can also see, hear, speak and create images. They called it GPT-4 Omni, or GPT-4o, in reference to its versatility, and it is an AI as accurate in its responses as the previous version, if not better, and much faster when interacting with people.
Another novelty is that, unlike the previous model, GPT-4o is free for everyone, although the number of interactions with the new AI is limited; once the limit is reached, the user reverts to the GPT-3.5 model.
This is Google’s Project Astra and its AI glasses
Google’s answer: integration and glasses
Just 24 hours after OpenAI wowed the world with its new model, it was Google’s turn. Expectations were high, and perhaps for this reason the tech giant’s presentation did not surprise as much as its main competitor’s. Google showed its own multimodal model, called Project Astra: an intelligence the user can talk to and which, like GPT-4o, analyzes video images in real time while conversing with the user. Beyond the smartphone, Google has integrated this artificial intelligence into camera-equipped smart glasses, so that Astra sees what the user sees while they discuss it.
The biggest disappointment of Google’s presentation was that, unlike OpenAI, it showed Astra in a pre-recorded video, and its responses were less natural. However, we must not forget that Google has the power of its ecosystem, which allows it to embed its AI in services such as YouTube, Gmail or Google Docs, something that can be very valuable to users.
This allows Gemini (Google’s AI), for example, to search a user’s email for all of the year’s invoices and automatically save them to a cloud folder, or to watch a YouTube video several hours long, summarize the key points it covers and even answer the user’s questions about the topics that appear in it.
‘Soon’
For now, OpenAI has only made the text chat version of GPT-4o available to users, and the company says that other features, such as voice and real-time vision, will arrive for paying users in the coming weeks. Google, for its part, also held back most of the features it presented, without specifying dates.
With what is currently available, users have already achieved incredible things, such as programming a complete video game in seconds from a simple screenshot taken from the Google search engine.
Very soon, AI will be an omnipresent companion that can be addressed in natural language and that will help users with everything they do on their computers or in the real world.
Modern artificial intelligences can now write, speak, listen, see and generate images as well as a person, if not better. The next step on their path toward omnipotence is being able to act.
Conversations with the new GPT-4o
Errors and the ability to interpret emotions.
Here’s how AI translates from English to Italian in real time.
Artificial intelligence gives a teenager math lessons.
GPT-4o is sarcastic at the request of the user.
Two artificial intelligences talk to each other and sing.
Features of GPT-4o
Multimodal capabilities
GPT-4o, known as “Omni”, is a multimodal artificial intelligence model that can generate and process text, images and audio in real time. This capability makes it a versatile tool for a variety of applications, from language translation to visual and audio content creation.
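For readers curious about what this multimodality looks like from a developer’s side, here is a minimal sketch of a combined text-and-image request to GPT-4o through OpenAI’s official Python SDK. The prompt and the image URL are illustrative placeholders, and the call assumes an OPENAI_API_KEY environment variable; the article itself only covers the consumer-facing demos, so treat this as an illustration rather than part of the announcement.

```python
# Minimal sketch: a text + image request to GPT-4o via the official
# OpenAI Python SDK (pip install openai). The image URL and prompt are
# placeholders; OPENAI_API_KEY must be set in the environment.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                # Hypothetical example image; replace with a real URL.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same chat endpoint accepts plain text on its own; voice and real-time vision, as noted above, were not yet available at launch.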
Improved speed and accuracy
The model offers near-zero latency, which means faster responses and a smoother user experience. In addition, the accuracy of its answers has improved significantly, outperforming its predecessors in several benchmarks.
Improved multilingual support
GPT-4o supports over 50 languages, spoken by over 97% of people around the world.
Free access and advanced features
OpenAI has decided to make GPT-4o free for all users, albeit with restrictions on the number of interactions. Advanced features such as charting, data analysis, and photo interaction are available to all users, democratizing access to this technology.
Features of Google Gemini 1.5 Pro
Most of the features presented are not yet available or do not have a clear release date.
Gemini on Google Search
Gemini 1.5 Pro will be integrated directly into Google’s search engine, allowing the artificial intelligence to do the searching for us. Using its ability to process information in real time and its advanced quality systems, the assistant will find the best results efficiently and accurately. This feature will make searching more intuitive and personalized.
Gemini in Google Workspace
Starting next month, Gemini will be available in the Google Workspace sidebar. Users will find a button with the AI’s icon that will let them quickly generate summaries and receive suggested actions, such as organizing receipts or analyzing and segmenting data in Google Sheets. This integration aims to improve productivity and make everyday tasks easier to manage.
Gemini on mobile phones
The Circle to Search feature will reach more users and receive significant improvements. One of the new features is Gemini Live, which will let you speak to the AI in natural language, even interrupting the assistant while it is responding. It will also add support for viewing your surroundings in real time through the smartphone camera, providing a more interactive and immersive experience.
Gemini in Gmail
Coming in September, Gmail will feature a Gemini button that lets you quickly create email summaries. This feature will be available on an experimental basis to help users manage their inbox more efficiently.
Gemini in the Google Photos app
The Google Photos app will be updated to include new features powered by Gemini. These improvements will make the app more useful and capable of offering personalized recommendations and automatic photo organization.
Gemini Advanced
Available in Europe, Gemini Advanced uses the Ultra 1.0 model and stands out for its speed and ability to handle complex tasks. From code generation and reasoning to collaboration on creative projects, Gemini Advanced surpasses many of its competitors, including ChatGPT with GPT-4. This version is included in the most advanced Google One plan, which costs €21.99 per month.
Gemini on Google Meet
The Gemini extension in Google Meet now supports 68 languages, facilitating global communication and improving collaboration in virtual meetings.
Gemini 1.5 Pro and Gemini 1.5 Flash
Google has also released Gemini 1.5 Pro and Gemini 1.5 Flash, which are faster and more efficient. Alongside these came the introduction of Project Astra, a new concept of AI-powered assistant models designed to provide enhanced support and real-time personalization.
Updated Google search interface
The new Google Search will present results with an AI Overview. Although it is currently only available in the United States, it is expected to expand to other regions soon, promising a more structured and accessible search experience.
Automate tasks in Google Workspace
New automation features have been announced for Google Workspace, including the possible integration of assistant routines, which would address one of the major drawbacks of its use in home automation. These improvements will allow users to automate repetitive tasks and manage their time more efficiently.