Anthropic Wants Its AI Agent to Really Change Our Lives

Our computers do what we want, but only step by step, and always through our direct interaction. If we want to delete a file, we have to open File Explorer, navigate to the folder containing it, select it, and send it to the trash. What if we could simply tell the computer to find the file and delete it with a single command?

That’s what the new “Computer Use” feature of Claude, Anthropic’s chatbot, offers. Introduced this week in a quasi-experimental format, the feature lets us command our computer to do things we would normally need to do with a mouse and keyboard.

This is one of the most compelling demonstrations of how AI agents can actually make our lives easier. From chatbots that give us answers (“this is a good hotel”), we can move to agents that do something based on those answers (“I booked this hotel for you on Saturday”). The paradigm shift is dramatic.


All of this suggests that our computers, and especially our mobile phones, will start running parts of our lives. At least a little. They will become secretaries or butlers we can ask for things just as we would ask a human assistant.

The magic is in something that seems trivial but isn’t: Anthropic’s new AI system can see what’s on the screen, recognize it, and take actions using the mouse and keyboard.
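That see-recognize-act cycle is easy to picture as a loop: capture the screen, ask the model what to do next, and carry out the action it chooses. Here is a minimal sketch of that loop, with the model call and the input layer stubbed out; the function names and the `(kind, payload)` action format are illustrative assumptions, not Anthropic’s actual interface.

```python
def capture_screenshot():
    """Stub: would grab the current screen as image bytes (e.g. from a VNC session)."""
    return b"<png bytes>"

def ask_model(screenshot, goal):
    """Stub: would send the screenshot and the goal to Claude and get back an action.

    The (kind, payload) tuple is a hypothetical action format for this sketch.
    """
    return ("click", {"x": 120, "y": 240})

def execute(action):
    """Dispatch a model-chosen action to a mouse/keyboard layer (stubbed here)."""
    kind, payload = action
    if kind == "click":
        return f"clicked at ({payload['x']}, {payload['y']})"
    if kind == "type":
        return f"typed {payload['text']!r}"
    raise ValueError(f"unknown action: {kind}")

def agent_step(goal):
    """One iteration of the see-recognize-act cycle."""
    shot = capture_screenshot()
    action = ask_model(shot, goal)
    return execute(action)
```

In the real system this loop repeats many times per task, which is also why each task is slow: every step means re-reading the whole screen.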

This is similar to what Microsoft offers with Windows Recall or Google’s new Pixel Screenshots option, as both take screenshots to recognize, label, and analyze the information in those snapshots. In those cases, though, the goal is simply to be able to review that information later. With “Computer Use” we can act on it, which is a very important step forward.

Some users have already shown its potential

For now, Anthropic’s proposal can only be tested in a limited way: they wanted to avoid problems. As expert Simon Willison explained in his tests: “This feature runs in a Docker container running Ubuntu 22.04, pre-configured with several applications and a VNC server, allowing us to see everything on its screen.”
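On the API side, enabling this capability means declaring a special “computer” tool in the request. The sketch below shows roughly what that request looked like at launch, based on Anthropic’s public beta; the tool type (`computer_20241022`) and beta flag (`computer-use-2024-10-22`) are the identifiers published at the time and may have changed since, so check the current documentation before relying on them.

```python
# Rough shape of a Messages API request with the Computer Use beta tool enabled.
# Nothing is sent here; we only build the request payload.
request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [
        {
            "type": "computer_20241022",   # beta tool that sees the screen and acts on it
            "name": "computer",
            "display_width_px": 1024,      # resolution of the sandboxed virtual display
            "display_height_px": 768,
        }
    ],
    "messages": [
        {"role": "user", "content": "Open the browser and check the site for pelicans."}
    ],
}
# With the official SDK, this would be sent along the lines of:
#   client.beta.messages.create(**request, betas=["computer-use-2024-10-22"])
```

The model then replies with tool-use blocks (clicks, keystrokes, screenshot requests) that the sandbox executes and reports back, round after round, until the task is done.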

He ran some simple tests: visiting his website to check whether he had written anything about “pelicans”, compiling and running a typical “Hello World” in C, installing the ffmpeg package on Ubuntu, and trying to solve a Sudoku, something Claude apparently failed miserably at.

These tests, carried out in Anthropic’s isolated environment, may seem like little more than a curiosity, but they show that you can go much further. McKay Wrigley, an artificial intelligence expert, showed in a video on X (formerly Twitter) how he was able to control his iPhone by mirroring it to a MacBook Pro and then using commands to perform various actions on the phone’s screen.


The examples many users have shared on platforms like X are astounding, and they demonstrate two things. First, this technology is just taking its first steps and therefore has important limitations in both speed and capability. Second, its potential is enormous.

We see this, for example, in tests in which one user got the chatbot to play Doom on its own, another combined it with Figma to design a user interface, another ordered a pizza, and another created an app for Windows, macOS and Linux with which you can control your computer.

Another developer, known simply as “killian”, went one step further. The result: getting out of the Anthropic sandbox to effectively automate work directly on your own computer.


The developer warned that interaction is not particularly fast: Claude takes its time before each action, because it has to analyze the entire screen and gradually perform the steps that, in theory, will lead to what we asked of it.

There is another important factor here: cost. To use all these features we need Claude credits, and those credits cost money. A user nicknamed “nearby” pointed out that the current price of this API is $15 per million output tokens and $3 per million input tokens (your queries), so ordering food this way turned out to be quite expensive.
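It is easy to see why an agent session adds up at those prices. Here is a back-of-the-envelope calculation using the launch pricing the article cites ($3 per million input tokens, $15 per million output tokens); the token counts per step are made-up illustration values, not measured figures.

```python
# Launch pricing cited in the article, expressed per token.
INPUT_PRICE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # dollars per output token

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Total dollar cost of a session given its token usage."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A screenshot-heavy loop burns input tokens fast, because each step resends
# an image of the screen. Assume, say, 40 steps averaging 25,000 input tokens
# and 500 output tokens each:
total = session_cost(40 * 25_000, 40 * 500)
print(f"${total:.2f}")  # → $3.30
```

Input tokens dominate here even at the lower rate, because every iteration of the loop re-reads the entire screen. That is also why better efficiency, not just lower prices, is what will make this usable day to day.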

Still, it is normal for the first experiments to be expensive: the technology is still quite green, and its resource consumption is noticeable. Efficiency and cost are expected to improve significantly, which should theoretically give us access to much more powerful options in the coming months. This is clearly one of Anthropic’s bets with this proposal, and it is a very, very promising one.

Image | Danhasnotes with Midjourney

In Xataka | Microsoft is starting to offer autonomous AI agents. For now, they are little more than supercharged IFTTT recipes.
