The Rise and Future of Agent-Based AI in Technology

Agent-based AI is gaining the ability to use computers, which means they can type, scroll, and click. AI agents are currently the hottest topic in the tech industry. Leading companies like Google DeepMind, OpenAI, and Anthropic are working to expand large language models with the ability to perform tasks independently. Known in technical jargon as agent-based AI, these systems have quickly become the new target of Silicon Valley. From Nvidia to Salesforce, everyone is talking about how they will transform the industry.

OpenAI CEO Sam Altman wrote in a blog post in early January, “We believe that by 2025, the first AI agents will enter the workforce and significantly change the performance of companies.” In the broadest sense, an AI agent is a software system that starts and does something, usually with minimal or no supervision. The more complex the task, the smarter the agent must be.

For many, large language models are now intelligent enough to power AI agents that can perform a wide range of useful tasks for us. These include filling out forms, looking up a recipe, and adding the ingredients to an online shopping cart. They could also use a search engine to research a topic at the last minute before a meeting and create a quick summary in bullet points.

Last October, the US company Anthropic introduced one of the most advanced agents to date: an extension of its Claude large language model called “Computer Use.” As the name suggests, Claude can be instructed to use a computer, similar to how a person would move a cursor, click on buttons, and type text. Instead of just conversing with Claude, you can now ask it to perform tasks on the screen.

Anthropic points out that this feature is still cumbersome and error-prone. However, it is already available to a handful of testers, including third-party developers at companies like the delivery service DoorDash and software developers like Canva, specializing in visual content creation, and Asana, which sells software for work organization and project management.

Computer use is just a taste of the tasks that will come to AI agents. To learn what they should be able to do in the future, MIT Technology Review spoke with Jared Kaplan, co-founder and chief scientist of Anthropic. Here are the four possibilities he sees for agent improvements by 2025.

1. AI Agents Will Better Handle Tools

“I think there are two axes to think about what AI is capable of. One is the question of how complex the task is that a system can perform. As AI systems become more intelligent, they will also get better in this direction. Another important aspect is the question of what kind of environments or tools the AI can use.”

“If you go back almost ten years and look at [Deepmind’s Go game model] AlphaGo, we had AI systems that could play board games superhumanly well. But if you can only work with a board game, then that’s a very limited environment. It’s not really useful, even if it’s very intelligent. With text models, then with multimodal models, and now with computer use—and maybe in the future with robots—you are moving towards the possibility of bringing AI into different situations and tasks and making it useful.”

“We were excited about computer use mainly for this reason. Until recently, it was necessary for large language models to give them a very specific prompt, provide them with very specific tools, and then they are limited to a certain type of environment. In my opinion, computer use will probably improve quickly when it comes to how well models can perform various tasks and more complex tasks. And they also recognize when they have made mistakes or when an important question arises, and they need to ask the user for feedback.”

2. AI Agents Will Become More Personalized

“Claude needs to know enough about your specific situation and the conditions under which you work to be useful. For example, what role you hold, what your writing style is, or what needs you and your company have.”

“I think we will see improvements where Claude can search through things like your documents, your Slack, and similar things to really learn what is useful for you. This is a bit underestimated in agents. It is necessary for systems to be not only useful but also safe and do what is expected.”

“Another point is that Claude doesn’t have to think much about many tasks. You don’t have to sit and think for hours before opening Google Docs or something similar. So I believe we will not only see more thinking work but also the application of thinking work when it is really useful and important, but also not a waste of time when it is not necessary.”

3. AI Agents Will Improve Programming Assistants

“We wanted to provide developers with an early beta version of computer use to get feedback while the system is still relatively primitive. But as these systems get better, they could be used more widely and really work with you on various activities. I think DoorDash, The Browser Company, and Canva are all experimenting with different types of browser interactions and designing them with the help of AI.”

“I expect we will also see further improvements in programming assistants. This is very exciting for developers. There is great interest in using Claude 3.5 for programming, where it’s not just about automatic completion, as it was a few years ago. It’s really about understanding what’s wrong with the code, debugging it—running the code, seeing what happens, and fixing it.”

4. AI Agents Must Be Made Safe

“We founded Anthropic because we assumed that artificial intelligence would evolve very quickly and that safety aspects would inevitably play a role. I think this will become even more apparent this year, as these agents become more integrated into our work. We need to prepare for challenges like ‘prompt injections.'” [Note: Prompt Injection is an attack where a malicious prompt is fed into a large language model in a way that the developers did not foresee or intend. One way to do this is to add such a prompt to websites that the models might visit.]

“Prompt Injection is at the top of our list of risks when considering wider use of AI agents. I think it is especially important for computer use, and we are working very actively on it because if computer use by AI agents is deployed on a large scale, there could be harmful websites or other traps that Claude could fall into.”

“And with more advanced models, the risk is simply greater. We have a robust scaling policy, where if AI systems are sufficiently powerful, we feel we need to be able to really prevent their misuse. For example, if they could help terrorists—things like that.”

“I am really excited about how AI will be useful—it is also accelerating us internally at Anthropic as people use Claude in all sorts of ways, especially in programming. But there will also be many challenges. It will be an interesting year.”