The Growing Potential and Challenges of AI-Driven GUI Automation

The potential of GUI assistants is enormous. AI assistants based on large language models are becoming increasingly capable. Not only can they interpret graphical user interfaces (GUIs) much as humans do, but they may also be able to operate them: clicking buttons, filling out forms, and navigating between applications.

A Microsoft research team has explored this potential and concluded that the benefits for businesses could be significant. In practice, users would no longer need to learn complex software commands. Instead, they would simply state their request in natural language, and the GUI agent would carry out the necessary actions automatically.

Major tech companies are already integrating these features into their products. A study involving Microsoft highlights the role of Power Automate, a cloud-based service that lets users automate recurring tasks. It uses large language models (LLMs) to help build automated workflows across various applications. Microsoft’s AI assistant Copilot can also control software directly from text commands. However, Microsoft is not the only company working on such features.

Anthropic’s chatbot, Claude AI, can interact with web interfaces and perform complex tasks. Google is working on a similar project called Jarvis, which uses the Chrome browser to carry out web-based tasks such as research, shopping, and booking travel. The project is still in development and not yet publicly available.

According to analysts at BCC Research, the market for AI-driven GUI automation could reach $68.9 billion by 2028. They attribute this growth to companies’ increasing efforts to automate repetitive tasks and to make software more accessible to less tech-savvy users. Industry experts expect that by 2025, about 60 percent of large companies will use such systems, which could lead to significant efficiency gains.

However, significant hurdles must be overcome before this technology can be widely adopted. These include privacy concerns around handling sensitive information and limited computing power. Companies need to carefully assess the impact on security and infrastructure before they can deploy LLM-based GUI agents profitably. In Germany, at least, this could take some time.