The long-awaited image-understanding capability of the AI language model GPT-4 is finally available to everyone. Its full potential, however, only shows in the paid version of ChatGPT, while the image-recognition feature in the free Bing Chat delivers noticeably lower quality. GPT-4 Vision was tested intensively before its release.
During a live demonstration, ChatGPT was shown a picture of a plate of food and asked for a matching recipe; it immediately responded with a recipe for pasta in carrot sauce. Next, a photo of a broken curtain rod was presented and a repair method requested, which the model answered in detail. Even when shown an unfamiliar object, ChatGPT was able to identify it and offer assistance.
In another scenario, ChatGPT scanned algebra homework and immediately delivered the correct result, including a diagram. When presented with a hand-drawn website design and asked for a digital version, the program produced the corresponding code in HTML and JavaScript. This code actually worked, and the end result was visually appealing.
Some caution is still warranted: a closer look at the examples reveals obvious errors. Although the technology is not yet fully mature, it represents impressive progress in the field of artificial intelligence. Image understanding also has the potential to make life considerably easier for people with disabilities, such as the visually impaired, and to improve their quality of life.
The further progress of this technology is being watched with great interest. The applications it already offers are impressive, and it will be exciting to see how they develop and what new fields of use emerge in the future.