Challenges in Distinguishing AI-Generated from Real Histological Images: A Study’s Insights


When judging whether an image is real or AI-generated, people arrive at correct answers noticeably faster than at incorrect ones, yet the task itself remains challenging. A study from the University of Jena titled “Experts Cannot Reliably Identify AI-Generated Histological Data” explored this topic. In this study, 800 participants were asked to classify real and artificial tissue-section images.

In histopathology, deep-learning algorithms are increasingly used to assist pathologists in identifying and categorizing abnormalities, such as cancer in tissue samples. Medical diagnoses can be made faster and more accurately with AI’s help. Extensive datasets are required to train AI models. Besides real images, AI-generated synthetic images can be used in pre-training to improve models’ detection rates for specific cancer types. However, whether AI should be trained solely on synthetic data is a topic of debate among experts.

The study involved 800 students, 526 of whom were classified as “experts”; the rest were considered non-experts. An “expert” was defined as someone who had previously seen histological images. During their medical studies, students learn about the fine structure of organs, including tissues and cells, by examining histological specimens under a microscope.

To generate artificial tissue-section images for the study, DreamBooth was used to fine-tune the Stable Diffusion model. Two separate Stable Diffusion models were trained on real images of stained mouse kidney tissue: one with three images, the other with 15. Each model produced a batch of 100 artificial images, from which four were randomly selected per model, yielding eight artificial images in total. These were mixed with eight real images: the three training images of the first model and five of the 15 training images of the second. Participants were then shown the resulting 16 images one at a time and asked to decide whether each was real or AI-generated, or they could decline to answer.
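To illustrate this setup, here is a minimal sketch of how such a batch could be sampled with the Hugging Face diffusers library, assuming a Stable Diffusion checkpoint that has already been fine-tuned with DreamBooth. The checkpoint path and prompt are hypothetical placeholders, not the study's actual values.

```python
# Hedged sketch: sampling synthetic histology images from a Stable
# Diffusion model fine-tuned with DreamBooth, using the diffusers library.
# The checkpoint path and prompt below are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-kidney-3-images",  # hypothetical fine-tuned checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# 'sks' is a placeholder token commonly bound to the fine-tuning images
# during DreamBooth training; this exact prompt is an assumption.
prompt = "a photo of sks stained mouse kidney tissue"

for i in range(100):  # each model in the study produced 100 images
    image = pipe(prompt).images[0]
    image.save(f"synthetic_{i:03d}.png")
```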

The non-expert group correctly classified the images 55% of the time. The expert group achieved a 70% accuracy rate. AI images from the model trained with only three real images were more often correctly identified as fake. No non-expert managed to classify all images correctly. Only 10 participants from the expert group succeeded.
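For context, it helps to see how far these rates lie above the 50% chance level. The following is an illustrative sketch, under the simplifying assumption that each participant judged the 16 images independently; the 0.05 threshold is a conventional choice, not a value from the study.

```python
# Illustrative only: how many of 16 images must one classify correctly
# before the result becomes unlikely under pure guessing (p = 0.5)?
from scipy.stats import binomtest

for correct in range(8, 17):
    p = binomtest(correct, n=16, p=0.5, alternative="greater").pvalue
    print(f"{correct}/16 correct: p = {p:.3f}")

# Roughly 12 of 16 (75%) is needed before p drops below 0.05, which puts
# the non-expert average of about 9/16 (55%) well within chance range.
```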

Decisions were typically made within the first 30 seconds, regardless of the image shown. Experts generally took more time for each decision than non-experts. Remarkably, all participants, whether experts or non-experts, were significantly faster when they classified an image correctly than when they made a mistake. “This observation aligns with common models of perception-based decision-making,” says study lead author Dr. Jan Hartung.

As generative algorithms evolve, it becomes increasingly difficult for humans to recognize AI-generated content. Study leader Prof. Dr. Ralf Mrowka summarizes: “Our experiment shows that experience helps in recognizing fake images; however, a significant portion of artificial images still cannot be reliably identified.”

To prevent fraud in scientific publications, the study authors recommend introducing technical standards that secure the origin of data, for example a requirement to submit raw data. The use of automated tools to detect fake images is also under consideration.
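The study does not prescribe a concrete mechanism. As one hypothetical illustration of such a standard (an assumption, not a proposal from the paper), a journal could require a checksum manifest of the raw image files at submission, so that published figures can later be traced back to the original data:

```python
# Hypothetical provenance sketch: record a SHA-256 digest for each raw
# image file at submission time so reviewers can verify data origin.
# Directory path and *.tif extension are illustrative assumptions.
import hashlib
from pathlib import Path

def checksum_manifest(raw_data_dir: str) -> dict[str, str]:
    """Map each raw image file name to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(Path(raw_data_dir).glob("*.tif")):
        manifest[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest

if __name__ == "__main__":
    for name, digest in checksum_manifest("./raw_images").items():
        print(f"{digest}  {name}")
```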
