Legal Challenges of Using Copyrighted Works in AI Training

The development of artificial intelligence has revealed a major legal challenge: Can AI companies use other people’s works to train their systems without permission? Professor Tim W. Dornis from Leibniz University Hannover sees clear problems here. Together with Sebastian Stober, professor of artificial intelligence at Otto-von-Guericke University Magdeburg, he examined whether training AI models falls under the Text and Data Mining (TDM) exception that AI companies frequently invoke. Their study, commissioned by the Copyright Initiative, is titled “Copyright and Training of Generative AI Models – Technological and Legal Foundations”.

The EU directive on TDM permits the use of copyrighted works for research purposes under certain conditions. Dornis argues, however, that training generative AI goes beyond classic TDM: rather than merely extracting and filtering information, the training process replicates the statistical properties of the data itself. He illustrates this with images: AI models can reproduce specific elements from the training data, such as a sofa, almost identically. “And this is the part of generative AI activity that violates copyright,” he states.

The legal situation differs significantly between the USA and Europe. “Fair use is the keyword in the USA,” says Dornis; there, use is allowed if it serves the community. European law, by contrast, relies on a catalog of specific exceptions. These diverging legal frameworks raise complex questions, especially for international companies. “Copyright is territorial,” Dornis emphasizes. American companies might argue that they have not violated copyright in Germany because they operate under the fair use doctrine in the USA.

Ongoing court cases highlight the complexity of the situation. GEMA, the German music rights society, has sued OpenAI in Germany. It must first prove that OpenAI acted in Germany; if successful, it must then demonstrate that ChatGPT’s offering in Germany constitutes a copyright infringement. A quick resolution is not in sight. “We expect a period of one to three years before fundamental decisions are made,” the legal scholar predicts. One thing is also clear: “More legal proceedings will follow.” Rights holders feel pressured and describe the current situation as “burdensome and harmful.”

Dornis views critically the possibility of an opt-out by which rights holders could block their works from AI training. He doubts that such reservations are effective and enforceable, especially for individual creators. Licensing agreements with major publishers could offer a way forward, but calculating fair compensation for the use of training data remains a challenge.

Despite this, Dornis does not see the debate as stifling innovation. “It’s not about preventing technologies, but creating fair conditions.” The central question remains: how can creatives be adequately compensated for their contributions to AI training? In the long term, he expects the market to stabilize, with the costs of using copyrighted works passed on to consumers. Things we currently use for free could then “only be available for a fee.” Personally, he could live quite well without seeing “a new artificial cat image every day.”