Summary:
A U.S. judge has ruled that using copyrighted books to train artificial intelligence models can qualify as “fair use” under copyright law. However, AI firm Anthropic must still face trial over allegations it used pirated book copies in its training process.
Legal Milestone in AI Copyright Dispute
A U.S. federal judge has delivered a significant ruling in a closely watched copyright case involving the use of books to train artificial intelligence systems. Judge William Alsup found that Anthropic, an AI company backed by tech giants Amazon and Alphabet, made a “transformative” use of authors’ works while developing its Claude language model. This means the company’s actions could be legally protected under the doctrine of fair use.
Despite this, the judge declined to dismiss the case entirely, ruling that Anthropic will still need to go to trial over its alleged use of pirated book copies to train its AI system.
Authors Challenge Anthropic Over Use of Books
The case was brought in 2023 by three authors: Andrea Bartz, known for mystery thrillers such as The Lost Night and We Were Never Here; Charles Graeber, author of The Good Nurse; and Kirk Wallace Johnson, who wrote The Feather Thief. They accused Anthropic of using unauthorized digital copies of their copyrighted works to build a multi-billion-dollar AI business.
According to the lawsuit, Anthropic is alleged to have stored over seven million pirated books in a centralized training library—an accusation the court found serious enough to warrant further examination at trial.
What the Judge Said
Judge Alsup acknowledged that Anthropic’s training of its AI model on these books appeared to be “exceedingly transformative.” He wrote that the firm’s language models did not aim to replicate or replace the original works, but to develop new content in a different context.
However, the judge emphasized that the legality of storing pirated copies of books for training purposes remained unresolved. While the use of the materials might fall under fair use, how those materials were obtained is still a critical legal issue.
He further noted that the plaintiffs did not provide evidence that the AI generated direct reproductions or “knockoffs” of their works—something that could have significantly altered the outcome.
Broader Implications for the AI Industry
This case is one of the first major legal decisions to address how large language models (LLMs) may use copyrighted content. As AI tools increasingly draw on vast pools of online material, including books, music, articles, and videos, similar lawsuits have emerged across the media landscape.
Earlier this month, Disney and Universal filed a lawsuit against AI art generator Midjourney, alleging copyright infringement. The BBC has also expressed concern about unauthorized AI use of its content and is exploring legal options.
In response to mounting pressure, some AI firms have begun negotiating licensing agreements with publishers and creators to access content legally.
Anthropic Responds
Anthropic welcomed the court’s recognition that its use of the authors’ works was transformative. However, it disagreed with the decision to proceed to trial over how the books were obtained and stored. The company said it is reviewing its legal options and remains confident in its defense.
Lawyers representing the authors declined to comment on the ruling.
Source: BBC News