Judge Pauses Anthropic’s $1.5 Billion Copyright Settlement
Anthropic PBC entered into a proposed class-wide settlement that would resolve Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal.), a high-profile copyright infringement action challenging the company’s use of millions of books to train its Claude large language models.
However, on September 8, Judge William Haskell Alsup denied preliminary approval of the settlement without prejudice, requesting additional information and a revised notice and claims protocol. The agreement, which remains subject to court approval, underscores the escalating financial exposure generative artificial intelligence (GenAI) developers may face depending on how they gain access to works to be incorporated into training datasets.
Background
Anthropic compiled a “central library” of copyrighted books and used those books to train the Claude large language models (LLMs), obtaining those works in two primary ways:
- Downloading millions of pirated copies from sites such as LibGen and PiliMi.
- Lawfully purchasing millions of physical books that Anthropic subsequently digitized.
The plaintiffs claimed the company’s practices infringed their exclusive copyrights and enabled Claude to generate competing content that could displace demand for the originals.
Anthropic asserted a fair-use affirmative defense. On summary judgment, Judge Alsup held partially in favor of Anthropic and partially in favor of the plaintiffs. Specifically, he held that:
- Anthropic’s use of the books at issue to train LLMs for the purpose of returning new text outputs is “spectacularly” transformative and therefore a fair use.
- Anthropic’s digitization of books it purchased in print form for use in its central library was a fair use because the digital copies merely replaced the print copies Anthropic discarded after digitization.
- Anthropic’s use of pirated copies of books in its central library was not a fair use.
Class Certification and Settlement Pressure
After the court certified a class comprising holders of works found in the LibGen and PiliMi libraries, Anthropic faced potential statutory damages approaching hundreds of billions of dollars (up to $150,000 per infringed work if willful). With trial approaching and its Ninth Circuit Rule 23(f) appeal pending, Anthropic negotiated the present settlement.
The proposed settlement includes the following key terms.
- Monetary Relief: A minimum cash payment of $1.5 billion (approximately $3,000 per work across roughly 500,000 covered works) plus statutory interest.
- Injunctive Relief: Destruction of all datasets containing pirated works.
- Release of Claims: Class members would release past claims arising from Anthropic’s ingestion of their works; claims based on infringing outputs would be preserved.
- Claims Process: At the court’s direction, participation is “opt-in.” Rightsholders that decline to opt in may pursue individual actions.
Judge Alsup denied preliminary approval of the proposed settlement without prejudice on September 8. He requested that, by September 15, the parties provide a definitive list of affected works and a revised notice and claims protocol that would protect absent class members and Anthropic from later duplicative suits.
Key Takeaways
Sizeable Data Needs vs. Copyright Risk
GenAI developers routinely ingest massive corpora. When those datasets include copyrighted works obtained from piracy websites, potential statutory damages can dwarf the cost of early licensing or settlement.
Fair-Use Law Remains Unsettled in Other Pending Cases
While Judge Alsup found that certain uses of copyrighted works to train GenAI qualify as fair use, many other copyright infringement cases remain pending against GenAI developers. Given the fact-intensive nature of the fair-use doctrine, those cases may reach outcomes that diverge from Judge Alsup’s reasoning, but the specter of “bet-the-company” damages may drive litigants to settle before the courts weigh in.
Class Certification as Leverage
Because a court’s willingness to certify expansive classes under Rule 23 can dramatically increase financial exposure, GenAI developers may be encouraged to license works for their training datasets even when fair-use defenses are potentially available.
Notice and Claims Administration
Courts scrutinize proposed settlement mechanics — particularly opt-in vs. opt-out structures — when the stakes are extraordinary. Robust, transparent procedures are critical to approval.
While the resolution of Bartz v. Anthropic leaves many questions unanswered, this case suggests that legal and financial risks related to AI training data may be influenced by factors such as dataset provenance and licensing, the cost and feasibility of content licenses, evolving case law and legislation, and the potential for significant statutory damages.