Anthropic Wins Key Battle in AI Copyright War—Training Data Fair Use Upheld
Another day, another courtroom skirmish in the AI intellectual property thunderdome. Anthropic just notched a critical win: its LLMs won't need permission slips for every scrap of training data.
Fair use flex
The ruling carves out breathing room for AI developers scrambling to feed their hungry algorithms. No more copyright shakedowns from publishers clutching their pearls about 'stolen' content—for now.
Wall Street's already pricing in the implications. Expect a fresh round of 'AI ethics' PowerPoints from C-suites suddenly very interested in fair use doctrine. (Just don't ask about their dark pools of unlicensed training data.)
The precedent sticks—until the next appeals court flips the script. In the meantime, the AI gold rush rolls on, copyright be damned.
“No carveout” from Copyright Act
The case, brought last August by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of building Claude using millions of pirated books downloaded from notorious sites like Library Genesis and Pirate Library Mirror.
The lawsuit, which seeks damages and a permanent injunction, alleges Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books” to train Claude, its family of AI models.
Alsup said that AI training can be “exceedingly transformative,” noting how Claude’s outputs do not reproduce or regurgitate authors’ works but generate new text “orthogonal” to the originals.
Court records reveal that Anthropic downloaded at least seven million pirated books, including copies of each author’s works, to assemble its library.
Internal emails revealed that Anthropic co-founders sought to avoid the “legal/practice/business slog” of licensing books, while employees described the goal as creating a digital collection of “all the books in the world” to be kept “forever.”
“There is no carveout, however, from the Copyright Act for AI companies,” Alsup said, noting that maintaining a permanent library of stolen works, even if only some were used for training, would “destroy the academic publishing market” if allowed.
Judge William Alsup's ruling is the first substantive decision by a U.S. federal court to directly analyze and apply the fair use doctrine to the use of copyrighted material for training generative AI models.
The court distinguished between copies used directly for AI training, which were deemed fair use, and the retained pirated copies, which will now be subject to further legal proceedings, including potential damages.
AI copyright cases
While several lawsuits have been filed—including high-profile cases against OpenAI, Meta, and others—those cases are still in early stages, with motions to dismiss pending or discovery ongoing.
OpenAI and Meta both face lawsuits from groups of authors alleging their copyrighted works were exploited without consent to train large language models such as ChatGPT and LLaMA.
The New York Times sued OpenAI and Microsoft in 2023, accusing them of using millions of Times articles without permission to develop AI tools.
Reddit also recently sued Anthropic, alleging it scraped Reddit’s platform over 100,000 times to train Claude, despite Anthropic's claims that it had stopped.