On June 14, 2025, a federal courtroom in San Francisco became the stage for a landmark copyright fight. Judge William Alsup issued a split ruling in Bartz v. Anthropic. The plaintiffs claimed that Anthropic had copied millions of books without permission to build its large-language model, Claude.
The judge’s decision is the first to tackle head-on whether training modern AI on copyrighted text can qualify as fair use, and whether it matters how that text is acquired. The answers could shape every generative-AI project in the United States.
Judge Alsup said the training itself was legal. In his view, letting Claude read entire books is “quintessentially transformative.” The model studies each sentence for grammar and style, then writes fresh text instead of repeating pages. In classroom terms, it acts like a student who reads many novels, learns pacing and tone, and later pens an original essay.
Yet the judge drew a bright red line at data sourcing. Anthropic had downloaded more than seven million pirate copies from shadow libraries such as Books3, LibGen, and PiLiMi. Building a “central library” from stolen files, he wrote, is clear-cut infringement. A jury trial in December will now decide damages. Statutory awards start at seven hundred fifty dollars per book and can reach one hundred fifty thousand if jurors find that the copying was willful.
Online reaction was swift. On Reddit, one user called the outcome “not a huge win—more like a win,” capturing its two-edged nature. Another joked that companies might “just keep it offshore,” while a third pointed out that the court condemned only a library “filled with pirated books,” not a lawful in-house archive.
These comments reflect the confusion and strategy questions now racing through the tech world.
Anatomy of the Ruling: Green Light, Red Line
Judge Alsup gave two answers that shine like traffic signals. The green light covers training. If Claude reads books it has a right to access, the use is transformative; therefore, it is fair. The red line covers sourcing. When Anthropic grabbed pirate copies, the judge called that simple theft. A creative goal does not excuse an illegal starting point. In his view, you cannot break into a library, copy every book, and expect applause for learning from them tomorrow.
Reddit users quickly tested this logic. One wrote that a United States company “is still under U.S. law no matter where the servers sit,” pushing back on the offshore workaround. The back-and-forth shows how the ruling ties legal risk to each step in the data chain. Purpose must be transformative; provenance must be clean. Fail either test and the whole project can tip.
The stakes are huge. Even the lowest fine; about seven million books at $750 each, passes $5 billion. A willful finding could push the bill into nightmare territory, proving that sloppy sourcing can cancel out any legal win on training.

Fair Use Up Close: Why Copying Everything Can Still Be Legal
Fair use is the rule that lets people borrow parts of a work without asking first. It lives in U.S. copyright law and has four guiding questions, or “factors,” that judges must weigh. Judge Alsup walked through each one. His answers explain why an AI can scan an entire novel yet stay on the right side of the line.
First factor: purpose and character of the use
Claude did not save the books so people could read free versions. Instead, the model studied them to learn sentence rhythm, grammar, and plot structure.
This learning task is different from enjoying the story itself. In court words, that makes the use “transformative.” A transformation means the new use changes the old work’s role or adds something new.
Here, the change is turning many books into skills for writing fresh text. Because this purpose is far from selling copies, the first factor favored Anthropic.
Second factor: nature of the work
Most books are creative. Creative works usually get strong protection. Therefore this factor tilted toward the authors. Yet it carried less weight because the first factor was so strong the other way.
Third factor: amount used
Anthropic copied one hundred percent of each book. Normally, that looks bad. However, Judge Alsup said the model needed the full text to learn patterns. Copying everything was reasonable for that lesson. Because the goal was learning, not republishing, the judge treated this factor as neutral.
Fourth factor: effect on the market
The authors did not show that Claude spat out passages that could replace their books. Readers still have to buy The Feather Thief or The Lost Night if they want the real stories. Since sales were not harmed, this factor also leaned toward fair use.
After adding the four answers together, the court handed Claude a green light. Full copying became lawful because it served a new educational goal and posed no clear threat to book sales.
Anthropic also bought paperbacks, scanned them, and shredded the originals. The court said this print-to-digital trade is lawful. Anthropic owned each copy, made only one digital file to replace it, and kept that file inside the company. The process echoes the Sony Betamax ruling that let families tape shows for home use.
Some Reddit voices doubted the cost. One warned that “web hosting costs are gonna kill small business,” hinting that only deep-pocket firms can afford such clean-room methods. The ruling suggests they may be right; lawful scans provide legal armor but raise the price of entry.
Piracy Hard Stop: Where the Risk Begins
Anthropic’s pirate library is the case’s Achilles’ heel. Emails show leaders chose piracy to dodge “the legal slog” of licensing. That note could drive a willfulness finding. One commenter quipped, “If you don’t get caught…” but the court record proves they did.
Liability is settled. Only the bill remains, and it could shake the company’s finances. The episode warns every lab: money saved on data today may cost far more tomorrow.
From Pirate Gates to Licensing Lanes: Business Fallout
The ruling rewrites the AI playbook. “Scrape first, apologize later” now reads like a confession. Roadmaps start with clean data, clear licenses, and detailed logs. Large firms backed by Amazon, Microsoft, or Google can absorb the cost; many startups may pivot to narrower models or fold.
Rights holders hold stronger cards. Rather than chase model outputs, they can target dirty data sources. A single pirated title can trigger ruinous damages, so publishers may push for blanket licenses or a text-rights collective, something like ASCAP for books.
One Reddit user predicted the fight “will go to the Supreme Court” because fair-use rules vary by circuit. That legal journey may take years. Until then, two rules guide every builder: show your use is transformative, and show your data is clean. Skip either step and the gamble may swallow the gain.