Anthropic is the latest AI firm to be sued for copyright infringement, with three authors filing a class-action lawsuit in California.
AI models are notorious for scraping the web, devouring content in an effort to learn and improve. Unfortunately, the practice has raised a myriad of legal and ethical questions regarding copyright and ownership of content. OpenAI and Microsoft have already been sued, and Perplexity AI has faced allegations of ignoring the Robots Exclusion Protocol and scraping websites without permission.
Anthropic now joins the list of AI companies facing legal consequences related to allegations of unauthorized use of copyrighted material. Writers Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson have filed a class-action lawsuit, saying Anthropic trained its AI models on their books, as well as those of other writers.
The complaint accuses Anthropic of blatant theft.
Anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books. Rather than obtaining permission and paying a fair price for the creations it exploits, Anthropic pirated them. Authors spend years conceiving, writing, and pursuing publication of their copyrighted material. The United States Constitution recognizes the fundamental principle that creators deserve compensation for their work. Yet Anthropic ignored copyright protections. An essential component of Anthropic’s business model—and its flagship “Claude” family of large language models (or “LLMs”)—is the large-scale theft of copyrighted works.
The plaintiffs go on to accuse Anthropic of knowingly using pirated copies of the writers’ works, and then going to great lengths to hide the extent of its actions.
Plaintiffs are authors of an array of works of fiction and nonfiction. They bring this action under the Copyright Act to redress the harm caused by Anthropic’s brazen infringement. Anthropic downloaded known pirated versions of Plaintiffs’ works, made copies of them, and fed these pirated copies into its models. Anthropic took these drastic steps to help computer algorithms generate human-like text responses.
Anthropic has not even attempted to compensate Plaintiffs for the use of their material. In fact, Anthropic has taken multiple steps to hide the full extent of its copyright theft. Copyright law prohibits what Anthropic has done here: downloading and copying hundreds of thousands of copyrighted books taken from pirated and illegal websites.
The plaintiffs then make the case that Anthropic would not have a commercially successful product if it hadn’t stolen their work, and the work of countless others, in its efforts to improve its AI models.
Anthropic’s immense success is a direct result of its copyright infringement. The quality of Claude, or any LLM, is a consequence of the quality of the data used to train it. The more high-quality, longform text on which an LLM is trained, the more adept an LLM will be in generating lifelike, complex, and useful text responses to prompts. Without usurping the works of Plaintiffs and the members of the Class to train its LLMs to begin with, Anthropic would not have a commercial product with which to damage the market for authors’ works. Anthropic has enjoyed enormous financial gain from its exploitation of copyrighted material. Anthropic projects it will generate more than $850 million of revenue in 2024. After ten rounds of funding, Anthropic has raised $7.6 billion from tech giants like Amazon and Google. As of December 2023, these investments valued the company in excess of $18 billion, a figure that is likely even higher today.
Anthropic has gone to great lengths to position itself as the responsible AI firm, one focused on using AI for the betterment of humanity while developing it in a manner that directly addresses concerns about the damage AI may do. The plaintiffs take issue with that stance, pointing out that Anthropic’s alleged behavior would be in direct conflict with those goals.
Anthropic’s commercial gain has come at the expense of creators and rightsholders, including Plaintiffs and members of the Class. Book readers typically purchase books. Anthropic did not even take that basic and insufficient step. Anthropic never sought—let alone paid for—a license to copy and exploit the protected expression contained in the copyrighted works fed into its models. Instead, Anthropic did what any teenager could tell you is illegal. It intentionally downloaded known pirated copies of books from the internet, made unlicensed copies of them, and then used those unlicensed copies to digest and analyze the copyrighted expression—all for its own commercial gain. The end result is a model built on the work of thousands of authors, meant to mimic the syntax, style, and themes of the copyrighted works on which it was trained.
Anthropic styles itself as a public benefit company, designed to improve humanity. In the words of its co-founder Dario Amodei, Anthropic is “a company that’s focused on public benefit.” For holders of copyrighted works, however, Anthropic already has wrought mass destruction. It is not consistent with core human values or the public benefit to download hundreds of thousands of books from a known illegal source. Anthropic has attempted to steal the fire of Prometheus. It is no exaggeration to say that Anthropic’s model seeks to profit from strip-mining the human expression and ingenuity behind each one of those works.
This is not the first time Anthropic has been criticized for its content scraping activities. Electronics repair site iFixit recently called the company out for hitting its “servers a million times in 24 hours.” After iFixit went public, other companies and organizations reported similar behavior.
If Anthropic truly wants to set itself apart as a responsible AI firm, it needs to do a better job of handling copyright, web scraping, and AI model training.