Fair use limitations emerge as AI companies face copyright challenges

250227 Thomson Reuters vs. Ross Intelligence

The use of copyrighted material to train generative AI has often been justified on the grounds that the practice falls under the fair use doctrine, but a recent ruling from the US District Court for the District of Delaware may make that argument harder to sustain.


A federal judge has ruled that Ross Intelligence infringed Thomson Reuters’ copyright when it used Thomson Reuters content to build an AI-enhanced legal platform. The ruling challenges the widespread industry assumption that training AI on copyrighted materials constitutes fair use, and it raises urgent questions for AI developers and their investors.

What is copyright?

Much of the data that is useful for training AI comes in the form of copyrighted material. Copyright is a legal concept that gives creators the exclusive right to use and distribute their original works for a limited period of time. It’s meant to incentivize creativity by ensuring that creators can benefit financially from their works. Otherwise, it might be hard for them to justify the time, energy, and money spent on creating. Traditionally, the 'fair use' doctrine has permitted the use of copyrighted material without the rightsholder’s permission in certain circumstances. For more on this subject, check out “From promoting progress to protecting innovation: AI's implications for copyright and fair use”.

Does AI training constitute fair use?

Of course, the $64,000 question is whether AI training constitutes fair use. Existing case law in this sphere has often revolved around the concept of transformative use. Simply put, this means that the use of copyrighted material is more likely to fall under the heading of fair use if it’s used to create something new. For example, in Campbell v. Acuff-Rose Music, Inc., the Supreme Court held that 2 Live Crew’s parody of the Roy Orbison song “Oh, Pretty Woman” could qualify as fair use, in large part because the end result was transformative. Similarly, Sony Computer Entertainment v. Connectix Corp. and Sega v. Accolade allowed software developers to draw on copyrighted works for the purpose of creating something entirely new.

The case of Thomson Reuters vs. Ross Intelligence

Thomson Reuters provides some of the world’s leading information databases. One of them is Westlaw, which includes a range of law-related content such as case law, statutes, and journal articles. Its corpus also features original material such as headnotes: brief summaries of relevant points of law that often appear at the top of a published opinion when it’s included in a database.

Ross Intelligence was looking to create its own AI-powered legal-research platform. Since it wanted to train its AI on a database of legal questions and answers, it attempted to license Westlaw’s content (it’s worth noting that Ross’s AI was not generative AI). However, Thomson Reuters (which owns the copyright to Westlaw’s material) refused on the grounds that Ross was essentially a competitor.

Ross then decided to license content from a company called LegalEase, which offered Bulk Memos: compendiums of legal questions along with good and bad answers. The lawyers who compiled these compendiums were encouraged to create the questions using Westlaw’s headnotes. Ultimately, LegalEase sold 25,000 Bulk Memos to Ross, which were then used to train Ross’s AI.

When the case came before the US District Court for the District of Delaware, Circuit Judge Stephanos Bibas initially declined to resolve the fair use question himself and left it for a jury. But upon reflection, he revised that earlier opinion and ruled that Ross had infringed copyright on some of Westlaw’s headnotes (a jury will decide whether other headnotes have been used in a way that constitutes infringement). At the same time, he also confirmed that the headnotes were original enough to be copyrighted.

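Ross attempted to justify its use of the headnotes as fair use, but Bibas rejected the argument. Courts weigh four statutory factors when assessing fair use: (1) the purpose and character of the use, (2) the nature of the copyrighted work, (3) the amount and substantiality of the portion used, and (4) the effect of the use on the market for the original. In his analysis, factors 1 and 4 favored Thomson Reuters while 2 and 3 favored Ross. On balance, he ruled that Thomson Reuters had prevailed. For a critique of Bibas’s judgment, see this post from the Authors Alliance.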

What are the implications of this ruling?

At the moment, this case isn’t particularly weighty. District courts are the trial courts of the US federal judiciary, which means they’re the first ones to decide a case. As such, their decisions do not bind other courts. Furthermore, their decisions can be appealed to higher courts such as the Court of Appeals or the Supreme Court, so Judge Bibas’s ruling may well be overturned.

As noted in a post on the Davis Wright Tremaine blog, generative AI developers might have stronger claims to fair use, since they’re working with something that can actively create new content that the courts may be inclined to view as transformative.

Even so, other courts may find Judge Bibas’s reasoning persuasive and apply it to other cases involving AI. For now, the Thomson Reuters case highlights the need for developers to think carefully about the data they use to train their AIs. The safest course of action is to use either uncopyrighted material or copyrighted material with permission from the rightsholder. The fact that the litigation with Thomson Reuters has forced Ross out of business shows how ruinous a lawsuit can be.

A new direction or a bump in the road?

Training is a vital part of the process of creating an AI, whether generative or not. Scale is important, since more data can lead to better outcomes. This means that copyrighted work will almost always be part of the datasets. Many AI developers have sought refuge behind the shield of fair use, which permits the use of copyrighted material in certain limited situations. But demonstrating fair use requires courts to engage in a careful balancing act. Judge Bibas’s finding that Ross Intelligence had violated Thomson Reuters’ copyright could ultimately influence other courts’ handling of AI-related cases. In the short term, it illustrates the importance of ethical AI training that either avoids the use of copyrighted material or only uses it with the permission of the rightsholders.
