The doctrine of ‘fair use’ is one of the cornerstones of modern copyright law. It establishes that there are certain situations in which someone can use a copyrighted work without the owner’s permission. But if you’ve spent any time on the Internet, you know that the boundaries of fair use are hotly contested. And the emergence of AI has made the waters even murkier, since many of the leading AI tools have been trained on other people’s (copyrighted) works. In this post, we’re going to look at the messy intersection of AI and copyright law.
What is copyright?
American copyright law is founded on Article I, Section 8, Clause 8 of the United States Constitution, which empowers Congress to: “promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”
The idea is that this will incentivize people to create new works by allowing them to benefit exclusively from their creations for a limited period of time. Without this, creators would find it harder to recoup the cost of creating their works or inventions. Prior to the ratification of the Constitution, copyright was a matter for the states, creating a confusing patchwork of jurisdictions. By placing this power in Congress’s hands, the Framers of the US Constitution hoped to create a more uniform system that would offer greater protections for creators, which in turn would serve the public good by “[promoting] the Progress of Science and useful Arts.” A number of court decisions have reiterated that copyright ultimately exists to benefit the public, including Kirtsaeng v. John Wiley & Sons, Inc., Feist Publications, Inc. v. Rural Telephone Service Co., and Fox Film Corp. v. Doyal.
What is fair use?
Copyright is not absolute, and there are circumstances in which others can use a copyrighted work without permission. The fair use doctrine ultimately derives from an 18th-century case in the English Court of Chancery. In Gyles v. Wilcox, Lord Hardwicke ruled that an abridgement of another work “may with great propriety be called a new book, because not only the paper and print, but the invention, learning, and judgment of the author is shown in them.” In the United States, fair use was codified in Section 107 of the Copyright Act of 1976, which established that “the fair use of a copyrighted work…for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.” The statute went on to set out four factors that courts must consider when evaluating a fair-use claim:
- the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
- the nature of the copyrighted work;
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- the effect of the use upon the potential market for or value of the copyrighted work.
Fair use is determined on a case-by-case basis, and judges have spilled gallons of ink as they wrestle with the concept.
Relevant caselaw
- Hustler Magazine, Inc. v. Moral Majority, Inc. When Hustler published a parody advertisement ridiculing the Rev. Jerry Falwell, Falwell made thousands of copies of the offending page and distributed them as part of his fundraising efforts. The US District Court held that this constituted fair use, since Falwell’s copying did not diminish the market for the magazine.
- Authors Guild, Inc. v. Google, Inc. In the midst of Google’s efforts to make books available in digital form via Google Books, the Authors Guild and the Association of American Publishers filed suit, alleging that Google was committing copyright infringement. After efforts to settle the case out of court failed, the matter was decided on summary judgment, with the US District Court for the Southern District of New York ruling in favor of Google. That judgment was upheld by the Second Circuit Court of Appeals, which affirmed that Google’s treatment of the copyrighted works was sufficiently transformative. However, a different judge of the US District Court for the Southern District of New York ruled that the ‘National Emergency Library’ created by the Internet Archive at the start of the COVID-19 pandemic was not fair use. That case, Hachette Book Group, Inc. v. Internet Archive, is currently working its way through the court system.
- Twin Peaks Productions, Inc. v. Publications Int’l, Ltd. In this case, the US District Court for the Southern District of New York held that an unauthorized guidebook to the television series Twin Peaks was not fair use because it made extensive use of direct quotes and included detailed episode synopses. The court ruled that this could make it harder for the show’s creators to market an official guidebook.
Enter the robots
Fair use claims have always been tricky to adjudicate, but the emergence of powerful AI tools has added a whole new layer of complexity. For example, ChatGPT was trained on a vast range of Internet text, while Stable Diffusion was trained on an equally vast collection of images. In both cases, the developers made use of copyrighted material.
Existing case law would seem to suggest that this is permissible. In Google LLC v. Oracle America, Inc., the Supreme Court held that Google’s use of Java’s application programming interfaces and portions of its source code in early versions of the Android operating system constituted fair use. In other cases, such as Sony Computer Entertainment v. Connectix Corp. and Sega v. Accolade, courts have similarly held that software developers can draw on copyrighted works for the purpose of creating something new and therefore transformative. Writing for the Court in the Google case, Justice Stephen Breyer argued that fair use provided “a context-based check that can help to keep a copyright monopoly within its lawful bounds.”
It’s also worth noting that not every aspect of a copyrighted work is protected. For example, if you publish a journal article that reveals previously unknown factual information on a subject, your article is protected by copyright but the facts themselves are not. Although an AI model must ingest the entirety of a work to extract those facts, copyright lawyer Brandon Butler has argued that this is equivalent to a human author reading a work and then reusing the facts for their own purposes.
On the other hand, Cala Coffman of the Copyright Alliance takes a more nuanced approach. She argues that the question of fair use will ultimately depend on the circumstances of each case, and so “the unauthorized use of copyrighted material to train AI systems cannot be handwaved by a broad fair use exception that disregards the rights of creators.” She also identifies existing case law that might lead courts to take a more skeptical approach. For example, in American Geophysical Union v. Texaco Inc., the court ruled that Texaco’s internal policy of photocopying journal articles and distributing them to its scientists did not constitute fair use because it undermined the journal publishers’ ability to derive value from their work; the court also found that Texaco could simply have licensed the articles. Furthermore, fair use applies to the use of a specific work, not to a process as a whole. Coffman argues that the use of copyrighted material to train AI is analogous to Texaco’s photocopying policy, and notes that AI developers could likewise obtain a license to use copyrighted material.
Coffman also cites Fox News Network, LLC v. TVEyes, Inc., in which the court decided that TVEyes’ offering of Fox’s content through its own subscription service was not fair use: the service was only modestly transformative, and it undercut Fox’s existing licensing regime by offering another way of obtaining Fox’s content.
To make matters even more complicated, Daniel J. Gervais of Vanderbilt University’s Law School has argued that, while training an AI on copyrighted material constitutes fair use, generating content from that material might not!
Of course, it’s worth remembering that this debate isn’t just a clash of abstract legal principles. There’s a human element here that can be easy to overlook. Take the case of Hollie Mengert, an illustrator who works for Disney. She made headlines recently when a Redditor trained Stable Diffusion to replicate her art style. In an interview with Waxy, Mengert noted that “I kind of feel like when [the Redditor] created the tool, they were thinking of me as more of a brand or something, rather than a person who worked on their art and tried to hone things, and that certain things that I illustrate are a reflection of my life and experiences that I’ve had. Because I don’t think if a person was thinking about it that way that they would have done it. I think it’s much easier to just convince yourself that you’re training it to be like an art style, but there’s like a person behind that art style.”
Where do we go from here?
How can the rights of creators over their work be respected without stifling the kind of creativity that has greatly enriched our culture? Unfortunately, there are no easy answers. One option is a robust licensing regime that lets AI developers legitimately obtain permission to use copyrighted material, much as the rampant music piracy of the early 2000s gave way to legitimate digital music services such as iTunes and, later, streaming platforms like Spotify (though some experts have questioned whether licensing at this scale is even possible for AI). Other possible solutions include funds to compensate creators for the use of their material, metadata tags that allow creators to ‘opt out’ of having their work used for AI training, or ethically sourced datasets that developers can use for training. But while these are welcome developments, they’re not panaceas. For example, allowing creators to opt out doesn’t mean much when the process is cumbersome and easily circumvented.
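To make the opt-out idea a little more concrete, here is a minimal, hypothetical sketch of what a crawler-side check could look like, written in Python. It assumes a publisher signals its preferences through robots.txt and that the crawler identifies itself with a user-agent string; the crawler name “ExampleAIBot” and the URL are invented for illustration, and real opt-out schemes (such as the proposed ‘noai’ metadata tags) differ in their details.

```python
# Hypothetical sketch of a crawler-side opt-out check; not an official standard.
# Assumes the publisher expresses its preference in robots.txt and that the
# crawler identifies itself with a user-agent string ("ExampleAIBot" is invented).

from urllib import robotparser
from urllib.parse import urlsplit

AI_CRAWLER_USER_AGENT = "ExampleAIBot"  # illustrative crawler name


def may_collect_for_training(page_url: str) -> bool:
    """Return True only if the site's robots.txt allows this crawler to fetch the page."""
    parts = urlsplit(page_url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"

    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # download and parse the site's robots.txt

    # can_fetch() applies the most specific matching User-agent rules to this URL.
    return parser.can_fetch(AI_CRAWLER_USER_AGENT, page_url)


if __name__ == "__main__":
    # example.com is a placeholder domain used purely for illustration.
    print(may_collect_for_training("https://example.com/artwork/123"))
```

Crucially, nothing forces a crawler to run a check like this, which is exactly why an opt-out that depends on voluntary compliance is so easy to circumvent.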
Whatever happens, the only certainty seems to be that we’ll be talking about AI for a long time to come.
For more information about AI, check out “How to spot AI-generated text and imagery.”