New York Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work

Introduction


In a groundbreaking legal battle that has sent shockwaves through the world of artificial intelligence (“AI”), The New York Times Company (“the Times”) has filed a lawsuit against OpenAI and Microsoft (“Defendants”), alleging copyright infringement in the training and operation of their AI models. The Times claims that its copyrighted works (“Times Works”) were improperly used in the development and training of AI models by OpenAI and Microsoft. The case has drawn widespread attention and raises significant questions about the intersection of AI technology and intellectual property rights.

The lawsuit, first reported on December 27, 2023, by the Times itself, outlines the newspaper's contention that OpenAI and Microsoft used copyrighted material without proper authorization. The dispute is not an isolated incident: the broader timeline of AI-related lawsuits reveals a growing trend of legal challenges in the field.

About a month before the Times filed suit, Reuters reported on November 21, 2023, that OpenAI and Microsoft faced another copyright lawsuit related to AI training. Then, on January 5, 2024, authors Nicholas Basbanes and Nicholas Gage filed a proposed class action against the two companies, signaling a persistent and escalating legal battle.

OpenAI faces an existential threat in the wake of this lawsuit, and the outcome could have profound implications for the organization's future. The lawsuit stands at the crossroads of technology, media, and law, and its resolution will undoubtedly shape the future of AI development and copyright considerations.


Allegations


The crux of the Times' allegations revolves around the assertion that OpenAI's generative artificial intelligence (“GenAI”) products pose a significant threat to the quality journalism the Times produces. The newspaper argues that the traditional business models supporting quality journalism have collapsed over the past two decades, making it more challenging for news organizations to distinguish fact from fiction in today's information ecosystem. The Times contends that the protection of its intellectual property, including copyright and exclusive rights, is crucial for its ability to fund world-class journalism in the public interest.

Further, the Times highlights its efforts to control the use of its content through copyright registration, a paywall, and terms of service that set limits on copying and use. The lawsuit asserts that the Times reached out to Microsoft and OpenAI in April 2023 to address intellectual property concerns and explore a potential resolution, but these efforts have not led to a satisfactory agreement. The Times contends that if news organizations cannot control the use of their content, their ability to monetize will be harmed, resulting in fewer resources for in-depth reporting and potential untold stories, thus imposing a significant cost on society.


Copyright Law


Copyright Infringement (17 U.S.C. § 501)


Copyright infringement occurs when any person or entity exercises any of the copyright owner’s exclusive rights to the work without permission. The owner of a copyright has exclusive rights to reproduce, distribute, perform, display, license, and prepare derivative works based on their copyrighted work. Examples of copyright infringement include, but are not limited to, modifying and reproducing someone’s creative work without making significant changes or uploading copyrighted material to an accessible web page.


Vicarious Copyright Infringement 


In a vicarious copyright infringement action, a person or entity may be held liable for the infringing acts committed by another if the person or entity had the right and ability to control the infringing activities and had a direct financial interest in them. The defendant need not have intended or known of the alleged infringement. For example, if a website allows a user to upload copyrighted material and then profits from advertising revenue generated by the content, the website could be held liable for vicarious copyright infringement.


Contributory Copyright Infringement


Although the Copyright Act does not expressly impose liability for contributory infringement, the U.S. Supreme Court has ruled that this does not preclude liability for parties that have not directly engaged in the infringement. A person or entity may be held liable as a contributory infringer if they induce, cause, or materially contribute to copyright infringement by another and know, or have reason to know, of the infringement.


Digital Millennium Copyright Act - Removal of Copyright Management Information (17 U.S.C. § 1202)


Copyright management information includes any of the following information conveyed in connection with copies of a work: copyright notice, title and other identifying information, terms and conditions of use, and identifying numbers and symbols referring to such information. Without the authority of the copyright owner, individuals are prohibited from (1) intentionally removing or altering any copyright management information, or (2) distributing or importing any copyright management information knowing that it has been removed or altered.


Common Law Unfair Competition by Misappropriation


Unfair competition is a tort-law claim that centers on the economic harm resulting from deceptive or unfair business practices. One prime example of unfair competition arises in misappropriation cases. Common law unfair competition by misappropriation aims to protect valuable assets that may fall outside the protection offered by statute (e.g., copyright law).


Trademark Dilution (15 U.S.C. § 1125(c)) 


15 U.S.C. § 1125(c) allows an owner of a famous and distinctive mark to obtain an injunction against a later-in-time user of a mark or trade name in commerce that is likely to cause “dilution by tarnishment.” Section 1125(c)(2)(C) provides that “dilution by tarnishment” is an association arising from the similarity between a mark or trade name and a famous mark that harms the famous mark’s reputation.


Credit: Ajay Suresh | Wikimedia


Analysis of the Times’ Claims


Copyright Infringement (17 U.S.C. § 501)


First, the Times asserted that it holds exclusive, federally registered copyrights in the literary works used by the Defendants in developing their Generative Pre-trained Transformer (“GPT”) models. Further, the Times alleged that Defendants infringed its exclusive rights to the works by: (1) storing, processing, and duplicating training datasets featuring millions of Times Works in order to train GPT models on Microsoft’s computing platform; and (2) unlawfully distributing generative output that incorporates Times Works. Thus, the Times asked the court to find that the Defendants jointly and directly infringed its rights in violation of 17 U.S.C. § 501.


Vicarious Copyright Infringement


Second, the Times alleged that Microsoft had control over the computing platform that stored, processed, and duplicated Times Works. In addition, the Times explained that Microsoft profited from the Defendants’ infringing actions by offering products featuring GPT models that were trained using Times Works. Thus, the Times asked the court to find the Defendants vicariously liable for copyright infringement.


Contributory Copyright Infringement 


Additionally, the Times contended that Microsoft materially contributed to Defendants’ copyright infringement by supplying them with the supercomputing infrastructure and directly assisting them with: (1) assembling datasets for training GPT models, which were composed of millions of the Times’ Works; (2) training the GPT models by storing, processing, and duplicating such training datasets composed of millions of the Times’ Works; and (3) monetizing GPT models and GenAI products by supplying necessary resources. Further, since Microsoft knew or had reason to know about Defendants’ direct infringement, the Times argued that Microsoft is liable for contributory copyright infringement.


Digital Millennium Copyright Act - Removal of Copyright Management Information (17 U.S.C. § 1202)


Furthermore, the Times asserted that the Defendants used its works in training datasets for their GenAI models and that the Defendants knowingly removed or altered the Times’ copyright management information in building those datasets. Moreover, the Times claimed that the Defendants’ generation of outputs from GPT models resulted in unauthorized copies and derivative works. Thus, the Times asked the court to find Defendants in violation of 17 U.S.C. § 1202.


Common Law Unfair Competition by Misappropriation 


Next, the Times alleged that it invested considerable effort, cost, and human capital to gather time-sensitive information and breaking news. The Times argued that Defendants used such content without the Times’ consent to: (1) offer content through outputs generated by GPT models that directly compete with the Times; (2) appropriate commercial opportunities, including revenue; (3) alter content, such as by removing links to products and depriving the Times of its referral revenue; and (4) free-ride on the Times’ efforts, resulting in damages that include lost advertising and affiliate referral revenue. For these reasons, the Times alleged that the Defendants have misappropriated its information.


Trademark Dilution (15 U.S.C. § 1125(c))


Lastly, the Times stated that it owns various federally-registered trademarks such as “The New York Times,” “nytimes,” and “nytimes.com,” which are distinctive and famous. Pursuant to 15 U.S.C. § 1125(c), the Times claimed that the Defendants engaged in unlawful use of such marks in commerce through outputs generated by the Defendants’ GPT-based products such as ChatGPT, ChatGPT Enterprise, Bing Chat, Azure OpenAI Service, Microsoft 365 Copilot, and related tools. These tools are alleged to produce unreliable content, which is then falsely credited to the Times. Thus, as a result of such “lower quality and inaccurate writing,” the Times alleged that the writings produced by Defendants’ tools resulted in dilution by tarnishment in violation of 15 U.S.C. § 1125(c). 


The Times’ Prayer for Relief


Based on the foregoing, the Times requested the following relief:


  1. Awarding statutory damages, compensatory damages, restitution, disgorgement, and any other relief permitted by law or in equity; 

  2. Permanently enjoining the Defendants from unlawful, unfair, and infringing conduct; 

  3. Ordering destruction of all GPT or other models and training datasets that feature Times Works pursuant to 17 U.S.C. § 503(b);

  4. Awarding costs, expenses, and attorneys’ fees; and

  5. Ordering any other relief the Court deems appropriate, just, and equitable. 


Conclusion


OpenAI stated in a January blog post that it regards the Times’ lawsuit as “without merit.” The company supports its position by claiming that AI model training is permitted by fair use and that the Times intentionally manipulated prompts to trigger a rare bug that caused the regurgitation of the Times’ articles. OpenAI has yet to file a response to the Times’ complaint, and it remains unclear whether its official response will mirror its blog post. A separate lawsuit filed by a group of well-known authors also claims that OpenAI used their works to train its models. Legal analysts suggest that the case “could fundamentally shape the direction and capabilities of generative AI.”

The implications for generative AI will depend on how broadly the court chooses to interpret the challenges brought by the authors and the Times. A decision in favor of the plaintiffs could force AI companies to obtain permission from authors and publishers first, opening the door for licensing agreement negotiations. A successful defense by the Defendants may allow private companies or individuals to widely scrape the internet to train AI models without any permission from authors or publishers.


*The views expressed in this article do not represent the views of Santa Clara University.
