OpenAI Accused of Deleting Training Data in NY Times Copyright Lawsuit

Attorneys for The New York Times and Daily News allege that OpenAI engineers accidentally erased critical data that could be relevant to their ongoing copyright infringement lawsuit. The publishers claim OpenAI used their content without permission to train AI models like GPT-4o.

Key Developments in the Case

  • Virtual Machine Access Granted: Earlier this fall, OpenAI provided two virtual machines (software-based computers) to allow the publishers’ legal team to search for copyrighted material in its training datasets.
  • 150+ Hours of Research: Since November 1, the plaintiffs’ experts and lawyers have spent over 150 hours analyzing OpenAI’s training data.
  • Data Deletion Incident: On November 14, OpenAI engineers allegedly deleted all search data stored on one virtual machine, according to a court filing.

Impact of the Data Loss

While OpenAI managed to recover most of the deleted data, the folder structure and file names were lost, making it impossible to trace how the publishers’ articles were used in model training. This forced the legal teams to restart their investigation from scratch.

“News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” the publishers’ counsel stated in their filing.

OpenAI’s Response

In a late Friday filing, OpenAI denied deleting any evidence and instead blamed the plaintiffs for a system misconfiguration that caused the technical issue. The company maintains that:

  • The affected drive was meant to be a temporary cache.
  • No files were permanently lost.
  • The incident does not constitute evidence destruction.

Broader Legal Context

OpenAI argues that training AI models on publicly available data, including news articles, falls under fair use. However, the company has also signed licensing deals with major publishers like:

  • Associated Press
  • Axel Springer (Business Insider)
  • Financial Times
  • Dotdash Meredith (People)
  • News Corp

While financial terms remain undisclosed, reports suggest OpenAI pays Dotdash Meredith at least $16 million annually for content licensing.

Why This Case Matters

This lawsuit highlights growing tensions between AI developers and content creators over:

  • Copyright boundaries in AI training
  • Transparency in dataset sourcing
  • Compensation models for copyrighted material

OpenAI has neither confirmed nor denied using specific copyrighted works without permission. The case continues to unfold in the U.S. District Court for the Southern District of New York.

Updated November 22 with OpenAI’s official response.


📚 Featured Products & Recommendations

Discover our carefully selected products that complement this article’s topics:

🛍️ Featured Product 1: Coach Crystal Tea Rose Purse

Coach Crystal Tea Rose Purse Image: Premium product showcase

High-quality coach crystal tea rose purse offering outstanding features and dependable results for various applications.

Key Features:

  • Industry-leading performance metrics
  • Versatile application capabilities
  • Robust build quality and materials
  • Satisfaction guarantee and warranty

🔗 View Product Details & Purchase


🛍️ Featured Product 2: Cinzia Rocca Animal Print Coat, 16

Cinzia Rocca Animal Print Coat, 16 Image: Premium product showcase

Premium quality cinzia rocca animal print coat, 16 designed for professional use with excellent performance and reliability.

Key Features:

  • Professional-grade quality standards
  • Easy setup and intuitive use
  • Durable construction for long-term value
  • Excellent customer support included

🔗 View Product Details & Purchase

💡 Need Help Choosing? Contact our expert team for personalized product recommendations!

Remaining 0% to read
All articles, information, and images displayed on this site are uploaded by registered users (some news/media content is reprinted from network cooperation media) and are for reference only. The intellectual property rights of any content uploaded or published by users through this site belong to the users or the original copyright owners. If we have infringed your copyright, please contact us and we will rectify it within three working days.