Meta Employees Discussed Using Copyrighted Content for AI Training, Court Documents Show

Newly unsealed court filings reveal that Meta employees internally debated using copyrighted materials—including books—to train the company’s AI models, despite potential legal risks. The documents, part of the ongoing Kadrey v. Meta lawsuit, shed light on Meta’s approach to sourcing training data for its AI systems, including the Llama family of models.

Key Revelations from the Court Filings

  • Internal Discussions on Copyrighted Works: Employees, including senior managers, openly discussed acquiring copyrighted books without explicit permission, adopting an “ask forgiveness, not permission” approach.
  • Avoiding Publisher Deals: To expedite data collection, some staffers suggested purchasing e-books at retail prices instead of negotiating licensing agreements with publishers.
  • References to Pirated Content: Discussions included using Libgen, a controversial platform known for hosting pirated books, as a potential data source despite its legal issues.

“Ask Forgiveness, Not Permission” Mentality

According to the filings, Xavier Martinet, a Meta research engineer, advocated for acquiring books first and seeking executive approval later, stating:

“This is why they set up this gen AI org: so we can be less risk-averse.”

Martinet downplayed the legal risk, arguing that many startups were likely already using pirated books for training.

Legal Concerns and Mitigations

Melanie Kambadur, a senior manager on Meta’s Llama team, acknowledged the need for licenses but noted that Meta’s legal team had become “less conservative” in approving data sources. The company also explored ways to reduce exposure, such as:

  • Filtering out Libgen files marked as “pirated” or “stolen.”
  • Avoiding public disclosure of training data sources.
  • Configuring models to reject prompts that could reveal copyrighted training material.

Broader Implications for AI Development

The case highlights the ethical and legal challenges facing AI companies as they seek vast amounts of training data. Meta’s internal debates reflect a broader industry tension between innovation and intellectual property rights.

Other Notable Findings

  • Reddit Data Scraping: Filings suggest Meta may have scraped Reddit data, possibly mimicking third-party tools like Pushshift.
  • Expanding Data Sources: Because internal datasets were insufficient, Meta leadership considered overriding past restrictions on training data, including restrictions on using licensed books and Quora content.

Legal Battle Intensifies

The plaintiffs—including high-profile authors Sarah Silverman and Ta-Nehisi Coates—have amended their complaint multiple times, alleging Meta knowingly used pirated books. In response, Meta has bolstered its legal team with Supreme Court litigators from Paul Weiss, signaling the case’s high stakes.

Meta has yet to comment publicly on the filings. The outcome of Kadrey v. Meta could set a precedent for how copyrighted material is used in AI training moving forward.

