Meta’s Llama AI Training Under Fire: Zuckerberg Approved Pirated Book Dataset, Court Filing Reveals
Key Allegations in the Kadrey v. Meta Copyright Lawsuit
Newly unsealed court documents reveal explosive claims about Meta’s AI training practices. Plaintiffs in the Kadrey v. Meta lawsuit allege CEO Mark Zuckerberg personally approved using a dataset of pirated books and articles to train the company’s Llama AI models.
The Controversial LibGen Connection
According to the filing:
- Meta allegedly used LibGen (Library Genesis), a notorious repository of pirated academic texts and books
- The dataset contained copyrighted works from major publishers including McGraw Hill and Pearson
- Internal Meta documents reportedly refer to LibGen as “a data set we know to be pirated”
Internal Concerns and Executive Approval
The court documents paint a picture of internal debate at Meta:
- AI executives reportedly expressed concerns about legal and regulatory risks
- Engineers allegedly warned torrenting LibGen “could be legally not OK”
- Despite objections, Zuckerberg (referred to as “MZ” in internal memos) allegedly gave final approval
Evidence of Deliberate Obfuscation?
Plaintiffs claim Meta took steps to conceal its methods:
- Metadata Removal: Engineer Nikolay Bashlykov allegedly wrote scripts to strip copyright notices and acknowledgments
- Torrenting Practices: Meta reportedly minimized uploads while downloading LibGen content
- Attribution Scrubbing: Science journal articles allegedly had source metadata removed
The Legal Battle Over AI Fair Use
This case highlights the growing tension between:
- Tech Companies: Argue AI training falls under “fair use” doctrine
- Creators: Claim unauthorized use of copyrighted works constitutes infringement
Previous Legal Precedents
- 2023: Court dismissed some AI copyright claims against Meta
- Current case focuses specifically on early Llama models
- Judge Chhabria denied Meta’s redaction requests, citing PR motives
Why This Case Matters
The outcome could set important precedents for:
- AI development practices industry-wide
- Copyright law in the digital age
- Corporate accountability for training data sourcing
Meta has not yet responded to requests for comment. The case continues in the U.S. District Court for the Northern District of California.
For more AI industry developments, subscribe to TechCrunch’s AI newsletter.
📚 Featured Products & Recommendations
Discover our carefully selected products that complement this article’s topics:
🛍️ Featured Product 1: Beaufort – Rectangular Glass Top Dining Table – Chrome
Image: Premium product showcase
High-quality beaufort – rectangular glass top dining table – chrome offering outstanding features and dependable results for various applications.
Key Features:
- Industry-leading performance metrics
- Versatile application capabilities
- Robust build quality and materials
- Satisfaction guarantee and warranty
🔗 View Product Details & Purchase
🛍️ Featured Product 2: Beacon – Boucle Fabric Stool (Set of 2)
Image: Premium product showcase
Advanced beacon – boucle fabric stool (set of 2) engineered for excellence with proven reliability and outstanding results.
Key Features:
- Cutting-edge technology integration
- Streamlined workflow optimization
- Heavy-duty construction for reliability
- Expert technical support available
🔗 View Product Details & Purchase
🛍️ Featured Product 3: Beckham – 4 Piece Modular L-Sahped Sectional
Image: Premium product showcase
High-quality beckham – 4 piece modular l-sahped sectional offering outstanding features and dependable results for various applications.
Key Features:
- Professional-grade quality standards
- Easy setup and intuitive use
- Durable construction for long-term value
- Excellent customer support included
🔗 View Product Details & Purchase
💡 Need Help Choosing? Contact our expert team for personalized product recommendations!