Did DeepSeek Train Its AI Model Using Google’s Gemini? Experts Weigh In
New Evidence Suggests Potential Gemini Influence on DeepSeek’s R1 Model
Chinese AI lab DeepSeek recently unveiled an updated version of its R1 reasoning AI model, showcasing impressive performance on math and coding benchmarks. While the company hasn’t disclosed its training data sources, mounting evidence suggests potential use of outputs from Google’s Gemini AI.
The Gemini Connection: What Researchers Found
Sam Paech, an Australian developer who builds emotional intelligence evaluations for AI, published what he describes as evidence on X (formerly Twitter). According to his analysis, DeepSeek’s R1-0528 model shows:
- Similar linguistic preferences to Gemini 2.5 Pro
- Nearly identical phrasing patterns
- Comparable reasoning structures
“If you’re wondering why new deepseek r1 sounds a bit different, I think they probably switched from training on synthetic openai to synthetic gemini outputs.” — Sam Paech (@sam_paech)
Adding weight to these claims, the anonymous creator of SpeechMap (an AI free-speech evaluation tool) noted that the model’s reasoning traces bear a striking resemblance to Gemini’s characteristic output patterns.
A Pattern of Controversial Training Practices
This isn’t DeepSeek’s first encounter with such allegations:
- December 2024: The V3 model frequently identified itself as ChatGPT
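- Early 2025: OpenAI said it had found evidence linking DeepSeek to distillation, the practice of training a model on the outputs of a larger, more capable one
- Microsoft investigation: Reportedly detected large amounts of data being exfiltrated through OpenAI developer accounts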
The Growing Challenge of AI Data Contamination
While these similarities raise questions, experts note several complicating factors:
- Many AI models increasingly share linguistic patterns
- The internet now contains massive amounts of AI-generated content
- Content farms and bots flood platforms with synthetic text
Nathan Lambert, a researcher at AI2, says it would not be surprising if DeepSeek had trained on Gemini outputs:
“If I was DeepSeek I would definitely create synthetic data from the best API model available. They’re resource-constrained but well-funded — this approach effectively gives them more compute power.”
How AI Companies Are Responding
Major players are implementing new safeguards:
- OpenAI: Now requires ID verification for API access
- Google: Summarizes model traces in AI Studio
- Anthropic: Recently announced trace summarization for Claude 4
These measures aim to protect intellectual property while addressing growing concerns about model distillation.
TechCrunch has reached out to Google for comment and will update this story with any response.