Did DeepSeek Train Its AI Model Using Google's Gemini? Experts Weigh In

New Evidence Suggests Potential Gemini Influence on DeepSeek’s R1 Model

Chinese AI lab DeepSeek recently released an updated version of its R1 reasoning model, which performs well on a number of math and coding benchmarks. The company hasn't disclosed where its training data came from, but some AI researchers suspect that at least part of it consists of outputs from Google's Gemini models.

The Gemini Connection: What Researchers Found

Sam Paech, a Melbourne-based developer who builds "emotional intelligence" evaluations for AI, posted his findings on X (formerly Twitter). His analysis suggests that DeepSeek's R1-0528 model shows:

Similar linguistic preferences to Gemini 2.5 Pro
Nearly identical phrasing patterns
Comparable reasoning structures

“If you’re wondering why new deepseek r1 sounds a bit different, I think they probably switched from training on synthetic openai to synthetic gemini outputs.” — Sam Paech (@sam_paech)

Adding weight to these claims, the anonymous creator of SpeechMap (an AI free-speech evaluation tool) noted that the model's reasoning traces bear a striking resemblance to Gemini's characteristic output patterns.

A Pattern of Controversial Training Practices

This isn’t DeepSeek’s first encounter with such allegations:

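December 2024: The V3 model frequently identified itself as ChatGPT
Early 2025: OpenAI reported evidence linking DeepSeek to distillation, a technique for training a model on the outputs of a larger, more capable one
Microsoft investigation: Detected large amounts of data being exfiltrated through OpenAI developer accounts that OpenAI believes are affiliated with DeepSeek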

The Growing Challenge of AI Data Contamination

While these similarities raise questions, experts note several complicating factors:

Many AI models increasingly share linguistic patterns
The internet now contains massive amounts of AI-generated content
Content farms and bots flood platforms with synthetic text

Nathan Lambert, a researcher at the nonprofit AI research institute AI2, offered this perspective:

“If I was DeepSeek I would definitely create synthetic data from the best API model available. They’re resource-constrained but well-funded — this approach effectively gives them more compute power.”

How AI Companies Are Responding

Major players are implementing new safeguards:

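OpenAI: Now requires organizations to complete an ID verification process before accessing certain advanced models through its API
Google: Now summarizes the reasoning traces generated by models on its AI Studio developer platform
Anthropic: Recently announced that it will summarize reasoning traces for its Claude 4 models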

These measures aim to protect intellectual property amid growing concerns about model distillation: summarized traces, for example, are harder for rivals to use as training data for competing models.

TechCrunch has reached out to Google for comment and will update this story with any response.
