
In a significant escalation of the intellectual property conflict between American and Chinese artificial intelligence laboratories, Anthropic has publicly accused three leading Chinese firms—DeepSeek, Moonshot AI, and MiniMax—of orchestrating a massive, coordinated campaign to siphon capabilities from its flagship model, Claude. The San Francisco-based AI safety startup characterizes the operation as "industrial-scale theft," involving over 16 million unauthorized exchanges generated through a sophisticated network of fraudulent accounts.
This revelation marks one of the most specific and quantified allegations of AI data theft to date. According to Anthropic, the operation was not merely opportunistic scraping but a deliberate "distillation attack" designed to train rival models on Claude’s advanced reasoning and coding outputs. The incident underscores the growing tension in the global AI arms race, where the line between competitive research and illicit extraction is becoming increasingly blurred.
Anthropic’s security team identified a sprawling infrastructure of approximately 24,000 fraudulent accounts used to bypass the company’s terms of service and regional access restrictions. Since Claude is not commercially available in China, the accused firms allegedly utilized commercial proxy services to mask their origins, creating what Anthropic engineers have termed "Hydra clusters"—networks of accounts that distribute traffic across third-party APIs to evade detection.
The scale of the operation was heavily skewed toward MiniMax, a Shanghai-based unicorn, which Anthropic claims was responsible for the lion's share of the illicit traffic. While DeepSeek has garnered significant media attention recently for its efficient open-source models, it was MiniMax that allegedly conducted the most aggressive extraction campaign in this instance.
**Breakdown of Alleged Distillation Activity**
| Accused Firm | Estimated Exchanges | Primary Target Capabilities | Scale of Operation |
|---|---|---|---|
| MiniMax | > 13,000,000 | Agentic reasoning, tool use | Massive / Industrial |
| Moonshot AI | > 3,400,000 | Long-context processing, coding | Significant |
| DeepSeek | > 150,000 | Chain-of-thought reasoning | Targeted / Strategic |
*Data based on Anthropic’s threat intelligence report released February 2026.*
The disparity in volume suggests different strategic goals for each firm. MiniMax’s enormous volume indicates a broad attempt to replicate Claude’s general-purpose capabilities, particularly in "agentic" tasks where the model acts autonomously. In contrast, DeepSeek’s far smaller footprint suggests a surgical campaign, focused on specific high-value reasoning chains used to fine-tune its existing architectures.
At the heart of this controversy is the practice of "knowledge distillation." In a legitimate context, developers use a large "teacher" model to train a smaller, more efficient "student" model. This process compresses the knowledge of a massive system into a faster, cheaper version, which is standard practice for internal product development.
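To make the term concrete, below is a minimal sketch of distillation in its legitimate form. The tiny PyTorch networks, the random input data, and the hyperparameters are placeholders invented for illustration; the only real technique shown is the standard temperature-scaled KL-divergence objective.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a larger "teacher" and a small "student" classifier.
teacher = torch.nn.Sequential(
    torch.nn.Linear(32, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10)
)
student = torch.nn.Linear(32, 10)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soften both distributions with temperature T, then minimize the KL
    # divergence so the student matches the teacher's full output
    # distribution rather than a single hard label.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * T**2

teacher.eval()
for _ in range(100):
    x = torch.randn(64, 32)       # stand-in for real training inputs
    with torch.no_grad():
        t_logits = teacher(x)     # the frozen teacher labels each batch
    loss = distillation_loss(student(x), t_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point is that the student never touches the teacher's weights or training data: the teacher's outputs alone transfer much of its behavior, which is precisely what makes harvested API responses so valuable.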
However, Anthropic contends that when this is done by a competitor without permission, it constitutes a violation of terms and a theft of proprietary intelligence. By feeding Claude millions of complex prompts and harvesting its answers, the Chinese labs effectively bypassed the immense compute and data curation costs required to train a frontier model from scratch.
"These labs are not just learning from us; they are effectively photocopying the results of billions of dollars in R&D," stated an Anthropic spokesperson. The report highlights that the queries were not typical user interactions. Instead, they were structurally distinct—often involving complex coding challenges or requests for step-by-step reasoning that are ideal for training datasets (Fine-Tuning Data).
Anthropic has framed this incident not just as a commercial dispute but as a matter of national security. The company argues that illicit distillation poses a unique danger: it strips away the safety guardrails embedded in the original model.
When a model like Claude is distilled, the "student" model learns the capabilities (how to write malware, how to synthesize chemicals) without necessarily inheriting the safety refusals and alignment behaviors that Anthropic spends months reinforcing. The result is "unprotected capabilities" that can be deployed by authoritarian regimes or bad actors without the built-in restrictions of the source model.
The detection of this campaign relied on advanced behavioral analysis. Anthropic’s Trust and Safety team noticed traffic patterns that human users rarely exhibit, such as round-the-clock querying with no idle periods and a high density of "jailbreak-style" prompts designed to probe the model's limits.
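A simplified version of that style of heuristic might look like the following. The thresholds and the two signals chosen here (longest idle gap and probe-prompt density) are illustrative assumptions, not Anthropic's actual detection rules.

```python
from datetime import datetime, timedelta

def looks_automated(timestamps, probe_flags,
                    min_idle=timedelta(hours=2), max_probe_rate=0.05):
    """timestamps: sorted request times for one account; probe_flags: bool per request."""
    longest_gap = max((b - a for a, b in zip(timestamps, timestamps[1:])),
                      default=timedelta(0))
    probe_rate = sum(probe_flags) / max(len(probe_flags), 1)
    # Humans sleep and wander off; sustained 24/7 traffic plus dense
    # limit-probing prompts is characteristic of an automated farm.
    return longest_gap < min_idle or probe_rate > max_probe_rate

# Toy check: one request every 30 minutes for two straight days, no probing.
ts = [datetime(2026, 2, 1) + timedelta(minutes=30 * i) for i in range(96)]
print(looks_automated(ts, [False] * 96))  # True: no idle gap ever exceeds 2 hours
```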
By correlating IP addresses and payment methods associated with the proxy services, Anthropic was able to group the roughly 24,000 accounts into distinct clusters attributed to the three firms. The company has since suspended the accounts and implemented stricter "Know Your Customer" (KYC) protocols for API access, though it acknowledges that the "whack-a-mole" nature of proxy networks makes permanent prevention difficult.
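Conceptually, that grouping step resembles computing connected components over shared infrastructure: any two accounts that reuse an IP address or payment fingerprint collapse into one cluster. The union-find sketch below uses invented identifiers and is not Anthropic's pipeline.

```python
from collections import defaultdict

def cluster_accounts(accounts):
    """accounts: dict of account_id -> set of identifiers (IPs, payment hashes)."""
    parent = {a: a for a in accounts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union any two accounts that share an identifier.
    first_owner = {}
    for acct, idents in accounts.items():
        for ident in idents:
            if ident in first_owner:
                parent[find(acct)] = find(first_owner[ident])
            else:
                first_owner[ident] = acct

    clusters = defaultdict(set)
    for acct in accounts:
        clusters[find(acct)].add(acct)
    return list(clusters.values())

print(cluster_accounts({
    "acct1": {"ip:1.2.3.4", "card:ab12"},
    "acct2": {"ip:1.2.3.4"},   # shares an IP with acct1
    "acct3": {"card:ff99"},    # isolated
}))
# -> [{'acct1', 'acct2'}, {'acct3'}]  (order may vary)
```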
This accusation comes weeks after OpenAI leveled similar, though less detailed, charges against Chinese competitors, suggesting a systemic pattern across the industry. The "distillation" shortcut is becoming the primary method for lagging competitors to close the gap with U.S. frontier models.
For the AI community, this incident raises critical questions about the enforceability of Terms of Service in a global digital economy. As models become more powerful, the value of their output increases, making them lucrative targets for extraction. We can expect this to accelerate the push for legislative action. New U.S. regulations could come to treat model weights and model outputs as controlled commodities, subject to the same strict scrutiny as the high-end GPUs currently restricted for export to China.
As the dust settles, the focus turns to how DeepSeek, Moonshot, and MiniMax will respond. While they have historically remained silent on such accusations, the specificity of Anthropic's data leaves little room for ambiguity regarding the origin of the attacks.