20 Comments
James Parker

It's worth noting that the Chinese AI developers tend to produce documents explaining their approaches and to make their models open. As a result, Anthropic developers can improve their products based on this output as well.

This exchange of knowledge and data is one of the keys to the rapid progress being made in the field of AI. It is not dissimilar to the early computer revolution (pre-dotcom), where informal sharing of knowledge and data drove progress, largely through the job mobility of people in the computer industry cross-pollinating companies with what they learned elsewhere. Rather than complain about it, I suggest celebrating it as a win-win exchange.

Nimish Sanghi

Like the balanced analysis of the issue.

Nathan Lambert

Is why I do what I do!

Jordan Schneider

Or DeepSeek was just sneakier about it, and 150K was only what they could attribute with confidence

“In the scale of training a language model, 150K samples is only scratching the surface as a substantive experiment. It looks like they were experimenting with some rubrics, which could’ve been for an online RL run, but that’s extremely unlikely with how distributed the access was, and then some minor stuff on completions for sensitive queries. This usage of Anthropic’s API will have a negligible impact on DeepSeek’s long-rumored V4 model (or whichever model the data here contributed to). This was also very likely a small team at DeepSeek and unknown to much of the broader training organization.”

Nathan Lambert

What Anthropic found is just the tip of the iceberg (most likely).

Jordan Schneider

Poor deepseek intern…

Jordan Schneider

No reason for ByteDance and Alibaba not to be doing the same thing—my guess is they were just sneakier

Nathan Lambert

They have way more GPUs? At least bytedance does.

Polymathematics

what makes one learning technique more or less ethical than others?

isn't the entire business of this current generation of AI all about coming up with novel ways to engineer a learning process?

outside of specific points (eg tos violations, which would need to be debated in court for a firm answer), the entire surface area of innovation should be as open as possible

and if it's not, the principle should apply equally - not only to specific jurisdictions or to particular, arbitrarily chosen techniques

Chinese Cooking Demystified

AI companies: scrape the output of the entire internet without attribution

Also AI companies: “how dare you scrape the output of our tool!”

I like their framing of it as an “attack”. If that’s the case then the entire industry’s been pillaging content for years

Martiel

Isn’t it a bit rich for Anthropic to be complaining about misusing their terms of service? They are paying out billions for pirating and their whole approach involves using massive amounts of data that others have created—for free! (They are at least being paid to have their models distilled). And the Chinese models are at least open source…although Anthropic frames that as a threat rather than a bonus.

The major AI players don’t want actually enforceable protections for online content. The text of Anthropic’s statement makes it clear that the target is not so much misuse as any approach to improve the quality of Chinese models.

Cranky Old Guy

Lambert makes the technical case well: distillation's impact on Chinese labs is real but mixed, DeepSeek's actual usage was negligible, and restricting GPU shipments would matter more than policing API outputs. All fair. But the piece misses the structural problem. The ToS clause banning competitive distillation doesn't target Chinese labs — it targets everyone. Lambert even notes that U.S. academics and open model builders "used to greatly worry" about this clause. That worry was correct. The national security story gave Anthropic cover to run to Congress with a clause that predates DeepSeek and will survive it. The ask isn't just "stop China." It's "let us write the rule." https://www.mecrankyoldguy.com/p/congress-its-time-to-stop-big-ai

UiPath Community

This is an underexplored angle. What I find most interesting is how distillation changes the economics of deployment, especially for enterprises running automation at scale. If smaller distilled models can handle 80% of routine tasks reliably, the real competitive moat shifts from model size to orchestration and integration. Curious whether anyone's benchmarking distilled models specifically for structured enterprise workflows.

Tap

When I first read the news, I thought that standard distillation needs the teacher's output token probabilities, which are impossible to get from the new reasoning model APIs.

This article also makes a very good point that RLVR needs more computation. That made it clear to me.
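The distinction here can be sketched in a few lines: "white-box" distillation minimizes a divergence against the teacher's full token probability distribution (only available with logit access), while "black-box" distillation through a text-only API can only fine-tune on the teacher's sampled tokens. A minimal sketch with toy numbers (the 4-token vocabulary and all probabilities below are hypothetical, not from any real model):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) over a token vocabulary. Needs the teacher's full
    probability vector p, which a text-only API does not expose."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sample_nll(sampled_token, q):
    """Sample-based distillation loss: negative log-likelihood of the
    teacher's *sampled* token under the student -- all that is possible
    when the API returns text rather than logits."""
    return -math.log(q[sampled_token])

# Toy 4-token vocabulary (hypothetical numbers).
teacher_p = [0.7, 0.2, 0.05, 0.05]   # hidden behind the API
student_q = [0.5, 0.3, 0.1, 0.1]     # the student's current distribution

logit_loss = kl_divergence(teacher_p, student_q)  # white-box distillation
sample_loss = sample_nll(0, student_q)            # black-box: teacher sampled token 0
```

The sample-based loss is an unbiased but much noisier signal than the full KL, which is one intuition for why API-only distillation tends to need many samples.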

Leo W.

I worry the timing might suggest problems from not bending the knee to the current administration on recent issues with the DoD, or from Dario's well-known anti-authoritarian stance. Posturing as the visible anti-CCP company could help win policymaker support... but the average American probably still doesn't know or care about what Anthropic is or does. Or they might be signaling this in anticipation of more strictly cutting off Chinese users/usage, and need to prepare shareholders for a temporary drop in some number investors care about.

Jonathon P Sine

Have you seen this video and his thread? Curious about your opinion. He also raises a serious contention re: the claims on MiniMax and Moonshot that I'm not sure you note (about halfway through the vid). Cheers https://x.com/theo/status/2026199981179449409?s=46

Nathan Lambert

I don't think Anthropic would lie about this. I also don't think doing what the Chinese labs are doing is particularly bad faith, as the LLM terms of service have been routinely violated for years.

Kevin Xu

"It’s clear from their open research that Chinese labs have excellent RL infrastructure, despite the compute shortages."

Is this at least in part because the resources needed for strong RL environments are skewed more towards CPUs, and access to CPUs is less constrained and falls more or less outside of the current export control regime? (Of course not discounting that Chinese AI labs' talent is strong, constraints breed innovation, etc. etc.)

Nathan Lambert

I think getting RL right is mostly hard infra problems and needing good GPUs. The CPUs matter very little.

messyfork

Are tasks like coding still GPU-limited in post-training/RL? Maybe it's just from reading the SemiAnalysis post about CPU demand going up, but it made sense to me that just providing enough coding playgrounds for agents to RL in might have become the bottleneck.