Like the balanced analysis of the issue.
Is why I do what I do!
Or DeepSeek was just sneakier about it, and 150K is only what they could attribute with confidence
“In the scale of training a language model, 150K samples is only scratching the surface as a substantive experiment. It looks like they were experimenting with some rubrics, which could’ve been for an online RL run, but that’s extremely unlikely with how distributed the access was, and then some minor stuff on completions for sensitive queries. This usage of Anthropic’s API will have a negligible impact on DeepSeek’s long-rumored V4 model (or whichever model the data here contributed to). This was also very likely a small team at DeepSeek and unknown to much of the broader training organization.”
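For a sense of scale, here is a rough back-of-the-envelope check. The per-sample token count and pretraining corpus size below are illustrative assumptions, not figures from the report:

```python
# Back-of-the-envelope: how much data is 150K samples?
# Both constants below are illustrative assumptions.
SAMPLES = 150_000
TOKENS_PER_SAMPLE = 500               # assume a few hundred tokens per completion
PRETRAIN_TOKENS = 15_000_000_000_000  # assume ~15T tokens for a frontier pretraining run

distilled_tokens = SAMPLES * TOKENS_PER_SAMPLE
fraction = distilled_tokens / PRETRAIN_TOKENS

print(f"{distilled_tokens:,} tokens")         # 75,000,000 tokens
print(f"{fraction:.2e} of pretraining data")  # 5.00e-06
```

Even with generous assumptions, the distilled data is millionths of a pretraining corpus, which is consistent with the "scratching the surface" framing above.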
What Anthropic found is just the tip of the iceberg (most likely).
Poor DeepSeek intern…
No reason for ByteDance and Alibaba not to be doing the same thing; my guess is they were just sneakier
They have way more GPUs? At least ByteDance does.
what makes one learning technique more or less ethical than others?
isn't the entire business of this current generation of AI all about coming up with novel ways to engineer a learning process?
outside of specific points (e.g. ToS violations, which would need to be debated in court for a firm answer), the entire surface area of innovation should be as open as possible
and if it's not, the principle should apply equally, not only to specific jurisdictions or to arbitrarily singled-out techniques
AI companies: scrape the output of the entire internet without attribution
Also AI companies: “how dare you scrape the output of our tool!”
I like their framing of it as an “attack”. If that’s the case then the entire industry’s been pillaging content for years
Have you seen this video and his thread? Curious about your opinion. He also raises a serious contention re: the claims on MiniMax and Moonshot that I'm not sure you note (about halfway through the video). Cheers https://x.com/theo/status/2026199981179449409?s=46
I don't think Anthropic would lie about this. I also don't think doing what the Chinese labs are doing is particularly bad faith, as the LLM terms of service have been routinely violated for years.
"It’s clear from their open research that Chinese labs have excellent RL infrastructure, despite the compute shortages."
Is this at least in part due to the resources needed for strong RL environments being skewed more towards CPUs, where access to CPUs is less constrained and falls more or less outside of the current export control regime? (Of course not discounting that Chinese AI labs' talent is strong, constraints breed innovation, etc. etc.)
I think getting RL right is mostly hard infra problems and needing good GPUs. The CPUs matter very little.
Are tasks like coding still GPU-limited in post-training/RL? Maybe it's just from reading the SemiAnalysis post about CPU demand going up, but it made sense to me that just providing enough coding playgrounds for agents to RL in might have become the bottleneck.
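On the CPU-vs-GPU split being discussed: one way to picture it is that rollouts for coding tasks mean executing model-generated code against tests in sandboxes (CPU-bound and embarrassingly parallel), while the policy update is one batched gradient step (GPU-bound). A minimal toy sketch, with the sandbox simulated by a plain `exec` and the gradient step mocked out; everything here is an illustrative stand-in, not any lab's actual stack:

```python
# Toy RL-for-coding loop: CPU-side rollout scoring vs a mocked GPU update step.
from concurrent.futures import ProcessPoolExecutor

def run_sandbox(candidate: str) -> float:
    """Simulate executing a model-generated program against a unit test.
    In a real setup this would be an isolated interpreter/container (CPU work)."""
    try:
        env: dict = {}
        exec(candidate, env)  # stand-in for a real isolated sandbox
        return 1.0 if env["add"](2, 3) == 5 else 0.0
    except Exception:
        return 0.0

def policy_update(rewards: list[float]) -> float:
    """Stand-in for the GPU-bound gradient step; just reports mean reward."""
    return sum(rewards) / len(rewards)

if __name__ == "__main__":
    # A batch of "sampled completions": three correct, one buggy.
    completions = ["def add(a, b):\n    return a + b"] * 3 + \
                  ["def add(a, b):\n    return a - b"]
    # The environment side scales out across CPU cores...
    with ProcessPoolExecutor() as pool:
        rewards = list(pool.map(run_sandbox, completions))
    # ...while the update is one batched step on the accelerator.
    print(policy_update(rewards))  # 0.75
```

The point of the sketch: the sandbox side needs one isolated execution per sampled completion, so throughput there scales with CPU cores, not accelerator count, which is one reading of why coding-agent RL could become CPU-bottlenecked.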