Like the balanced analysis of the issue.
Is why I do what I do!
Or DeepSeek was just sneakier about it, and 150K is only what they could attribute with confidence
“In the scale of training a language model, 150K samples is only scratching the surface as a substantive experiment. It looks like they were experimenting with some rubrics, which could’ve been for an online RL run, but that’s extremely unlikely with how distributed the access was, and then some minor stuff on completions for sensitive queries. This usage of Anthropic’s API will have a negligible impact on DeepSeek’s long-rumored V4 model (or whichever model the data here contributed to). This was also very likely a small team at DeepSeek and unknown to much of the broader training organization.”
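For a sense of scale, here is a rough back-of-the-envelope check. The per-sample token count and pretraining corpus size below are illustrative assumptions, not figures from the report:

```python
# Back-of-the-envelope: how much data is 150K samples?
# Both constants below are illustrative assumptions.
SAMPLES = 150_000
TOKENS_PER_SAMPLE = 500               # assume a few hundred tokens per completion
PRETRAIN_TOKENS = 15_000_000_000_000  # assume ~15T tokens for a frontier pretraining run

distilled_tokens = SAMPLES * TOKENS_PER_SAMPLE
fraction = distilled_tokens / PRETRAIN_TOKENS

print(f"{distilled_tokens:,} tokens")         # 75,000,000 tokens
print(f"{fraction:.2e} of pretraining data")  # 5.00e-06
```

Even with generous assumptions, the distilled data is millionths of a pretraining corpus, which is consistent with the "scratching the surface" framing above.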
What Anthropic found is just the tip of the iceberg (most likely).
Poor DeepSeek intern…
No reason for ByteDance and Alibaba not to be doing the same thing; my guess is they were just sneakier
They have way more GPUs? At least ByteDance does.
what makes one learning technique more or less ethical than others?
isn't the entire business of this current generation of AI all about coming up with novel ways to engineer a learning process?
outside of specific points (e.g. ToS violations, which would need to be debated in court for a firm answer), the entire surface area of innovation should be as open as possible
and if it's not, the principle should apply equally, not only to specific jurisdictions or to arbitrarily singled-out techniques
AI companies: scrape the output of the entire internet without attribution
Also AI companies: “how dare you scrape the output of our tool!”
I like their framing of it as an “attack”. If that’s the case then the entire industry’s been pillaging content for years
Have you seen this video and his thread? Curious about your opinion. He also raises a serious contention re: the claims on MiniMax and Moonshot that I'm not sure you note (about halfway through the video). Cheers https://x.com/theo/status/2026199981179449409?s=46
I don't think Anthropic would lie about this. I also don't think doing what the Chinese labs are doing is particularly bad faith, as the LLM terms of service have been routinely violated for years.
"It’s clear from their open research that Chinese labs have excellent RL infrastructure, despite the compute shortages."
Is this at least in part due to the resources needed for strong RL environments being skewed more towards CPUs, where access to CPUs is less constrained and falls more or less outside of the current export control regime? (Of course not discounting that Chinese AI labs' talent is strong, constraints breed innovation, etc. etc.)
I think getting RL right is mostly hard infra problems and needing good GPUs. The CPUs matter very little.
Are tasks like coding still GPU-limited in post-training/RL? Maybe it's just from reading the SemiAnalysis post about CPU demand going up, but it made sense to me that just providing enough coding playgrounds for agents to RL in might have become the bottleneck.
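On the CPU-vs-GPU split being discussed: one way to picture it is that rollouts for coding tasks mean executing model-generated code against tests in sandboxes (CPU-bound and embarrassingly parallel), while the policy update is one batched gradient step (GPU-bound). A minimal toy sketch, with the sandbox simulated by a plain `exec` and the gradient step mocked out; everything here is an illustrative stand-in, not any lab's actual stack:

```python
# Toy RL-for-coding loop: CPU-side rollout scoring vs a mocked GPU update step.
from concurrent.futures import ProcessPoolExecutor

def run_sandbox(candidate: str) -> float:
    """Simulate executing a model-generated program against a unit test.
    In a real setup this would be an isolated interpreter/container (CPU work)."""
    try:
        env: dict = {}
        exec(candidate, env)  # stand-in for a real isolated sandbox
        return 1.0 if env["add"](2, 3) == 5 else 0.0
    except Exception:
        return 0.0

def policy_update(rewards: list[float]) -> float:
    """Stand-in for the GPU-bound gradient step; just reports mean reward."""
    return sum(rewards) / len(rewards)

if __name__ == "__main__":
    # A batch of "sampled completions": three correct, one buggy.
    completions = ["def add(a, b):\n    return a + b"] * 3 + \
                  ["def add(a, b):\n    return a - b"]
    # The environment side scales out across CPU cores...
    with ProcessPoolExecutor() as pool:
        rewards = list(pool.map(run_sandbox, completions))
    # ...while the update is one batched step on the accelerator.
    print(policy_update(rewards))  # 0.75
```

The point of the sketch: the sandbox side needs one isolated execution per sampled completion, so throughput there scales with CPU cores, not accelerator count, which is one reading of why coding-agent RL could become CPU-bottlenecked.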