I love your analyses of the current open model ecosystem, but I was wondering if you have any thoughts about the safety or security measures taken in developing these models. Namely, do you have any insights into the safety measures being taken for each release, whether there are any differences between Chinese and Western approaches to model safety, and maybe how this might affect their adoption?
I also get the impression that the safety of open models is somewhat less emphasised or focused on than that of leading closed models despite there being more of a safety risk to releasing an open model. Do you see that as being the case?
Chinese models are a bit less safe on a variety of benchmarks vis-à-vis similar Western models. A good snapshot: Ctrl-F "safety" in the Olmo 3 paper and look around: https://arxiv.org/pdf/2512.13961
Chinese builders are not "ignoring it," but I would also say that open models arriving faster has, on average, normalized an environment of releasing more, where you can say "well, Qwen's model is less safe, so we're probably good."
Thankfully, the most extreme risks of AI models haven't been playing out, so this is fine for the time being.
PS: image generation/video generation is much sketchier.
Thanks, a picture truly is worth a thousand words, or, more generally:
N(picture) = 1000 × N(word)
This is a nice analysis, but I do wonder if # downloads is the best way to measure adoption? Compared to something like # of tokens generated, I would expect Qwen to be inflated due to their small model sizes, which anyone can download on a laptop. A single download of, e.g., K2 on a cloud infra provider could represent vastly more usage than thousands of hobbyists trying out Qwen 8B for local inference. And I would think the amount of usage is what matters for how influential a model/lab is.
We’re working on OpenRouter and other things like it, but downloads are still one of the best metrics we have, even if they’re imperfect.
Maybe we should plot how many likes the models get while we’re at it ;)
As a suggestion, maybe something like OpenRouter’s recent report of model usage (including open and closed) might better measure open model usage?
I don’t think OpenRouter for closed models is useful; we’ll see if it makes sense for open models.
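(To make the downloads-vs-tokens point above concrete, here is a back-of-envelope sketch. Every number in it is made up purely for illustration — none of these figures come from real usage data:)

```python
# Back-of-envelope: "adoption" by raw downloads vs. by tokens generated.
# All numbers are made up purely for illustration.
hobbyist_downloads = 10_000               # people grabbing Qwen 8B for a laptop
tokens_per_hobbyist = 50_000              # a few local chat sessions each (assumed)

cloud_downloads = 1                       # one infra provider pulling K2 once
tokens_from_cloud_deploy = 5_000_000_000  # tokens served from that endpoint (assumed)

hobbyist_tokens = hobbyist_downloads * tokens_per_hobbyist  # 500M total
print(f"By downloads: hobbyists 'win' {hobbyist_downloads:,} to {cloud_downloads}")
print(f"By tokens:    cloud 'wins' {tokens_from_cloud_deploy:,} to {hobbyist_tokens:,}")
```

Under these (assumed) numbers, the download ranking and the token ranking point in opposite directions, which is exactly the commenter's point.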
Do you have the downloads by region? I am curious to see how many American-based programmers are downloading and fine-tuning the Chinese open-source models.
Not offered by Hugging Face, sadly. We could diff downloads on Hugging Face vs. ModelScope, though (the Chinese clone!).
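(A minimal sketch of what that diff could look like: the Hugging Face side uses the real `huggingface_hub` client, while the ModelScope side is a hypothetical stub, since I haven't verified their download-count interface — and the repo ID is just an example:)

```python
# Sketch: compare one model's download counts on Hugging Face vs. ModelScope.
# The Hugging Face call is real; fetch_modelscope_downloads is a stub, since
# ModelScope's download-count interface would need to be checked separately.
from huggingface_hub import HfApi

def fetch_hf_downloads(repo_id: str) -> int:
    # `downloads` is the Hub's rolling download counter for the repo.
    return HfApi().model_info(repo_id).downloads

def fetch_modelscope_downloads(repo_id: str) -> int:
    # Hypothetical placeholder: wire up ModelScope's hub API here.
    raise NotImplementedError("ModelScope download lookup not implemented")

repo = "Qwen/Qwen3-8B"  # example; Qwen publishes under a similar name on both hubs
print(f"Hugging Face downloads for {repo}: {fetch_hf_downloads(repo):,}")
```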
Nathan, nice work. There is so much noise surrounding who is building what model, with so many releases trying to garner attention...
But the download numbers tell the story. Qwen has won open source, which means China has won open source.
Makes you wonder how many of these downloads are driven by true evals versus a herd mentality fed by some source.
Nice work Nathan. So what will this mean to entities (think US Public Sector) that can’t (won’t) use Chinese models…? Will this push US and EU companies to do better in the open space or will they double down on their paid for models?
Lots of them are still on Llama, but frankly they're going to get outcompeted, and that's why I advocate for urgency in having more models built openly here in the US. We can afford it!
Full argument is here: https://atomproject.ai/
https://open.substack.com/pub/drjohnclark/p/did-trump-just-hand-china-global