2025 Open Models Year in Review
The first recap of a long year in the trenches of open models.
Welcome to the first Artifacts Recap, where we highlight the most notable and impactful open model releases of this year. And what a year it has been! Heading into the year, the open model landscape was seen as severely lagging behind, with open models being mostly a choice for those who needed privacy or wanted to fine-tune models for their use cases.
While these are still compelling reasons to choose open models, their performance has increased dramatically over the last 12 months. In 2024, the ecosystem mostly relied on Llama 3 and looked ahead to the next generation of that model family, while Qwen2.5, QwQ, and DeepSeek V2 / V2.5 / V3 were known to those deep in the ecosystem as capable but still niche picks. In 2025, DeepSeek and Qwen became household names with R1 and Qwen3 respectively, which in particular prompted many Chinese companies to open their models as well.
As a result, the open ecosystem has accelerated immensely in capabilities, rivaling closed models on most key benchmarks. Whether they deliver as much in real-world usage, where closed models still dominate, is a much more nuanced debate.
Selecting any number of models as the “best few” is a nearly impossible task, as the ecosystem is growing so rapidly. Many more categories of open models are relevant than just the biggest, text-only ones. Open models thrive as the default for many niche use cases across modalities and compute budgets.
To put the scale of our open ecosystem monitoring into perspective: Each day, around 1,000 - 2,000 models are uploaded to HuggingFace. Out of these 30,000 - 60,000 models a month, we select roughly 50 for the Artifacts series model roundups, which results in us covering 600 models a year. This of course means that some models didn’t make the cut.
Do you think we missed an obvious one? Leave your choice in the comments!
In this post, we start by highlighting our top models of the year in terms of their influence on the AI ecosystem broadly and the trends of open models specifically. We conclude with the complete tier list across model makers in the U.S., China, and the world, based on contributions in 2025.
The Winners
The models that defined this year’s releases and had outsized impact, even outside the open model space.
DeepSeek R1: Yes, that one was released this year! On January 20th, to be precise. It is hard to overstate the impact this model release has had on both the open model landscape and the general AI landscape. Not only did it show that a small team can push innovation forward, it was also released under the MIT license, while its predecessor, DeepSeek V3, used a custom DeepSeek license with usage restrictions. This move inspired a lot of (Chinese) labs to release their models openly and under an open license as well. Remember the times when Qwen had their own license for their most capable models?
It is obvious to say that this release was the most impactful one of the year.

Qwen 3: It might be unfair to put a whole model family in the same ranks as the other models on this list. Qwen3 covers everything: general models in all sizes and forms (both dense and MoE), vision and omni, coding, embedding, and reranker models, because why wouldn’t they?
While Qwen2.5 was mostly known as an insider tip and heavily used by academia, Qwen3 is regarded as the go-to choice for a lot of problems, especially in terms of multilinguality. It is therefore no wonder that a lot of academic experiments are conducted on Qwen-based models, which might have consequences for reproducibility on other models. By now, Qwen has overtaken Llama in total downloads and as the most-used base model to fine-tune (for more download data, see The ATOM Project).

Kimi K2: Moonshot AI is a laser-focused lab similar to DeepSeek: They work on one model line at a time, while running experiments on smaller models that eventually feed back into their main model line for the next generation. This makes it easy to guess what the next model will look like. Kimi K2 was (and is) a model loved by many, for both its sheer performance and its distinct writing style.
Runner Ups
Model releases that are very solid and deservedly well-known in the open model space.
MiniMax M2: MiniMax M2 was a surprising release this year. While MiniMax didn’t come from nowhere and we’ve been watching every release from them, the leap from the rather mediocre M1 to the very capable M2 is nothing short of remarkable. MiniMax also executed the (Chinese) model release playbook perfectly, leading to lasting usage even after the free period ended, with M2 continuing to be one of the most-used models on OpenRouter.
GLM-4.5: The story of Zhipu feels similar to Moonshot’s: a team laser-focused on one model line and one goal, developing their models with rigor until a single release earns them broader attention. That release was Kimi K2 for Moonshot and GLM-4.5 for Zhipu. We also chose 4.5 over 4.6 because it was their breakthrough moment and has the beloved, smaller Air version, which will be released for 4.6 in the near future.
GPT-OSS: The long-awaited open model release by OpenAI. Flexing its muscles with sheer performance, this model is the driving force behind many agentic apps, in which it shines. Weak in general world knowledge and multilingual settings, GPT-OSS must be used in very specific setups, where it then outshines alternatives. It also pioneered different (low/medium/high) thinking levels, similar to its big closed-source siblings, something we might see adopted by other (open) models in the future.
Gemma 3: Gemma 3 is beloved for two reasons: its strong multilingual abilities, especially in the <30B size range, and its vision capabilities. The latter is an area where the Western open model space severely lacks strong options aside from Gemma and Moondream. Hopefully, this might change in the coming year!
Olmo 3: As it has for the last few years, Ai2 (where Nathan works) delivered another update to the best models with all data, code, weights, logs, and methods released. Releases like this are crucial for researchers, who otherwise cannot fully study how leading models are built. While the industry has shifted to MoEs for peak performance, and Ai2 will too, these dense transformers at the 7B and 32B scales are crucial for the accessibility of fine-tuning, a niche that is actually underserved by model makers after Llama’s downfall and Qwen withholding some base models.
Honorable Mentions
Models that dominate or re-define a certain niche.
Parakeet 3: I (Florian) cannot speak highly enough of this speech-to-text model. It completely transformed how I work and interact with my computer. Speaking to your computer is awkward at first but quickly becomes natural. It is also a huge boost to (Claude) coding-based workflows when you can just waffle on for paragraphs explaining your problem instead of lazily writing a few sentences.
It is almost boring to see how well this model works while being blazingly fast on a MacBook, beating every cloud-based platform in end-to-end latency. It is such a good model that a lot of apps with “Whisper” in their name are switching to it as the main engine (something we have seen time and time again: r/LocalLlama is not about Llama anymore, nor is r/StableDiffusion about Stable Diffusion these days). Parakeet 3 adds a whole new selection of languages, including German, which I happily use. Whisper still supports more languages, at least for now. Oh, did I mention that the majority of the data is open as well?

Nemotron 2: NVIDIA, the second: They are also in the open LLM business (well, and also VLAs, reward models, biology, gaussian splatting, video generation, and so on). Aside from pruning and post-training other models, they train their own models under the Nemotron brand. Similar to Parakeet, the vast majority of the data is released openly. Their models are mamba2-transformer hybrids, which improves speed, especially at long contexts, compared to transformer-only models.
Moondream 3: Widely regarded as THE player in the vision space, the Moondream team puts a lot of care into their model releases, giving even closed models like GPT or Gemini a run for their money. Those deep in the vision space know this; those who aren’t should. Try the model!
Granite 4: The IBM team puts out rock-solid (pun intended) releases one after the other, yet is unable to get the attention it deserves. Toggleable thinking per prompt was debuted by Granite 3.2, for example. And while this seemed to be a short-lived phase, as the open model space is switching back to releasing reasoning and instruct models separately, it shows that IBM’s LLM efforts are worth watching. With its fourth iteration, IBM adopts the mamba-attention architecture and also releases MoEs. Even more important: they are scaling up the model sizes! The writing style is also distinctly non-sloptimized, which is refreshing in this day and age.
SmolLM3: A tiny, yet capable model for its 3B size. All the data is open, as are intermediate checkpoints. Aside from the great initial blog, the HF team has also released other resources which go deeper into the training. If you are in need of a great on-device model, chances are that SmolLM3 is a perfect fit!
Mapping the open ecosystem
Edit: Added the tier list in text. Added Cohere, ServiceNow, Motif Technologies, and TNG Group to the tier list.
We have received more requests than imagined to update our tier list, which previously covered only the Chinese ecosystem, and to extend it with Western orgs, which we’ve now covered here. We have added a Specialists tier, which contains organizations that have trained only a few models or specialize in a certain niche, e.g. small, on-device models (Liquid, HuggingFace).
The organizations are as follows.
Frontier: DeepSeek, Qwen, Moonshot AI (Kimi)
Close competitors: Zhipu (Z.Ai), Minimax
Noteworthy: StepFun, InclusionAI / Ant Ling, Meituan Longcat, Tencent, IBM, NVIDIA, Google, Mistral
Specialists: OpenAI, Ai2, Moondream, Arcee, RedNote, HuggingFace, LiquidAI, Microsoft, Xiaomi, Mohamed bin Zayed University of Artificial Intelligence
On the rise: ByteDance Seed, Apertus, OpenBMB, Motif, Baidu, Marin Community, InternLM, OpenGVLab, ServiceNow, Skywork
Honorable mentions: TNG Group, Meta, Cohere, Beijing Academy of Artificial Intelligence, Multimodal Art Projection, Huawei
Some notes:
A lot of the orgs in Noteworthy can reach a higher tier by scaling up their current recipe. This tier also includes model makers who train a lot of models, often for different modalities.
Meituan Longcat (China’s DoorDash equivalent) is a new addition to the tier list; their models are recurring guests in the Artifacts series.
Meta was hard to place, given the many reports that they will release proprietary models in the future. The future of Llama is uncertain.
ByteDance Seed’s papers show that they are a strong research organization, which has yet to be reflected in their open model releases. Seed-OSS 36B is their first capable LLM, while their other releases, such as AHN-Mamba2-for-Qwen-2.5-Instruct-3B, are mostly research artifacts.
If you want to take your own stab at the tier list, you can do so by using this link!
Predictions for 2026
2025 was a seminal year in open models, where open model deployments became a real possibility. It is still well accepted that the best closed models have a robustness and richness that open models matching them on benchmarks don’t always have, but the potential of trying open models has never been higher. This leaves us at the point where open models are established, so where do they go next?
In 2026, we expect the major talking points of open models to be the following:



