In a much-recommended comment I posted on last Friday's Financial Times article, 'DeepSeek rival's shares double in debut as Chinese AI companies rush to list' (https://www.ft.com/content/a4fc6106-5a61-4a89-9400-c17c87fb1920#comments-anchor), I wrote the following:
You fundamentally misunderstand the emerging character of the Chinese LLM community. It is not so much competitive as 'co-opetitive'. Being Open Weight, they share architectural improvements willingly whilst each individual LLM concentrates on a slightly different - yet complementary - area of expertise. What is emerging is a Dragon Swarm whose watchword is consilience. DeepSeek is the Architect Dragon, whose Open-Weight 'foundation model excellence' (rich in design features willingly shared) will be massively reinforced when R2 drops in mid-February, coinciding, not coincidentally, with the advent of the Year of the Fire Horse. DeepSeek is the bedrock of the swarm - the 'Mother of Dragons', if you will. Aside from being the technical supremo, it is optimized for all-round reasoning and general intelligence. MiniMax is the Creative & Sonic Dragon, a specialist in multimodal creativity - text, voice, music and immersive content synthesis. DeepSeek and MiniMax (and Qwen, Kimi, Ubiquant, 01.AI, ZiAI, Sensetime and more) are not so much rivals as members of a Dragon Swarm of Open Weight LLMs covering an extraordinarily wide range of areas of expertise.
I feel like the idea is presented a bit more strongly than I'd put it, but yes, I agree, and I think what makes the Chinese ecosystem more interesting is its overall ecosystem dynamic, which makes up for the models being slightly less good in absolute terms.
For now... I agree. But after MLA, especially if it is soon complemented by V4, and perhaps soon thereafter all 'wrapped' in R2 (with extra features likely added then), can you be so sure?
The open models make the overall pace of progress higher, which benefits open model builders most (who happen to be behind). If MLA helps, closed labs take it too. I see DeepSeek as the open lab with the best track record of innovation, and their architecture did a lot to start the wave in 2025, but in the long term it may not look like a repeating cycle.
I don't doubt the closed labs will be all over MLA, and V4 too when it drops, and any of the 'extras' in R2. My point (and you have made this observation regarding OS LLMs better than anyone else) is that with MLA 'reinventing' the attention mechanism (compressing the KV cache by roughly 93%) plus all the architectural elegance now evident on the inference side, the floodgates for wide adoption of the OS models are about to open... I simply do not see the old-order 'brute force' CW models holding the trump card they have in the past (and I think Nvidia knows this: Nemotron 3; Groq!). I am not hypnotized by benchmarks (especially with Yann LeCun revealing how Meta gamed the system!), but I think that R2, when it drops, is going to unleash a wave of upgrades (CW as well as OW). More importantly, the barriers to constructing great LLMs will be lowered... FOR EVERYONE!
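(For readers wondering where a figure like that comes from, here is a rough back-of-envelope sketch. The dimensions below are illustrative assumptions, not DeepSeek's published configuration, and the widely cited reduction concerns the attention KV cache rather than training as a whole.)

```python
# Back-of-envelope sketch (illustrative numbers only, NOT DeepSeek's exact
# configuration) of why a latent-attention scheme like MLA shrinks the KV
# cache: instead of caching full per-head keys and values for every token,
# it caches one small latent vector per token per layer.

def kv_cache_bytes(tokens, layers, per_token_elems, bytes_per_elem=2):
    """Total KV-cache size in bytes for one sequence (fp16/bf16 by default)."""
    return tokens * layers * per_token_elems * bytes_per_elem

# Hypothetical model shape
n_heads, head_dim, layers, tokens = 64, 128, 60, 32_000

mha_per_token = 2 * n_heads * head_dim   # full keys + values per layer
mla_per_token = 512 + 64                 # assumed latent dim + decoupled RoPE key dim

mha = kv_cache_bytes(tokens, layers, mha_per_token)
mla = kv_cache_bytes(tokens, layers, mla_per_token)

print(f"MHA cache: {mha / 1e9:.1f} GB, MLA-style cache: {mla / 1e9:.1f} GB")
print(f"Reduction: {100 * (1 - mla / mha):.1f}%")  # roughly 96% with these made-up numbers
```

The point is simply that caching one small latent per token per layer, instead of full per-head keys and values, cuts attention memory by an order of magnitude or more; the exact percentage depends on the model's dimensions.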
I'd love for you to be right. I'm currently slightly more bearish on DeepSeek's capabilities. I think more people have caught up and it's hard to get ahead.
Let's speak again on 18 February... the day after the Chinese New Year begins and the Year of the Fire Horse starts. I don't think even the techies at DeepSeek will be able to resist using the start of BY FAR the most powerful year in the 60-year Chinese zodiac cycle to make a statement!
I do not say YOU misunderstand this. Not at all! What I said was that THE WRITERS OF THE FT ARTICLE misunderstand what is happening. I suppose my comment merely wanted to suggest that were you to access DeepSeek after R2, I doubt you would encounter any 'jaggedness' between swarm members - One for All; All for One. So you would not need to use multiple closed-weight LLMs to serve your needs. And there will be no fees involved either.
(Yeah, I read it quickly first thing in the morning and misunderstood, sorry. I deleted it.)
:)
As a non-tech idiot who regularly gets in way over my head on vibecoding projects, I've developed a clunky method of consulting multiple models when the one I'm working with gets stuck or seems off base. I ask it to write up a memo describing the bug or strategic question or whatever, and I paste that into the other two of my ChatGPT/Claude/Gemini "crew" and into a fresh instance of whichever one I'm working with (with the instruction that it's a naive model that should ignore any context it comes across). Then I share the results with the model I've been working with -- and sometimes "fire" it and switch to working with another!
Is there a better way of doing the same thing? Meaning either one likely to get better results or one that takes less time/effort. A lot of the time the advice is great, especially on bugs. The most maddening thing is that models won't tell me about better solutions that I don't know to ask about.
Love your appearances with Jordan et al. Thanks!
Honestly, that sounds about right. There are ways you can use the API versions and set them up to share context, but that's a bit of effort and I haven't even tried it yet.
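(If you do want to try the API route, here is a minimal sketch of one way it could look; it is only an illustration, not the setup described above. It assumes the official openai and anthropic Python packages with API keys in the usual environment variables; the model names and the bug_memo.md filename are placeholders.)

```python
# Minimal sketch: fan the same "bug memo" out to two model APIs so each sees
# identical context, then compare their answers side by side.
from openai import OpenAI
from anthropic import Anthropic

MEMO = open("bug_memo.md").read()  # the memo the working model wrote up
PROMPT = (
    "You are a fresh consultant with no prior context on this project. "
    "Read the memo below and propose fixes or better approaches.\n\n" + MEMO
)

def ask_openai(prompt: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_anthropic(prompt: str) -> str:
    client = Anthropic()
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

if __name__ == "__main__":
    for name, answer in [("OpenAI", ask_openai(PROMPT)), ("Anthropic", ask_anthropic(PROMPT))]:
        print(f"=== {name} ===\n{answer}\n")
```

The same idea extends to Gemini or any other provider; the point is just that every model sees the identical memo, so you can compare answers without copy-pasting between chat windows.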
Thanks!
Fascinating article! I'm at the other end of the spectrum: as a retired AI researcher, I don't use *any* AI models in my normal life and am much the happier for it :-)
Helpful, but this feels more like your cognitive model for routing than a "stack". I was intrigued to learn how you're implementing context engineering, workflow orchestration, building evals, red teaming, etc.!
Most of it is remarkably simple and yolo; I'm in the early days of figuring one out for Claude Code.
Hah, I wondered if that might be it. It still seems to be the most common answer unless you're at one of the FDE + platform companies trying to sell the platform (Dystil, Invisible, Crew, etc.).