Great post as always. It feels like the monolithic nature of the main final product -- a single model -- might be a temporary phenomenon in AI systems, and it accounts for a lot of the sensitivity of outcomes to exact org structure and even the character of a handful of leaders. I'm curious whether you've thought about what a world with less monolithic artifacts might look like (for example, MoE-like model subcomponents that can be worked on in parallel, with router/adapter layers trained separately to mix them into multiple larger systems -- rough sketch below), and how this might de-risk and stabilize modeling efforts.
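To make that concrete, here's a minimal PyTorch sketch of the idea, under the assumption that subcomponents are trained independently and frozen, with only a small router trained to mix them. All class names, shapes, and the dummy objective are invented for illustration:

```python
# Hypothetical sketch: independently trained expert subnetworks are frozen,
# and only a small router/adapter is trained to mix them into a larger system.
import torch
import torch.nn as nn

class FrozenExpert(nn.Module):
    """Stand-in for a subcomponent trained separately (e.g., by another team)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        for p in self.parameters():
            p.requires_grad = False  # experts stay fixed; only the router learns

class RoutedComposite(nn.Module):
    """Mixes frozen experts via a small, separately trained router layer."""
    def __init__(self, experts: list[FrozenExpert], d_model: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.router = nn.Linear(d_model, len(experts))  # the only trainable part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.router(x), dim=-1)           # (batch, seq, n_experts)
        outs = torch.stack([e.ffn(x) for e in self.experts], -1)  # (batch, seq, d, n_experts)
        return (outs * weights.unsqueeze(-2)).sum(-1)             # weighted mixture

# Usage: only the router's parameters reach the optimizer.
d = 64
model = RoutedComposite([FrozenExpert(d) for _ in range(4)], d)
opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-3)
x = torch.randn(2, 10, d)
loss = model(x).pow(2).mean()  # dummy objective just to show the training step
loss.backward()
opt.step()
```

The point being that each FrozenExpert could come from a different team or training run, and only the cheap router stage couples them.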
I think the open ecosystem should be doing this, but it's very unlikely given the current state of the art for models. Agents and systems can be composed of many small parts, but building those also feels like it would require a very different org.
Yeah, makes sense given the current state of the science. I guess I'm thinking about how permanent that is and also how much it bottlenecks progress. There was an interesting Dwarkesh podcast with Jeff Dean and Noam Shazeer recently where Dean argued for a possible future in which big models are compositions of separately trained smaller subcomponents.
There's research where training a bigger model gives you smaller models out of it; I don't know the name, though.