We disagree on what open-source AI should mean
... and that's okay. How to parse what different people mean by the word openness and see through the PR speak.
We’re so early in the process of defining community norms and standards around open LLMs that the goals of the people involved are not clear at all. Open source means a lot of things to a lot of different people, and this is by design. The guiding principles and rules of open source are what make it succeed when multiple groups shape its path. I’ve written a lot about how an open-source LLM should be described (definitions) and what the situation is with crucial discussion points around them (bio-risk, licenses, and so on), but just as important to the development of the technology is the who and the why.
Recently, MIT Technology Review published an article whose title argues the opposite of this post: that definitions of open-source AI are problematic. The authors argue that without a change in the current trajectory, big tech will define open-source AI to be whatever suits it. The article covers recent events well, but the title is another contribution to the narrative against open-source AI that I’m trying to counteract. Given that open-source projects are designed to involve many people, disagreement on definitions is part of the process and expected. Yes, we should better support the bodies establishing standards, but no, we shouldn’t be surprised by drawn-out disagreements.
One of my most memorable open-source moments from working at HuggingFace was deciding the path for our open-source reinforcement learning from human feedback (RLHF) library, TRL, given the relatively established nature of Carper AI’s TRLX (a Stability AI lab at the time). At the time, it looked like we were trying to revive a long-archived repository while Carper had all the major boxes checked. Thom Wolf, HuggingFace co-founder and lead of its open-source efforts, reminded me that TRLX’s success was not a problem for us because “open-source thrives in multiplicity” (or something like this). The core point is that open initiatives win in the long term because multiple groups and motives are involved.
At first glance, the motives of Stability AI and HuggingFace in the RLHF space could seem identical and simple: support the open RLHF ecosystem. In fact, the two companies had (and have today) quite different goals. Stability AI was largely focused on enterprise contracts and scaling, while HuggingFace is focused on growing the ecosystem with tight ties to its Hub product. Looking back now, TRL has much more sustained use than TRLX. Thom didn’t see this specific timeline, but he probably saw many different paths to the same outcome: both of the libraries would matter.
Today, the open-source LLM ecosystem, which includes everything from training to serving to monitoring open-weight LLMs, includes parties with many different motivations. When I analyze the space of open LLMs, I see proponents that generally fall into four buckets (and if I’m missing any, please let me know). This list is ordered by those I’m most exposed to, which is correlated with those making the most noise; it is not ranked by importance.
1. Effective Accelerationists, Techno-Optimists, capitalists, etc.
This is the group that claims, without much evidence, that developing AI as fast as possible is the “best” route for humanity, and whose revealed preference tends to be that it’ll make them the most money. I agree that getting AI faster rather than slower (or pausing) is likely the better direction, but I had to distance myself from the movement due to some extreme stances and toxic behavior by leading individuals. A lot of the things I stand for in my work were listed among the “enemies” in the Techno-Optimist Manifesto. I’m fine with Marc (a loyal subscriber) and me disagreeing.
The reasonable face of this movement is Zuckerberg, who stated plainly on an earnings call why Meta is pursuing open AI. The most plausible motivation is to kneecap competition and support Meta’s primary revenue centers, which benefit from having more compelling content. I’ve written at length about why leaning open is better for businesses trying to find product-market fit, but the energy of this movement has strayed well beyond the scope of my writing or Zuck’s comments.
2. Scientists, promoting understanding and transparency
The position that I’ve held for some time is that we should be as open as we can while still safely learning about the technology. The “we” in this statement is the entire scientific community and the general public, whose involvement can only be facilitated by a healthy open-source ecosystem.
The goal of promoting scientific understanding for the betterment of society has a long history. Recently I was pointed to Abraham Flexner’s 1939 essay The Usefulness of Useless Knowledge, which argues that basic scientific research without clear paths to profit eventually turns into technologies that improve society. If we want LLMs to benefit everyone, my argument is that we need far more than just computer scientists and big-tech-approved social scientists working on these models. We need to continue to promote openness to support this basic feedback loop that has helped society flourish over the last few centuries.
A good illustration of how opening up AI can improve society comes from the major breakthroughs in ML for science over the last decade: AlphaFold for protein folding and GraphCast for weather forecasting are just the tip of the iceberg.
Such access is the only way to build a full understanding of the risks of these models. While the API labs will thoroughly cover the risks they’re exposed to, a long tail of public researchers will make sure we are ready for the unknown unknowns that AI poses to our society. Governments, those spending the most to protect us, need to know how this stuff works. Open science fights the concentration of expertise, which is the leading indicator of the concentration of knowledge.
Thanks to Matt Salganik for the comments that helped motivate this section. Matt has done some interesting work on the role of data access for participants in studies, relative to the advancement of science.
3. Inclusion, public interest, and fighting concentration of power
This section is shaped largely in my mind by my former HuggingFace colleague Irene Solaiman, but it is very obvious once you see it, and it is closely related to the goals of open science. If we don’t have open infrastructure around AI systems, there is no way we pull new people into today’s technology-driven society. Some of the major successes of open LLMs have been in enabling multilingual research and the expansion of the technology into low-resource languages. This is an obvious starting point for the popular models of today, given that they’re about language, but it is a door we don’t want to close. This motivation is largely the upside to the related downside I keep hearing about in policy circles: concentration of power.
The evolution of technology in recent decades has produced the biggest and most powerful companies the world has ever seen. All signs point to AI continuing this trend, given the large capital costs of playing. While there isn’t a clearly articulated strategy behind it, committing to openness does seem like the easiest way to reduce these companies’ business moats to the physical fundamentals (GPUs). Given the large political interest in antitrust, I’m not surprised that this is top of mind among some policy groups.
4. Freedom advocates
The Columbia Convening on Openness and AI brought together leading folks from the open LLM ecosystem with those who defined the term open-source software (OSS) in the 1990s (and plenty of people in between). It was a great experience, and one of the things that most surprised me was pushback from some of the older folks in the room that we weren’t pushing the freedom argument for open-source AI enough. The idea that you should be able to own your own technology stack was a big motivation for OSS, but it hasn’t really found a specific footing in the modern cultures of open-source AI. In domains other than language, such as image generation, the scale of compute is not as dominant a factor, so this argument is likely to catch on as the language-centric lens wears off.
Each of these groups would prioritize different things when building the infrastructure powering the world’s open ecosystem. Some players, like HuggingFace, actually cover quite a few of these camps. Others are easier to put into one bucket: Meta falls in the first, and AllenAI falls in the same bucket as I do. To succeed in the long term, we need the tension of these different goals to make sure the open ecosystem supports many different methods and motivations for building AI.
Dissecting “openness,” the new substitute for open-source
The word openness has replaced the phrase open-source among most leaders in the open AI movement. It’s the easiest way to get your goals across, but it is no better at indicating how you’re actually supporting the open ecosystem. The three words that underpin the one messy word are disclosure (the details), accessibility (the interfaces and infrastructure), and availability (the distribution).
Any open release can easily be analyzed through this lens. xAI’s Grok-1, which dropped a week or two ago with 314 billion parameters and no real documentation, maximized only availability, with no attempt at the others. Releases like OLMo or Pythia, which try to release everything at model sizes that can run on consumer GPUs, maximize almost every category. For releases defined by gating, availability is limited. Together, the three make it easy to see which grouping of “open-source” the organization releasing the model belongs to.
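To make the lens concrete, here’s a minimal sketch of how you might tabulate releases along the three axes. The `Release` class and the 0-to-1 scores are my own illustrative assumptions, not an official rubric:

```python
from dataclasses import dataclass


@dataclass
class Release:
    """A model release scored along the three axes of openness.

    Scores are subjective 0-to-1 ratings, for illustration only.
    """
    name: str
    disclosure: float     # the details: papers, data documentation, eval reports
    accessibility: float  # the interfaces and infrastructure: consumer-GPU sizes, tooling
    availability: float   # the distribution: downloadable weights, license, gating


# Rough, personal readings of the releases discussed above.
releases = [
    Release("Grok-1", disclosure=0.1, accessibility=0.2, availability=0.9),
    Release("OLMo", disclosure=0.9, accessibility=0.9, availability=0.9),
]

for r in releases:
    print(f"{r.name}: disclosure={r.disclosure}, "
          f"accessibility={r.accessibility}, availability={r.availability}")
```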
Most of the companies that use openness as a public relations strategy lead with availability. Organizations focused on the long-tail impact of the technology will put more focus on accessibility. Organizations focusing on the politics of openness will focus on disclosure, for example through policy briefs detailing the impacts of their models.
Much like the definitions I wrote in my last post on open-source LLMs, I expect to be using this taxonomy for a long time (or a refined version of it). It’ll make the actions of any existing or new player in the open LLM space less surprising. With fewer surprises, it’ll be easier to build the ecosystem we’re most excited about.
Thanks to William Isaac of DeepMind for coming up with these three words I’ve fleshed out.
Newsletter stuff
Three particularly important news items for you all this week:
Google altered the Gemma license to remove the clause saying that you “need to try and update” your model.
A crazy backdoor was revealed in the open-source package XZ, which shows why open source is useful for the security of technology (more eyes). Coverage from Ben Thompson (paywall) and Evan Boehs.
An important milestone in the history of LLM-based chatbots was passed this week. First, Claude 3 Opus passed GPT-4 Turbo on LMSYS’s Chatbot Arena. Second, the smaller and “efficient” Claude 3 Haiku passed some GPT-4 models from last summer. OpenAI had held the top spot for 321 days, pretty much the entire lifespan of the leaderboard. We’ll see how long such stints last in the future!
Elsewhere from me
We got our first closed-lab model added to RewardBench. It’s a great benchmark for open models to chase (thanks, friends at Cohere).
I was part of a multi-organization submission to the NTIA about openness in AI; more on the group can be found on the IAS website.
I gave a talk summarizing this post and recent discussion on openness. Slides are here.
I gave a talk on RewardBench, slides are here (recording coming soon).
Models, datasets, and other tools
Qwen, makers of one of the strongest chat models right now (Qwen 1.5 72B Chat), released an MoE model too (2.7B active parameters). Most of us found it via its addition to HuggingFace Transformers (see the loading sketch after this list). More details are on their website.
AI21 released Jamba, billed as a scaled version of “Mamba” even though it includes attention, which makes it more like a scaled StripedHyena with the RNN component being Mamba-style. It got a ton of praise, but I’m yet to see a demo. Comments from the author of Mamba.
InternLM2 released a nice technical report, but people are suspicious of its MMLU score.
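For anyone who wants to poke at the Qwen MoE model from the list above, here’s a minimal loading sketch through HuggingFace Transformers. It assumes a recent transformers release with Qwen2 MoE support and that the checkpoint id is Qwen/Qwen1.5-MoE-A2.7B-Chat; verify both against the model card before running.

```python
# Minimal sketch: loading the Qwen MoE chat model via HuggingFace Transformers.
# Assumptions: a recent transformers release with Qwen2 MoE support, the
# accelerate package installed for device_map="auto", and the checkpoint id
# below (check the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B-Chat"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat prompt and generate a short reply.
messages = [{"role": "user", "content": "In one sentence, what is a mixture-of-experts model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```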
Links
A few academia vs. industry and wellbeing posts crossed my desk this week. Take | care | of yourself.
A good podcast with the CEO of Mistral — particularly on sovereignty and why giving people weights is actually a business strategy.
Housekeeping
My real podcast is at retortai.com.
Paid subscriber Discord access in email footer.
Referrals → paid sub: Use the Interconnects Leaderboard.
Student discounts in About page.