10 Comments
Yaroslav Bulatov

Hi Nathan, I enjoyed this post :) Also, I'm very impressed with the work AI2 is doing. After watching Hanna's talk at COLM, we got excited about OLMo and started the #ext-ai2-together collaboration channel between AI2 and Together (join!)

I have a more benign view of Meta's role in this. First of all, I'm passionate about open-source AI; I wrote about it five years ago, for instance (https://medium.com/@yaroslavvb/large-scale-ai-and-sharing-of-models-4622ba59ec18). Also, I worked at Google and Meta before Together.AI, which gives me an internally fine-tuned model of their operations.

Meta has a different source of funding than non-profits do, so they don't compete with AI2 for it. Llama's success shouldn't hamper the development of truly open models. If anything, it should help: having a "partially open" foundation model at your fingertips should make it easier to build a great "truly open" foundation model. Imagine building a car from scratch versus having a prototype you can fully disassemble for inspiration.

There's an LLM built into every one of Meta's products nowadays, so they have to train a high-quality LLM anyway. Once the model is trained, it doesn't cost them anything to open-source it, so why not? This is similar to how Meta motivates open-sourcing their hardware designs through the Open Compute Project.

Average tenure at Meta is under two years, short relative to other Big Tech companies, so making an internal project open-source yields the benefit of more productive employees: people work harder knowing they can still benefit from their work after they leave the company. Also, I feel like engineers can influence high-level decisions by obtaining leadership buy-in for choices that ultimately benefit them too. It would be a strange coincidence that Meta relinquished control of PyTorch to the Linux Foundation shortly before major players left the PyTorch team to start Fireworks. In addition to what Zuck says, there are many people inside Meta who are personally invested in having the Llama weights publicly available, and they know how to make the case internally.

Google is secretive about their hardware designs because they are in the business of selling compute; Meta is not, so they share. OpenAI is secretive about their model weights because they are in the business of selling access to those weights; Meta is not, so they share. On the other hand, Meta would not voluntarily share the social network graphs they have discovered or the media that's been uploaded to them, because that's their core business.

Nathan Lambert

Somehow missed this on the day! But I agree, it's a good addition to the post. Culture matters.

OLMo is hugely helped along by Llama. If Llama 4 doesn't come out, my motivation for OLMo will rise, but the work will only get harder.

To be clear, I don't think Meta is malicious, but I can see how the two get pitted against each other.

Michael Spencer

These were really good insights.

Ani N

I broadly agree with this post that the benefits of open source AI currently clearly outweigh the risks, and have used the OLMo checkpoints in my own research. However, I do think that the statement:

"Many of the risks that “experts” expected open-source models to proliferate, from disinformation, biosecurity risks, phishing scams, and more have been disproven"

does not reflect the paper you cite, which instead shows that existing work is insufficient to prove that the risks exist. Furthermore, it's only been a year since high-quality open LLMs have been released, and it will be years until we know all the ways these models can be used. It could be that Llama 3 405B with the right finetuning and a voice model is so good at running phone-based spearphishing scams that Meta comes under fire for releasing it. Maybe it requires learning how to browse the web on a dataset open-sourced tomorrow. We likely won't know until it starts happening or a red team attempts it and fails (which has happened for some use cases, like biosecurity risks).

Despite this, it is critical that FULLY open models like OLMo keep up, because the difference for bad actors between just Llama being released and both OLMo and Llama being released is minimal, but the difference for researchers is enormous.

Thanks for reading!

Nathan Lambert

Sure, I do think the specific framing they use of "marginal risks" is the technically correct one, and one that's held relative to closed models.

I should escalate to the extreme option of "regulating in the US isn't worth it because there'll be open models somewhere, so it's better if they're here than in China / the Middle East."

Nathan Lambert

But I disagree on the final point. Even if Llama stops being released, we should keep releasing OLMo, barring any very big changes in the ecosystem. It's one of those things that's hard to forecast, but really, lots of loud risks come and go.

Nathan Lambert

I must be in a kind of snarky mood reading my replies, but hopefully you can tell I'm a bit more moderate and agree on the technicality!

Ani N

0. I currently agree that open models keeping up with closed models is important, because we have no choice but to adapt to a world where bad actors have access to powerful foundation models, and open models allow a lot more researchers to work on finding problems / solutions.

1. I currently think the marginal risk framing is correct, but I think there's a marginal risk argument against open models, which is that putting the cat back into the bag is (marginally) easier with closed models. For example, if in the future we make it illegal for voice models to imitate humans, closed models provide points of failure that can be held accountable.

1.5. Right now I think this can mostly be ignored, as regulation can target businesses that use open-source software for malign purposes. Right now I believe that one gal and a GPU is not part of anyone's threat model.

2. I do think there's a separate point, that absence of evidence is not evidence of absence in a field that is as fast moving as this one. Optimizing models for downstream use cases, good or ill, takes time.

3. For the above reasons, I think the marginal risk of OLMo vs. Llama is essentially zero, while the marginal risk of Llama itself is nonzero but still worth it. So I agree at large, but do think that saying "really lots of loud risks come and go" is tempting fate...

Sam

Congrats 🙂

Turning 30 myself in a few months, fun to think about what could have been (and could be).

Hope that I'll be in a place to be useful, given a few years. Going to look closer at what "crumbs" are being left behind, as learning experiences: "We are making them easy to build on and understand, so entirely new research groups can pick up ideas we didn't have time to finish".

Jasmine Sun

Happy birthday! And thanks for doing what you do!
