4 Comments
Feb 2, 2023 · Liked by Nathan Lambert

I've been having a discussion with some STEM buddies who aren't necessarily deep into ML/AI theory. Your paper and the questions around scaling seem to formalize some of the discussions we are having about the state and future of ML/AI, especially ChatGPT. Cheers.

author

Let me know if you all have any questions! I hope I can keep getting it right.


I don’t claim my understanding is correct, just a lingering uncertainty over how big a model will need to get before emergent behavior appears. And can we get to a lower-power calculation technique, or segment the calculation to constrain the compute cost?

Nick

author

A few points:

* Generally, for language, people say the model needs to be >1 billion parameters for interesting things to emerge.

* This number will change substantially for each field, imo. We're likely to see scaling laws for diffusion models (like Stable Diffusion) soon; it doesn't seem like that's been done yet. I like the point that scaling laws make the most sense for generative models: https://twitter.com/Thom_Wolf/status/1578505011708907520

* There are lots of techniques to lower cost; for example, see this post on inference optimization (a small sketch of one such technique follows this list): https://lilianweng.github.io/posts/2023-01-10-inference-optimization/
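To make the inference-cost point a bit more concrete, here is a minimal sketch of one common technique from that space, post-training dynamic quantization in PyTorch. The toy two-layer model and its sizes below are hypothetical, chosen only to show the API; they are not taken from the post or the linked article.

```python
# Minimal sketch: dynamic int8 quantization in PyTorch to cut inference cost.
# The toy model is hypothetical; in practice the gains come from quantizing
# the large Linear layers that dominate transformer inference.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)
model.eval()

# Replace Linear weights with int8 versions; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 1024]) -- same output shape, ~4x smaller weights
```

Techniques like this trade a small amount of accuracy for lower memory and faster matrix multiplies, which is the same general lever the linked post covers in much more depth (distillation, pruning, sparsity, etc.).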
