Having a discussion with some STEM buddies, not necessarily theoretically deep into ML/AI. But your paper and the questions of scaling seem to formalize some of the discussions we are having about the state and future of ML/AI - especially ChatGPT. Cheers.
Let me know if you all have any questions! I hope I can keep getting it right.
I don’t claim correctness of understanding, just a lingering uncertainty over how big a model needs to get before emergent behavior appears. And can we get to a lower-power calculation technique, or segment the calculation to constrain the compute cost?
Nick
A few points:
* Generally people say that for language the model needs to be >1 billion parameters for interesting things to emerge.
* This number will change substantially for each field imo. We're likely to see scaling laws for diffusion models (like Stable Diffusion) soon - it doesn't seem like that's been done yet. I like the point that scaling laws make the most sense for generative models https://twitter.com/Thom_Wolf/status/1578505011708907520
* There are lots of techniques to lower cost - for example, see this post on inference optimization https://lilianweng.github.io/posts/2023-01-10-inference-optimization/
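To make the scaling-law idea in the points above concrete, here is a minimal sketch of fitting a power law of the form L(N) = a·N^(-alpha) + c (the general shape used in LLM scaling-law work) to loss-vs-model-size points. The data and coefficients here are synthetic, chosen only to illustrate the fitting procedure, not taken from any real model family.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, alpha, c):
    """Loss as a power law in model size n (in millions of parameters):
    a * n**(-alpha) plus an irreducible floor c."""
    return a * n ** (-alpha) + c

# Synthetic "model size vs. loss" points generated from known coefficients
# (illustrative values, not from a real paper or model family).
sizes_m = np.array([10.0, 100.0, 1_000.0, 10_000.0, 100_000.0])  # millions of params
true_a, true_alpha, true_c = 4.0, 0.34, 1.7
losses = scaling_law(sizes_m, true_a, true_alpha, true_c)

# Recover the coefficients from the data points.
popt, _ = curve_fit(scaling_law, sizes_m, losses, p0=[1.0, 0.3, 1.0])
a_fit, alpha_fit, c_fit = popt
print(f"alpha ~ {alpha_fit:.3f}, irreducible loss ~ {c_fit:.2f}")
```

With a fit like this you can extrapolate the curve to larger N and ask whether the expected loss improvement justifies the extra compute - which is essentially the question the scaling-law papers formalize.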