Discussion about this post

Chef abee Crombie:

Wow, for 7B the thinking model is pretty good. The foundation models are amazing for general tasks, but if you can fine-tune a 7B beast like this to your domain/problem, I'll always lean toward the smaller open models for repeatable results.

JayNing:

It seems that during the PyTorchCon presentation it was mentioned that qk-norm might affect long-context performance, yet it is still retained in Olmo 3. Was there any discussion about qk-norm?
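
For readers unfamiliar with the term: qk-norm applies a normalization (typically RMSNorm) to the query and key projections before the attention dot product, which bounds the attention logits and stabilizes training. Below is a minimal single-head sketch of the idea; the class name and shapes are illustrative assumptions, not Olmo 3's actual implementation, and `nn.RMSNorm` requires PyTorch 2.4+.

```python
import torch
import torch.nn as nn

class QKNormAttention(nn.Module):
    """Minimal single-head attention with qk-norm (illustrative sketch).

    Queries and keys are RMS-normalized before the scaled dot product,
    which keeps attention logits bounded regardless of projection scale.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        # The "qk-norm" in question: normalize q and k before attention.
        self.q_norm = nn.RMSNorm(d_model)
        self.k_norm = nn.RMSNorm(d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.q_norm(self.q_proj(x))
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, seq_len, d_model)
    print(QKNormAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```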
