When choosing a model, we're stuck in the middle between classic NLP benchmarks (e.g. MMLU) and qualitative chatbot ranking. Neither is exactly what we want.
Just FWIW, you might want to call it Chatbot Arena rather than ChatArena, because that’s how they stylize it on the site, and there’s also another project named ChatArena (multi-agent language games for LLMs).
Ah thanks, adding a note at least