evaluation
Building on evaluation quicksand
On the state of evaluation for language models.
Oct 16, 2024
•
Nathan Lambert
Interviewing Riley Goodside on the science of prompting
Interconnects interview #6. o1, chain of thought, evaluation, and the future of prompting.
Sep 30, 2024
•
Nathan Lambert
1:08:38
On Nous Hermes 3 and classifying a "frontier model"
The latest model from one of the most popular fine-tuning labs makes us question how a model should be identified as a “frontier model.”
Aug 16, 2024
•
Nathan Lambert
GPT-4o-mini changed ChatBotArena
And how to understand Llama 3.1’s results on the community's favorite benchmark.
Jul 31, 2024
•
Nathan Lambert
Evaluations: Trust, performance, and price (bonus, announcing RewardBench)
Evaluation is not only getting harder with modern LLMs, it’s getting harder because it means something different.
Mar 20, 2024
•
Nathan Lambert
Big Tech's LLM evals are just marketing
A PSA everyone needs. The importance of a wait-and-see attitude when it comes to new models, big and small, open and closed.
Dec 13, 2023
•
Nathan Lambert
Evaluating and uncovering open LLMs
When choosing a model, we're stuck in the middle between classic NLP benchmarks (e.g. MMLU) and qualitative chatbot ranking. Neither are exactly what we…
May 31, 2023
•
Nathan Lambert