Interconnects
(Voiceover) Building on evaluation quicksand

On the state of evaluation for language models.

Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand

Chapters

00:00 Building on evaluation quicksand

01:26 The causes of closed evaluation silos

06:35 The challenge facing open evaluation tools

10:47 Frontiers in evaluation

11:32 New types of synthetic data contamination

13:57 Building harder evaluations

Figures

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp

