Evaluation is not only getting harder with modern LLMs, it’s getting harder because it means something different.
Evaluations: Trust, performance, and price (bonus, announcing RewardBench)