Early results on the 8B evaluation model we've been training...
@NinaCalvi wrote about the progress we've made this quarter toward training the best 'LLM-as-a-judge' evaluator. We've improved significantly on the baseline and are approaching state-of-the-art evaluation performance with an 8B model.
Next up: training Llama-3.1-70B
Here's the full article: https://www.atla-ai.com/post/evaluating-the-evaluator