I ran 580 experiments (yes, 580 🤯) to check whether we can quantify the impact of data drift on model performance using only drift metrics.
For these experiments, I built a technique that relies on drift signals to estimate model performance. I then compared its results against current state-of-the-art (SoTA) performance estimation methods to see which technique performs best.
The plot below summarizes the overall results. It shows the quality of performance estimation against the absolute performance change (lower is better).
Inspired by the awesome work from @mlabonne, I created a Space to monitor the narrowing gap between open and proprietary LLMs, as scored by the LMSYS Chatbot Arena Elo ratings 🤗
The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends.