I've never even heard of the benchmarks you are using, except for the first one (IFEval).
Here is a good list of benchmarks to use for your models.