Models not being evaluated?
I submitted, I think, 6 models to be evaluated, and they did not get evaluated despite showing as being in the queue.
They sat in the evaluating state for perhaps 10 days, and then I could not find them on the final evaluated list.
What happened? Did you drop a load of models?
I have now resubmitted 10 models, more than I have ever submitted before. Is that too many? Was the limit 5 per week before? The board seems overloaded with submissions from specific people, leaving it impossible to get a benchmark result.
Do you only benchmark paid models, or enterprise accounts, now?
I'm lost, help me out!
Look here to see what happened to the evals:
https://huggingface.co/datasets/open-llm-leaderboard/requests/tree/main/LeroyDyer
For the results:
https://huggingface.co/datasets/open-llm-leaderboard/results/tree/main/LeroyDyer
It takes some time for results to appear on the leaderboard; in the meantime you can use https://huggingface.co/spaces/open-llm-leaderboard/comparator right away if the results are ready.
Thanks, friend, for those links. But the models I had submitted did not appear on the list. Currently the models I submitted are showing as evaluating on the submit page, since I resubmitted them, plus some more, to see if the board had crashed. It accepted them, so they obviously did not get evaluated last time, and there was also a massive rush on the board. So I hope these models will indeed be evaluated, so I can see whether this new rewards-style training works, now that I understand how to build rewards for the target formats. They might still score quite low on the benchmarks, but they are truly great performers.
Today none of the models that were submitted can be found on the list. Somehow every model failed?
My models failed too (a CUDA error, if it's the same as the previous case). Link the requests that failed here; when the folks from open_llm_leaderboard are free they usually rerun the evals manually, so don't forget to say thanks. Of course it would be better if they fixed the underlying issue, but for a free service like this I can only say thanks, give feedback, and wait.
aha !!
I think what I really need is a Colab notebook for the EleutherAI evaluation harness, as I could not figure it out. Then I could run my own benchmarks without the leaderboard!
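For anyone else stuck on the same thing, a Colab cell for the harness can look roughly like this. It is only a sketch assuming the lm-eval 0.4.x Python API (lm_eval.simple_evaluate); the model repo and the task list are placeholders, so check the harness README for the current options:

```python
# In a Colab cell, install the harness first:  !pip install lm-eval accelerate
# NOTE: this is a sketch. The model repo and task list below are placeholders,
# and it assumes the lm-eval 0.4.x Python API (lm_eval.simple_evaluate).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                          # Hugging Face transformers backend
    model_args="pretrained=LeroyDyer/your-7b-model,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "gsm8k"],
    batch_size=8,
    device="cuda:0",
)

# Per-task metrics live under the "results" key.
for task, metrics in results["results"].items():
    print(task, metrics)
```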
Since most models cannot run locally, I have to run them on Colab, so I only train 7B models. I do really intensive task training on my model, which does not show up in the benchmarks, but the benchmark is still an important marker to check that the model's other functionality did not change. On past datasets I only train 5 steps with a batch of 256, and I always randomize my dataset sample, so after training a new task I run all the past datasets for 5 steps each to keep everything aligned.
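For reference, that "rerun each past dataset for a few steps" routine could be sketched with transformers and datasets roughly as below. The model id, dataset names, text column, and hyperparameters are placeholders rather than the actual setup, and the effective batch of 256 comes from gradient accumulation:

```python
# Rough sketch of the "replay past datasets a few steps each" routine described above.
# The model id, dataset names, and the "text" column are placeholders, not the real setup.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

MODEL_ID = "LeroyDyer/your-7b-model"                    # placeholder model repo
PAST_DATASETS = ["your/dataset_a", "your/dataset_b"]    # placeholder dataset repos

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token           # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

for name in PAST_DATASETS:
    # Re-shuffle every pass so the 5 steps see a fresh random sample of the dataset.
    ds = load_dataset(name, split="train").shuffle().select(range(5 * 256))
    ds = ds.map(tokenize, batched=True, remove_columns=ds.column_names)

    args = TrainingArguments(
        output_dir=f"replay-{name.replace('/', '_')}",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=64,   # 4 x 64 = effective batch of 256
        max_steps=5,                      # only 5 optimizer steps per past dataset
        learning_rate=1e-5,
        report_to="none",
    )
    Trainer(
        model=model,
        args=args,
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()
```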
I think recently I also merged one of your models and called it RP (Role Player), because when I was doing the rewards training on the merged model, it was making up all sorts of strange musings, lol!
So benchmarking is important, as it lets me know which model was strong and where I should begin the next set of tasks!
I share the same sentiments as above.
So in truth the leaderboard seems to have actually crashed, as the models could not truly have failed; my friend, it can only be the cloud or hardware they are using and not the models themselves, as they load and work fine!