Certain models perhaps clogging up the leaderboard? Check the logs?

#25
by CombinHorizon - opened

It appears that some of the same models have been stuck in the running state for months; could they be clogging up the leaderboard?
On the Hugging Face Open LLM Leaderboard, GPTQ models mis-submitted as float32/float16 clogged the queue months ago.
I don't know exactly what is going on here, but could the same thing be happening? Would you please look into it?

Could a timeout of 1-3 months be put in place so that no model runs for too long? Incompatible or misconfigured models (or models tripped up by the leaderboard's framework setup, so not necessarily the model's fault) would then be moved to the failed category, and other models would get to run instead of being held up in the queue. A rough sketch of such a staleness check is below.
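For illustration, here is a minimal Python sketch of that kind of check using huggingface_hub, assuming the request JSONs in this repo carry `status` and `submitted_time` fields with a `RUNNING` state, as the Open LLM Leaderboard's request files do; the actual field names used here may differ:

```python
# Hypothetical staleness check over the leaderboard's request files.
# Assumes each request JSON has "status" and "submitted_time" fields
# (as in the Open LLM Leaderboard format); adjust if this repo differs.
import json
from datetime import datetime, timedelta, timezone

from huggingface_hub import HfApi, hf_hub_download

REPO_ID = "hallucinations-leaderboard/requests"
TIMEOUT = timedelta(days=60)  # e.g. a ~2-month cutoff

api = HfApi()
now = datetime.now(timezone.utc)

for path in api.list_repo_files(REPO_ID, repo_type="dataset"):
    if not path.endswith(".json"):
        continue
    local = hf_hub_download(repo_id=REPO_ID, filename=path, repo_type="dataset")
    with open(local) as f:
        request = json.load(f)
    if request.get("status") != "RUNNING":
        continue
    # assumed ISO-8601 timestamp, e.g. "2024-01-05T12:34:56Z"
    submitted = datetime.fromisoformat(request["submitted_time"].replace("Z", "+00:00"))
    if submitted.tzinfo is None:
        submitted = submitted.replace(tzinfo=timezone.utc)
    if now - submitted > TIMEOUT:
        print(f"stale RUNNING request ({(now - submitted).days} days old): {path}")
        # a maintainer could then flip "status" to "FAILED" and push the file back
```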

One concern is large models (60B+) being run in float32 precision; could they be run in bfloat16, or, if they are natively float16, in float16? (A loading sketch is below.)
Also, meta-llama is a gated repo; could that be a cause of the delay? (Llama-2 is months old by now.)
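For reference, loading one of these 70B checkpoints in bfloat16 instead of float32 roughly halves the memory footprint. A minimal transformers sketch, using meta-llama/Llama-2-70b-hf from the list below as the example and assuming a Hugging Face token with access to the gated repo is already saved locally:

```python
# Sketch: load a large gated model in bfloat16 rather than float32.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # gated repo: needs an accepted license + HF token

tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~half the memory of float32, minimal accuracy loss
    device_map="auto",           # requires accelerate; shards the 70B weights across GPUs
    token=True,                  # reuse the token saved by `huggingface-cli login`
)
```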

See:
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/upstage/SOLAR-0-70b-16bit_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/bigscience/bloom_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/augtoma/qCammel-70-x_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/h2oai/h2ogpt-4096-llama2-70b-chat_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/meta-llama/Llama-2-70b-chat-hf_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/meta-llama/Llama-2-70b-hf_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/TheBloke/Falcon-180B-Chat-GPTQ_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ_eval_request_False_float32_Original.json
https://huggingface.co/datasets/hallucinations-leaderboard/requests/blob/main/stabilityai/StableBeluga2_eval_request_False_float32_Original.json

Please look into the logs, especially for the long-running models, to glean any insight into the situation and the next step forward.

Would you consider cancelling the running float32 runs to reduce compute load, especially since they apparently have been stuck for a long time (weeks, months)? It would save compute time and memory.

The following models' native precision is lower:

GPTQ:

  • TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
  • TheBloke/Falcon-180B-Chat-GPTQ
  • compressed-llm/vicuna-13b-v1.3-gptq

bfloat16:

  • ai4bharat/Airavata
  • bigscience/bloom
  • google/gemma-7b-it
  • meta-llama/Meta-Llama-3-8B
  • stanford-oval/Llama-2-7b-WikiChat

float16:

  • augtoma/qCammel-70-x
  • h2oai/h2ogpt-4096-llama2-70b-chat

The others are float32 models, but bfloat16 is probably the closest fit with minimal loss of precision (perhaps somewhere around a 0.2-1.8 percent difference).
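One way to check a checkpoint's native precision before submitting is to read torch_dtype from its config. A small sketch (model IDs taken from the lists above; torch_dtype can be missing on older repos):

```python
# Sketch: print the dtype each repo advertises in its config.json.
from transformers import AutoConfig

model_ids = [
    "bigscience/bloom",
    "augtoma/qCammel-70-x",
    "h2oai/h2ogpt-4096-llama2-70b-chat",
]

for model_id in model_ids:
    config = AutoConfig.from_pretrained(model_id)
    # torch_dtype records the dtype the weights were saved in; may be None
    print(model_id, getattr(config, "torch_dtype", None))
```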

Perhaps submit the models with a more suitable precision next time, and fall back to float32 only if that fails...
