Please submit this model to the Open LLM Leaderboard

#1
by grimjim - opened

The leaderboard is located here.
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/

I merged two o1 models, including yours, and the result scored an unusually high 33.99% on the MATH benchmark.
I suspect your model may be highly capable at mathematical reasoning despite its focus on medical reasoning.

There's an issue with submitting it to the leaderboard; see here: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/1055

My own merge tests with this model showed very high BBH and MMLU-PRO scores, so I'd expect Skywork has hidden math performance.

@grimjim The issue should be fixed on their end! Now we just need to upvote the model: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/vote

If some votes could be thrown at my models too, I'd appreciate it! 🌴
