Van os

Vanos007

AI & ML interests

None yet

Recent Activity

upvoted a collection about 2 hours ago
Llama 4
reacted to lianghsun's post with 👍 about 2 hours ago
With the arrival of Twinkle April, Twinkle AI's annual open-source celebration held every April, our community is excited to unveil its very first project: 📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. Traditional evaluation tools such as iKala's ievals (https://github.com/ikala-ai/ievals) can evaluate language models (LMs) only one sample at a time. As reasoning time grows with more complex models, these tools become increasingly inefficient 😲. For example, evaluating LRMs on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could run for half a day without finishing.

One question we were especially curious about: does shuffling the order of multiple-choice answers affect model accuracy? 🤔
→ See: "Change Answer Order Can Decrease MMLU Accuracy" (arXiv:2406.19470v1)

To address these challenges, Twinkle Eval brings three key innovations:
1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

In our experiments, Twinkle Eval sped up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under settings 2️⃣ and 3️⃣ than their claimed performance, which suggests that further benchmarking is needed.

The framework also offers additional tunable parameters and detailed per-question logging of LM behavior, which is ideal for anyone who wants to dig deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗
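The three ideas in the post can be illustrated together. The following is a minimal sketch, not Twinkle Eval's actual code or API. Each round shuffles every question's options with its own seed and records where the correct answer moved. The samples are sent to the model concurrently, and the per-round accuracies are summarized as a mean and standard deviation. `ask_model` is a hypothetical stand-in for a real model call. The A through D labels, the thread pool, and the mean/stdev summary are illustrative assumptions.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

LETTERS = "ABCD"  # assumed option labels for 4-choice questions


def shuffle_question(options: List[str], answer_idx: int,
                     rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the options and return the new position of the correct one."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # order[new_pos] == original index, so look up where the answer landed.
    return shuffled, order.index(answer_idx)


def evaluate(questions: List[Tuple[str, List[str], int]],
             ask_model: Callable[[str, List[str]], str],
             rounds: int = 3, workers: int = 8,
             seed: int = 0) -> Tuple[float, float]:
    """Return (mean, stdev) accuracy over `rounds` independently shuffled passes."""
    accuracies = []
    for r in range(rounds):
        rng = random.Random(seed + r)  # a different answer order each round
        prepared = [(q, *shuffle_question(opts, ans, rng))
                    for q, opts, ans in questions]
        # Query samples concurrently instead of one at a time.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            preds = list(pool.map(lambda item: ask_model(item[0], item[1]),
                                  prepared))
        correct = sum(pred == LETTERS[ans]
                      for pred, (_, _, ans) in zip(preds, prepared))
        accuracies.append(correct / len(prepared))
    stdev = statistics.stdev(accuracies) if rounds > 1 else 0.0
    return statistics.mean(accuracies), stdev


# Hypothetical toy data and models, for illustration only.
questions = [
    ("2 + 2 = ?", ["3", "4", "5", "6"], 1),
    ("Capital of France?", ["Paris", "Rome", "Berlin", "Madrid"], 0),
]
KNOWN = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}


def oracle(question: str, options: List[str]) -> str:
    """Answers by content, so option order should not matter."""
    return LETTERS[options.index(KNOWN[question])]


def always_a(question: str, options: List[str]) -> str:
    """A position-biased 'model' whose score depends on the shuffle."""
    return "A"


print(evaluate(questions, oracle))    # (1.0, 0.0)
print(evaluate(questions, always_a))
```

A model that answers by content keeps the same accuracy across rounds. A position-biased model's score changes with each shuffle, and that variation is what randomized answer order combined with multi-round testing is meant to expose.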
liked a dataset 3 days ago
jed351/Chinese-Common-Crawl-Filtered

Organizations

Samsung Electronics, GEM benchmark, Turing's Solutions, OpenGVLab, fast.ai community, OpenVINO Toolkit, ONNXConfig for all, Gradio-Themes-Party, scikit-learn, Open-Source AI Meetup, lora concepts library, Subspace Network, Arabic Machine Learning, Platzi Community, Kornia AI, Université Dauphine-PSL, Keras Dreambooth Event, Stable Diffusion Dreambooth Concepts Library, The Waifu Research Department, Musika, Blog-explorers, OpenSky, Falcons.ai, ICCV2023, Media Party 2023, huggingPartyParis, Team Tonic, The Collectionists, Niansuh AI, That Time I got Reincarnated as a Hugging Face Organization, Project Fluently, Women on Hugging Face, Nutanix, Major TOM, MLX Community, Narra, Social Post Explorers, C4AI Community, ginigen, Chinese LLMs on Hugging Face, Nerdy Face, VIDraft, Data Is Better Together Contributor, iiix-gt

Vanos007's activity