Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance Paper • 2502.11578 • Published 12 days ago
Large Language Models and Mathematical Reasoning Failures Paper • 2502.11574 • Published 12 days ago • 3
Bring back open-source hackathons / challenges. You don't even have to handle things. I would love to host a hackathon about AI in the medical domain with the help of Huggingface.
Swedish Medical Benchmark Collection A collection of resources related to evaluating LLMs in the Swedish medical domain • 2 items • Updated Sep 2, 2024