Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Paper • 2406.12644 • Published Jun 18, 2024 • 5
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization Paper • 2406.15708 • Published Jun 22, 2024 • 1
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 188
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 123
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published 14 days ago • 27
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 20 days ago • 81
Gemma 3 Collection All versions of Google's new multimodal models in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 29 items • Updated 4 days ago • 48
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol Paper • 2503.05860 • Published 27 days ago • 9
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25 • 71
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published Sep 5, 2024 • 35
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Paper • 2502.13233 • Published Feb 18 • 14