VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? Paper • 2404.05955 • Published Apr 9, 2024
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published Jul 15, 2024 • 25
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories Paper • 2410.07706 • Published Oct 10, 2024
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published 23 days ago • 77
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 15
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published 23 days ago • 77
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published 23 days ago • 77 • 6
A Comprehensive Survey on Long Context Language Modeling Paper • 2503.17407 • Published Mar 20 • 49
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated 16 days ago • 147
Running 2.66k 2.66k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 76
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 32