- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 13
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 54
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 46
Collections
Collections including paper arxiv:2501.06842
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 23
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 77
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
  Paper • 2501.06842 • Published • 15
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 48
- Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
  Paper • 2501.03916 • Published • 14
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
  Paper • 2501.04682 • Published • 89
- Agent Laboratory: Using LLM Agents as Research Assistants
  Paper • 2501.04227 • Published • 82
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
  Paper • 2501.05366 • Published • 81
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 41
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
  Paper • 2501.06842 • Published • 15
- The GAN is dead; long live the GAN! A Modern GAN Baseline
  Paper • 2501.05441 • Published • 86
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 41
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22