- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 53
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 111
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 142
- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
  Paper • 2501.12370 • Published • 11

Avi66
AI & ML interests
ML Research, LLMs, Applications
MultiModality
Recent Activity
Updated a collection 10 days ago: Vlm
Updated a collection 23 days ago: TTS
Updated a collection 23 days ago: TTS
Organizations
None yet
Collections
5
models
None public yet
datasets
None public yet