view article Article Distributed Training with JAX and Flax NNX: A Practical Guide to Sharding By jiagaoxiang • Mar 26 • 6
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Apr 28 • 119
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 105
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 23, 2024 • 33