view article Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 338
Scaling Test-Time Compute with Open Models Collection Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated Jan 6 • 27
HMoE: Heterogeneous Mixture of Experts for Language Modeling Paper • 2408.10681 • Published Aug 20, 2024 • 9