Yi Cui
onekq
AI & ML interests
Benchmark, Code Generation Model
Recent Activity
posted an update about 14 hours ago
Here is the post on the Muon optimizer. It's getting hardcore. I tried to visualize orthogonalization but decided to drop it to avoid miscommunication.
https://huggingface.co/blog/onekq/muon-optimizer
No matter which angle I take, I can't detect slowdown. It's the opposite in fact.
published an article about 15 hours ago
Muon Optimizer: The Power of Collective Momentum
new activity
2 days ago
moonshotai/Kimi-K2-Thinking: If we apply PTQ to a QAT model, what will happen