arxiv:2501.11873
Kaiyue Wen
KaiyueWen
AI & ML interests
None yet
Recent Activity
authored a paper 7 days ago
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
authored a paper 7 days ago
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
authored a paper 7 days ago
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
Organizations
None yet