Jialiang Cheng

Julius-L

AI & ML interests

None yet

Recent Activity

liked a dataset 11 days ago

Salesforce/wikitext

liked a dataset 20 days ago

allenai/tulu-3-sft-mixture

liked a dataset about 1 month ago

HuggingFaceFW/fineweb

Organizations

None yet

upvoted an article 2 months ago

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons

Article • Feb 4 • 14

upvoted a collection 5 months ago

🧠 Reasoning datasets

Collection

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 164

upvoted a collection 6 months ago

Deepseek Papers

Collection

Deepseek papers collection • 24 items • Updated 1 day ago • 268

upvoted a paper 7 months ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10 • 53

upvoted 15 papers 10 months ago

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

Paper • 2410.20650 • Published Oct 28, 2024 • 17

A Survey of Small Language Models

Paper • 2410.20011 • Published Oct 25, 2024 • 45

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

Paper • 2410.19313 • Published Oct 25, 2024 • 19

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

Paper • 2408.07666 • Published Aug 14, 2024 • 3

Memory-Efficient LLM Training with Online Subspace Descent

Paper • 2408.12857 • Published Aug 23, 2024 • 14

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 78

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

Paper • 2409.12903 • Published Sep 19, 2024 • 23

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 180

What Matters for Model Merging at Scale?

Paper • 2410.03617 • Published Oct 4, 2024 • 9

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Paper • 2410.02367 • Published Oct 3, 2024 • 51

upvoted a paper 11 months ago

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Paper • 2409.20566 • Published Sep 30, 2024 • 57
