Maozhou Ge

Gmc2

GHGmc2

AI & ML interests

None yet

Organizations

None yet

upvoted an article 9 days ago

From GRPO to DAPO and GSPO: What, Why, and How

Article • 10 days ago • 11

upvoted a paper 12 days ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published 26 days ago • 289

upvoted a collection 12 days ago

Qwen3

Collection

84 items • Updated 13 days ago • 1.11k

liked a model 13 days ago

openai/gpt-oss-120b

Text Generation • 120B • Updated 5 days ago • 853k • 3.47k

liked a model 22 days ago

physical-intelligence/fast

Robotics • Updated Jan 16 • 131

New activity in internlm/POLAR-7B 26 days ago

Any plan to open source the dataset?

#8 opened 26 days ago by Gmc2

upvoted a paper about 1 month ago

Pre-Trained Policy Discriminators are General Reward Models

Paper • 2507.05197 • Published Jul 7 • 39

liked a Space about 1 month ago

Pipeline Parallelism Schedule Visualizer

📊

Visualize pipeline parallelism schedules

upvoted an article about 1 month ago

Mixture of Depth is Vibe

Article • Apr 22, 2024 • 48

upvoted an article 2 months ago

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Article • Oct 7, 2024 • 46

liked a model 3 months ago

deepseek-ai/DeepSeek-R1-0528

Text Generation • 685B • Updated May 29 • 432k • 2.37k

upvoted a paper 3 months ago

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14 • 68

upvoted an article 4 months ago

Vision Language Models Explained

Article • and 1 other • Apr 11, 2024 • 437

liked a dataset 4 months ago

hiyouga/geometry3k

Viewer • Updated Apr 14 • 3k • 17.2k • 39

liked a dataset 5 months ago

Dahoas/full-hh-rlhf

Viewer • Updated Feb 23, 2023 • 125k • 634 • 83

liked 2 models 5 months ago

Qwen/Qwen2.5-VL-32B-Instruct

Image-Text-to-Text • 33B • Updated Apr 14 • 444k • 425

deepseek-ai/DeepSeek-V3-0324

Text Generation • 685B • Updated Mar 27 • 404k • 3.04k

upvoted a collection 5 months ago

🌾Oat-Zero: Understanding R1-Zero-Like Training

Collection

5 items • Updated Apr 10 • 7

upvoted a paper 5 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 200

upvoted an article 5 months ago

How 🤗 Accelerate runs very large models thanks to PyTorch

Article • Sep 27, 2022 • 14
