Shuai Wang

Shuaiii

AI & ML interests

None yet

Recent Activity

liked a dataset 2 days ago

nvidia/Llama-Nemotron-Post-Training-Dataset

Organizations

None yet

Shuaiii's activity

upvoted a paper about 7 hours ago

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published 1 day ago • 63

upvoted a paper 1 day ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published 3 days ago • 21

upvoted a paper 12 days ago

Step1X-Edit: A Practical Framework for General Image Editing

Paper • 2504.17761 • Published 13 days ago • 86

upvoted a paper 21 days ago

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published 22 days ago • 16

upvoted a paper 22 days ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 23 days ago • 255

upvoted a paper 23 days ago

Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

Paper • 2504.07866 • Published 27 days ago • 10

upvoted a paper 26 days ago

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13 • 36

upvoted a collection 26 days ago

InternVL3

Collection • 34 items • Updated 18 days ago • 65

upvoted a paper 26 days ago

Kimi-VL Technical Report

Paper • 2504.07491 • Published 28 days ago • 125

upvoted 3 papers 28 days ago

upvoted a paper 29 days ago

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published about 1 month ago • 102

upvoted a collection about 1 month ago

Llama 4

Collection • Llama 4 release • 13 items • Updated 9 days ago • 479

upvoted 2 papers about 1 month ago

Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published Apr 3 • 54

Scaling Language-Free Visual Representation Learning

Paper • 2504.01017 • Published Apr 1 • 29

upvoted a collection about 1 month ago

Meta's Llama 3.1 models & evals

Collection • 17 items • Updated Dec 13, 2024 • 146

upvoted an article about 1 month ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Article • Mar 12 • 406

upvoted a paper about 1 month ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 150