Tong Zhu

Spico

https://Spico197.github.io

AI & ML interests

Information Extraction, Mixture-of-Experts, LLM

Recent Activity

upvoted 2 papers 20 days ago

TEMPO: Scaling Test-time Training for Large Reasoning Models

Paper • 2604.19295 • Published 21 days ago • 34

PlayCoder: Making LLM-Generated GUI Code Playable

Paper • 2604.19742 • Published 21 days ago • 26

upvoted a paper about 1 month ago

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Paper • 2603.28088 • Published Mar 30 • 85

upvoted an article 2 months ago

Your MoE Model Does Not Have to Select Fixed Number of Experts

Article • Spico • Feb 26 • 7

upvoted an article 3 months ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 310

upvoted 3 papers 3 months ago

upvoted 3 papers 4 months ago

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Paper • 2601.11969 • Published Jan 17 • 27

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Paper • 2601.11655 • Published Jan 15 • 63

Toward Efficient Agents: Memory, Tool learning, and Planning

Paper • 2601.14192 • Published Jan 20 • 57

upvoted an article 4 months ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Article • loubnabnl, anton-l, davanstrien • Mar 20, 2024 • 113

upvoted a paper 4 months ago

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 52

upvoted 2 papers 5 months ago

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 126

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245

upvoted 2 papers 6 months ago

P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 134

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Paper • 2511.13704 • Published Nov 17, 2025 • 44

upvoted an article 6 months ago

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Article • codelion • Nov 3, 2025 • 65

upvoted 2 papers 6 months ago

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 242

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30, 2025 • 87
