Lijun Wu

apeters

https://apeterswu.github.io/

Recent Activity

updated a dataset about 1 month ago

opendatalab/Sci-Base

updated a dataset about 2 months ago

opendatalab/Sci-Base

upvoted a paper 4 days ago

BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language

Paper • 2606.22138 • Published 7 days ago • 24

upvoted a paper 2 months ago

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

Paper • 2604.10480 • Published Apr 12 • 20

upvoted 2 papers 3 months ago

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Paper • 2604.04771 • Published Apr 6 • 124

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 134

upvoted a collection 3 months ago

ODA-Scored

Collection

ODA-Scored-Data, produced by implementing multiple data scores. • 2 items • Updated Mar 31 • 3

upvoted 2 papers 4 months ago

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Paper • 2603.07223 • Published Mar 7 • 13

GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 187

upvoted a collection 5 months ago

MMFineReason

Collection

High-quality STEM reasoning dataset for Multimodal LLM post-training. • 8 items • Updated May 7 • 24

upvoted 3 papers 5 months ago

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Paper • 2601.21821 • Published Jan 29 • 62

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Paper • 2601.17027 • Published Jan 17 • 42

ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

Paper • 2601.13606 • Published Jan 20 • 12

upvoted 3 collections 5 months ago

upvoted a paper 5 months ago

Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets

Paper • 2601.09733 • Published Dec 30, 2025 • 9

upvoted 2 papers 6 months ago

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Paper • 2512.17260 • Published Dec 19, 2025 • 53

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Paper • 2512.14051 • Published Dec 16, 2025 • 47

upvoted a paper 7 months ago

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

Paper • 2512.01816 • Published Dec 1, 2025 • 94

upvoted 2 papers 9 months ago

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

Paper • 2510.04081 • Published Oct 5, 2025 • 23

ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning

Paper • 2509.21070 • Published Sep 25, 2025 • 9
