SONGJUN TU
SONGJUNTU
AI & ML interests
None yet
Recent Activity
authored a paper 1 day ago
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
authored a paper 1 day ago
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
authored a paper 1 day ago
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning