Listener-Rewarded Thinking in VLMs for Image Preferences Paper • 2506.22832 • Published 6 days ago • 22
DreamBoothDPO: Improving Personalized Generation using Direct Preference Optimization Paper • 2505.20975 • Published May 27 • 36
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published 29 days ago • 125 • 20
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published 29 days ago • 125 • 20
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published 29 days ago • 125 • 20
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published 29 days ago • 125 • 20
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published 29 days ago • 125 • 20
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published 29 days ago • 125
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding Paper • 2502.03183 • Published Feb 5 • 3
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24 • 119