Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance Paper • 2505.16348 • Published 12 days ago • 46
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9, 2024 • 3
Evaluating Robustness of Reward Models for Mathematical Reasoning Paper • 2410.01729 • Published Oct 2, 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics Paper • 2406.14703 • Published Jun 20, 2024 • 2
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents Paper • 2505.15277 • Published 12 days ago • 98
WPRM/ours_8b_mtl_enhanced_annotated_walite_combined_checklist Viewer • Updated 19 days ago • 812 • 152
WPRM/ours_3b_mtl_enhanced_annotated_walite_combined_checklist Viewer • Updated 19 days ago • 812 • 118
WPRM/WebShepherd_train_multimodal_final_0513_checklist_only Viewer • Updated 20 days ago • 3.63k • 45