Taiwei Shi

MaksimSTW

https://taiweis.com

AI & ML interests

reinforcement learning, alignment, human-AI collaboration, and computational social science

Recent Activity

upvoted a paper 2 days ago

Video-Based Reward Modeling for Computer-Use Agents

authored a paper 11 days ago

DP-RFT: Learning to Generate Synthetic Text via Differentially Private Reinforcement Fine-Tuning

upvoted a paper 11 days ago

DP-RFT: Learning to Generate Synthetic Text via Differentially Private Reinforcement Fine-Tuning

commented a paper 10 months ago

The Hallucination Tax of Reinforcement Finetuning

Paper • 2505.13988 • Published May 20, 2025 • 8

New activity in lime-nlp/DeepScaleR_Difficulty 11 months ago

Request for sharing the rollout outputs for success rate calculation

#2 opened 11 months ago by

New activity in lime-nlp/GSM8K_Difficulty 11 months ago

Add task category and link to paper/Github repo

#1 opened 11 months ago by

New activity in lime-nlp/MATH_Difficulty 11 months ago

Add link to paper and Github repo

#2 opened 11 months ago by

New activity in lime-nlp/orz_math_difficulty 11 months ago

Add Github repo link and task category.

#1 opened 11 months ago by

New activity in lime-nlp/DeepScaleR_Difficulty 11 months ago

Add link to paper, task categories

#1 opened 11 months ago by

commented a paper 11 months ago

Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

Paper • 2504.05520 • Published Apr 7, 2025 • 11

commented a paper 12 months ago

Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base

Paper • 2503.23361 • Published Mar 30, 2025 • 5

New activity in microsoft/WildFeedback 12 months ago

Update README.md

#1 opened 12 months ago by

New activity in lime-nlp/safer-instruct about 2 years ago

[bot] Conversion to Parquet

#1 opened about 2 years ago by

parquet-converter

Librarian Bot: Add language metadata for dataset

#2 opened about 2 years ago by

New activity in InstantX/InstantID about 2 years ago

Add more style templates / change SD model

#23 opened about 2 years ago by

New activity in deepset/deberta-v3-large-squad2 over 3 years ago

The threshold for no answer

#7 opened over 3 years ago by deleted