Shihan Dou
Ablustrund
AI & ML interests
Natural Language Processing, Large Language Models
Recent Activity
upvoted a paper about 2 months ago
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
authored a paper about 2 months ago
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
authored a paper about 2 months ago
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning