zihan-aiml (zzhhhh)

AI & ML interests

(M)LLMs, LLM post training, LLM alignment, LLM Agent, LLM engineering

Organizations

None yet

zihan-aiml's activity

commented on Replicating DeepSeek R1 for Information Extraction 2 months ago

Great work! I have two questions regarding the reward design:

1. How do you balance the different reward components? I assume it's through trial and error, but I'm particularly interested in:
   - The scale of each reward component
   - How numerical adjustments impact the RL training process
   - The relative weights between different rewards
2. Regarding the F1 score calculation: Is it computed based on the number of entries in your graph? I'm curious about the granularity of reward design, as different reward components seem to operate at different levels of detail:
   - Format reward appears to be one-dimensional
   - F1 reward seems to be a composite metric derived from multiple sub-level data points

This granularity difference in reward design could potentially affect the training dynamics. Would love to hear your thoughts on handling these different scales of feedback. Thanks :)
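
To make the granularity question concrete, here is a minimal sketch of what I have in mind. It is not taken from the article: the function names, the set-of-entries representation, and the weights `w_format` and `w_f1` are all my own assumptions, chosen only to show a binary format reward sitting next to an entry-level F1 reward.

```python
# Hypothetical sketch: names, data representation, and weights are
# illustrative assumptions, not the article's actual reward code.

def f1_reward(predicted, gold):
    """Entry-level F1 between predicted and gold entries (e.g. graph triples)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        # Both empty counts as a perfect match; only one empty scores zero.
        return 1.0 if predicted == gold else 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def format_reward(well_formed):
    """One-dimensional reward: 1.0 if the output is well formed, else 0.0."""
    return 1.0 if well_formed else 0.0


def combined_reward(predicted, gold, well_formed, w_format=0.2, w_f1=0.8):
    """Weighted sum of the two components; the weights are arbitrary placeholders."""
    return w_format * format_reward(well_formed) + w_f1 * f1_reward(predicted, gold)


if __name__ == "__main__":
    gold = {("A", "rel", "B"), ("B", "rel", "C"), ("C", "rel", "D")}
    predicted = {("A", "rel", "B"), ("B", "rel", "C"), ("X", "rel", "Y")}
    print(combined_reward(predicted, gold, well_formed=True))
```

In this sketch the format term can only take two values, while the F1 term moves in small steps as individual entries are matched, which is the scale mismatch I am asking about.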

upvoted an article 2 months ago

Article: Replicating DeepSeek R1 for Information Extraction • Jan 31 • 42