Great work! I have two questions regarding the reward design:
How do you balance the different reward components? I assume it's through trial and error, but I'm particularly interested in:
Regarding the F1 score calculation: Is it computed based on the number of entries in your graph? I'm curious about the granularity of reward design, as different reward components seem to operate at different levels of detail:
This difference in granularity could affect the training dynamics. I would love to hear your thoughts on handling these different scales of feedback. Thanks :)