Batch Size
#4 · opened by winglian
The model card lists: Batch Size (effective): 32 (8B), 128 (70B), 256 (405B) — but are the 8B and 70B values reversed?
The 70B config seems like it should be 16; https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_70b.yaml lists:
per_device_train_batch_size: 1
gradient_accumulation_steps: 2 # designed for 8 GPUs, so batch size 128
whereas the 8B config should be 128; https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_8b.yaml lists:
per_device_train_batch_size: 1
gradient_accumulation_steps: 16 # designed for 8 GPUs, so batch size 128
Ah, good catch! Looking at the original runs for the 8B and 70B models, they should both be 128. The 70B yaml should say "designed for 8 nodes" (so the effective batch size is 8 × 8 × 2 = 128).
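For reference, a minimal sketch of the arithmetic being discussed (the helper function and the assumption that effective batch size = per_device_train_batch_size × gradient_accumulation_steps × GPUs per node × nodes are illustrative, not code from open-instruct):

```python
def effective_batch_size(per_device_bsz: int, grad_accum: int,
                         gpus_per_node: int, num_nodes: int = 1) -> int:
    """Effective (global) batch size under plain data parallelism."""
    return per_device_bsz * grad_accum * gpus_per_node * num_nodes

# 8B config: per_device 1, grad accum 16, 8 GPUs on a single node
print(effective_batch_size(1, 16, 8, 1))  # 128

# 70B config read as a single 8-GPU node: per_device 1, grad accum 2
print(effective_batch_size(1, 2, 8, 1))   # 16

# 70B config as clarified above, run across 8 nodes of 8 GPUs
print(effective_batch_size(1, 2, 8, 8))   # 128
```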
hamishivi changed discussion status to closed