Batch Size

#4
by winglian - opened

The model card lists: Batch Size (effective): 32 (8B), 128 (70B), 256 (405B), but are 8B and 70B reversed?

The 70B config seems like it should be 16; https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_70b.yaml lists:

per_device_train_batch_size: 1
gradient_accumulation_steps: 2 # designed for 8 GPUs, so batch size 128

whereas the 8B should be 128; https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_8b.yaml lists:

per_device_train_batch_size: 1
gradient_accumulation_steps: 16 # designed for 8 GPUs, so batch size 128

Ah, good catch! Looking at the original runs for the 8B and 70B models, they should both be 128. The 70B yaml should say designed for 8 nodes (so effective bsz of 8 * 8 * 2 = 128).
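
To spell out the arithmetic, here is a minimal sketch of how the effective batch size falls out of those values under a standard data-parallel setup. This is just an illustration, not code from open-instruct, and the helper name is made up:

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         gpus_per_node: int,
                         num_nodes: int = 1) -> int:
    """Effective batch size = per-device batch * grad accum steps * total data-parallel GPUs."""
    return (per_device_train_batch_size
            * gradient_accumulation_steps
            * gpus_per_node
            * num_nodes)

# 8B DPO config: 1 * 16 * 8 GPUs on 1 node = 128
print(effective_batch_size(1, 16, 8, num_nodes=1))  # 128

# 70B DPO config: 1 * 2 * 8 GPUs on 8 nodes = 128
print(effective_batch_size(1, 2, 8, num_nodes=8))   # 128
```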

hamishivi changed discussion status to closed