What's the highest quality RoPE setting?
#1
by
imoc
- opened
Set rope_scaling to null and use seq len 40K?
BTW how many GPU hour used per finetune per model preview version?? 3~4 days a preview ver revision is so fast
Hi, the model has been trained with the rope_scaling enabled. Based on our evaluations, we recommend keep it on during inference.
Let's say 500~1500 steps have been done for each checkpoint.