Add the "max_length" parameter to the Generation configuration.
The 12B model underperforms the 1.2B model because generation falls back to the default max_length of 20, which truncates outputs well before the model would otherwise stop. For example, on WMT14 DE-EN the 12B model scores 15.52 SacreBLEU while the 1.2B model scores 31.786. The smaller models already set max_length properly in their configuration, and the 12B models should match them. I am submitting similar PRs for the other 12B models.
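As a rough illustration of the change, the sketch below adds an explicit max_length entry to a generation configuration represented as a plain dictionary and prints the resulting JSON. The value 200, the helper name with_max_length, and the token ids in the starting config are assumptions made for the example and are not taken from this PR; the real value should be copied from the smaller model's configuration.

```python
import json

# Hypothetical value for illustration only: the actual number should be
# copied from the smaller model's configuration.
ASSUMED_MAX_LENGTH = 200


def with_max_length(generation_config: dict, max_length: int) -> dict:
    """Return a copy of the config with max_length set explicitly."""
    updated = dict(generation_config)
    updated["max_length"] = max_length
    return updated


# Illustrative starting config that has no max_length entry, so generation
# would fall back to the library default.
config = {"bos_token_id": 0, "eos_token_id": 2, "pad_token_id": 1}

patched = with_max_length(config, ASSUMED_MAX_LENGTH)
print(json.dumps(patched, indent=2, sort_keys=True))
```

The helper returns a copy rather than mutating its input, so the original configuration can still be compared against the patched one when re-running the evaluation.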