Pretraining command

#2 opened by stefan-it

Hi @davda54,

many thanks for open sourcing the GPT-BERT architecture, which is super interesting. I would like to conduct some experiments with it in my ongoing GWLMs project.

Would it be possible for you to share the training command for the train_*_gpu.py script for the xsmall model? That would be awesome! Also, do you think the training can be done on a single GPU?

Many thanks in advance!

Language Technology Group (University of Oslo) org

Hi Stefan, thanks for your interest!

For context, you're talking about this GPT-BERT repository, right? These NorBERTs were trained with a slightly updated version of those scripts; we're currently writing a paper about GPT-BERTs and will release the training code as part of it. You can get close by updating the hyperparameters according to the config, but maybe it'll be easier to discuss this via email: [email protected] :)

But to answer your question: yes, the smallest model can be trained on a single GPU without causing a headache.
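
To give a rough idea of what such a run could look like, here is a minimal sketch. The script name (assuming the single-GPU variant of the train_*_gpu.py scripts), the flag names, and the paths are placeholders rather than the actual CLI of the released scripts, so please check the repository or the upcoming training code for the real arguments:

```bash
# Hypothetical single-GPU launch; script name, flags, and paths are placeholders,
# not the exact interface of the GPT-BERT training scripts.
python train_single_gpu.py \
    --config configs/xsmall.json \
    --train_path data/train_corpus.bin \
    --output_dir checkpoints/gpt-bert-xsmall
```

The main thing to adjust is the config: point it at the hyperparameters of the xsmall model and the run should fit comfortably on one GPU.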
