IPA CHILDES
Collection
The IPA-CHILDES dataset along with the models and tokenizers used for phoneme-based language modeling for the 31 languages in CHILDES.
This model repository contains all the runs for the size experiment in "IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling". GPT-2 models were trained on subsets of the EnglishNA portion of IPA-CHILDES. For each of six subset sizes, six model sizes were trained with three different dropout values, for a total of 6 × 6 × 3 = 108 models. See the paper for details and results, and here for training and analysis scripts. Note that the training checkpoints for each run are spread across commits in this repository, so extracting the best model for an individual run requires parsing the commit history. If you need the raw results data, contact Zeb.
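Since the runs are spread over commits, one possible starting point is to list the commit history and group commits by run. The following is a minimal sketch, not the authors' procedure. It assumes the `huggingface_hub` package is installed, and it assumes that commit titles contain a run identifier matching the hypothetical pattern `run-<number>`; the real commit titles in this repository may look different, so inspect them and adjust `RUN_PATTERN` first. The `Commit` type and both function names are illustrative.

```python
import re
from collections import defaultdict
from typing import Iterable, NamedTuple


class Commit(NamedTuple):
    """Minimal stand-in for the two GitCommitInfo fields used here."""
    commit_id: str
    title: str


# ASSUMPTION: each commit title contains a run identifier such as "run-03".
# This naming scheme is hypothetical; adapt the pattern to the real titles.
RUN_PATTERN = re.compile(r"run-\d+")


def group_commits_by_run(commits: Iterable[Commit]) -> dict[str, list[str]]:
    """Map each run identifier found in a commit title to its commit hashes."""
    runs: dict[str, list[str]] = defaultdict(list)
    for commit in commits:
        match = RUN_PATTERN.search(commit.title)
        if match:  # commits without a run identifier are skipped
            runs[match.group(0)].append(commit.commit_id)
    return dict(runs)


def fetch_commits(repo_id: str) -> list[Commit]:
    """List the commits of a Hub model repository (requires network access)."""
    # Imported lazily so that group_commits_by_run can be used offline.
    from huggingface_hub import HfApi

    return [Commit(c.commit_id, c.title) for c in HfApi().list_repo_commits(repo_id)]
```

With the grouping in hand, a specific checkpoint could then be fetched by passing one of the commit hashes as the `revision` argument of `huggingface_hub.hf_hub_download`. Which commit holds the best model for a run still has to be decided from the training logs or evaluation results.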