Catherine Arnett
catherinearnett
41 followers · 22 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
authored a paper, 21 days ago: BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization
authored a paper, 21 days ago: Evaluating Morphological Alignment of Tokenizers in 70 Languages
liked a dataset, 22 days ago: classla/ParlaSpeech-PL
catherinearnett's models (18)
catherinearnett/B-GPT_pl_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 97
catherinearnett/B-GPT_en_pl_sequential • Text Generation • 0.1B • Updated Jun 12 • 92
catherinearnett/B-GPT_pl_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 166
catherinearnett/B-GPT_en_pl_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 292
catherinearnett/B-GPT_el_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 187
catherinearnett/B-GPT_en_el_sequential • Text Generation • 0.1B • Updated Jun 12 • 237
catherinearnett/B-GPT_el_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 191
catherinearnett/B-GPT_en_el_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 299
catherinearnett/B-GPT_es_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 197
catherinearnett/B-GPT_en_es_sequential • Text Generation • 0.1B • Updated Jun 12 • 192
catherinearnett/B-GPT_es_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 186
catherinearnett/B-GPT_en_es_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 203
catherinearnett/B-GPT_nl_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 216
catherinearnett/B-GPT_en_nl_sequential • Text Generation • 0.1B • Updated Jun 12 • 243
catherinearnett/B-GPT_nl_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 424
catherinearnett/B-GPT_en_nl_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 2.49k
catherinearnett/pythia-1b-bigram_masked • Updated May 1
catherinearnett/pythia-160m-bigram_masked • Updated May 1