The Clue-instruct dataset and various models fine-tuned on it.
Andrea Zugarini
azugarini
AI & ML interests
Natural Language Processing, Language Models, Language Model Compression
Collections (2)
A collection of research on adapting tokenizers to specific domains and/or languages, with a special focus on sequence compression.
- Fast Vocabulary Transfer for Language Model Compression (paper, arXiv 2402.09977)
- Multi-Word Tokenization for Sequence Compression (paper, arXiv 2402.09949)
- Zero-Shot Tokenizer Transfer (paper, arXiv 2405.07883)
- Language Model Tokenizers Introduce Unfairness Between Languages (paper, arXiv 2305.15425)