matlok's Collections
Papers - Multilingual
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Paper • arXiv:2304.08999
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • arXiv:2309.09400
Robust Open-Vocabulary Translation from Visual Text Representations
Paper • arXiv:2104.08211
Poro 34B and the Blessing of Multilinguality
Paper • arXiv:2404.01856
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • arXiv:2404.04167
One Wide Feedforward is All You Need
Paper • arXiv:2309.01826
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • arXiv:2404.12387
Paper • arXiv:2407.10671
Meltemi: The first open Large Language Model for Greek
Paper • arXiv:2407.20743
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
Paper • arXiv:2407.20729
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings
Paper • arXiv:2407.20581
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • arXiv:2308.11466
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • arXiv:2105.13626
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Paper • arXiv:2103.06874