Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
Paper • 2505.22232
Open Source Language Models for Europe
?" or "if multiple licenses were found, do they contradict each other?", which makes further filtering a breeze.