Pierre-Carl Langlais

Pclanglais

AI & ML interests

Open data & open LLMs

Recent Activity

updated a model about 5 hours ago
PleIAs/pleias_wikidata
updated a dataset 1 day ago
Pclanglais/course-material
updated a dataset 2 days ago
PleIAs/post-ocr

Organizations

AgentPublic, BigScience Data, Kheops SAS, Blog-explorers, OpenLLM France, ZeroGPU Explorers, INAGUA, PleIAs, :probabl., Social Post Explorers, LLM - Digital Humanities

Pclanglais's activity

published an article 3 months ago
published an article 4 months ago

Releasing the largest multilingual open pretraining dataset

By Pclanglais and 2 others
100
published an article 7 months ago

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

29
published an article 8 months ago

Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing

20
published an article 11 months ago

Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM

16
published an article 11 months ago

Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data

22
published an article 12 months ago

Releasing Common Corpus: the largest public domain dataset for training LLMs

21