mlo-data-cleaning

Activity Feed

AI & ML interests

None defined yet.

ZR0zNqSGMI's activity

hynky

authored a paper 22 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 24 days ago • 196

guipenedo

authored a paper 22 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 24 days ago • 196

lvwerra

authored a paper 23 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 24 days ago • 196

hynky

authored a paper about 1 month ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 55

lvwerra

authored a paper about 1 month ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 55

guipenedo

authored a paper about 1 month ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 55

vsabolcec

updated a Space 4 months ago

Annotation

Manage text and view statistics with an intuitive interface

lvwerra

authored a paper 4 months ago

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024 • 24

lvwerra

authored a paper 8 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 93

hynky

authored a paper 8 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 93

guipenedo

authored a paper 8 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 93

lvwerra

authored a paper 9 months ago

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28, 2024 • 12

lvwerra

authored a paper 12 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 138

hynky

authored a paper about 1 year ago

A Dataset and Strong Baselines for Classification of Czech News Texts

Paper • 2307.10666 • Published Jul 20, 2023

guipenedo

authored a paper over 1 year ago

The Falcon Series of Open Language Models

Paper • 2311.16867 • Published Nov 28, 2023 • 13

mjaggi

authored 5 papers over 1 year ago

Team members 8
