Hynek Kydlicek

hynky

AI & ML interests

Data-processing

Recent Activity

new activity 6 days ago

macrodata/WGO-Bench:Embed synchronized perception states in benchmark rows

updated a dataset 19 days ago

macrodata/whats_going_on_runs

published a dataset 19 days ago

hynky/my_task_20260630_222041

upvoted a collection 7 months ago

📄 FinePDFs

82 items • Updated Jan 9 • 29

upvoted an article 8 months ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 312

upvoted 5 articles 9 months ago

Parquet Content-Defined Chunking

Article • kszucs • Jul 25, 2025 • 75

Why Did MiniMax M2 End Up as a Full Attention Model?

Article • MiniMax-AI • Oct 30, 2025 • 81

What makes good reasoning data

Article • MiniMax-AI • Oct 30, 2025 • 45

Aligning to What? Rethinking Agent Generalization in MiniMax M2

Article • MiniMax-AI • Oct 30, 2025 • 43

Supercharge your OCR Pipelines with Open Models

Article • merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Oct 21, 2025 • 316

upvoted an article 10 months ago

Gaia2 and ARE: Empowering the community to study agents

Article • clefourrier, gregmialz, mlcu, mortimerp9, XciD, tfrere, evijit, RomainFroger, dheeraj7596, CarolinePascal, upiter • Sep 22, 2025 • 136

upvoted 2 articles about 1 year ago

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

Article • thomwolf, matthieu-lapeyre • Jul 9, 2025 • 804

SmolLM3: smol, multilingual, long-context reasoner

Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 782

upvoted a paper about 1 year ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 79

upvoted a collection over 1 year ago

Built with Distill blog ❤️

Collection of all interactive blogs built on top of Distill template. To create your own check: https://huggingface.co/spaces/lvwerra/distill-blog-tem • 6 items • Updated Mar 14, 2025 • 2

upvoted 3 articles over 1 year ago

Open R1: Update #3

Article • open-r1 • Mar 11, 2025 • 298

Fixing Open LLM Leaderboard with Math-Verify

Article • hynky, alozowski, SaylorTwift, clefourrier • Feb 14, 2025 • 31

Open R1: Update #2

Article • open-r1 • Feb 10, 2025 • 219

upvoted a paper over 1 year ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 261

upvoted an article over 1 year ago

FineWeb2-C: Help Build Better Language Models in Your Language

Article • davanstrien • Dec 23, 2024 • 21

upvoted a collection over 1 year ago

🥂 FineWeb2

3 items • Updated Jun 27, 2025 • 24

upvoted a collection almost 2 years ago

IrokoBench

a human-translated benchmark dataset for 16 African languages covering three tasks: NLI, MMLU and MGSM • 6 items • Updated May 31, 2024 • 20

upvoted an article almost 2 years ago

Scaling AI-based Data Processing with Hugging Face + Dask

Article • scj13, jrbourbeau, lhoestq, davanstrien • Oct 9, 2024 • 33