hynky (Hynek Kydlicek)

liked a Space 6 months ago

2.97k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

liked 2 datasets 8 months ago

data-is-better-together/fineweb-c

Viewer • Updated about 1 month ago • 88.7k • 783 • 54

HuggingFaceFW/fineweb-2

Viewer • Updated Jun 27 • 5.02B • 342k • 607

liked a Space 8 months ago

96

Number Tokenization Blog

📈

Explore how tokenization affects arithmetic in LLMs

liked a dataset 8 months ago

CohereLabs/Global-MMLU

Viewer • Updated Apr 15 • 602k • 10.7k • 129

liked a dataset 9 months ago

ClusterlabAi/InstAr-500k

Viewer • Updated Jul 30, 2024 • 481k • 111 • 13

liked a Space 10 months ago

68

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

Evaluate multilingual models using FineTasks

liked a dataset 10 months ago

LLM360/TxT360

Updated May 26 • 18.4k • 237

liked 2 Spaces 10 months ago

19

Hub LFS Analysis

📈

An analysis of LFS files on the Hub.

116

TxT360: Trillion Extracted Text

📖

Create a large-scale deduplicated text dataset for LLM training

liked a dataset 11 months ago

Cleanlab/bad_data_gsm8k_svamp.csv

Viewer • Updated Apr 25, 2024 • 34 • 7 • 3

liked a Space 11 months ago

4

Datasets Metrics Explorer

📊

liked 6 datasets about 1 year ago

ThaiSyntheticQA/ThaiQA-v1

coastalcph/fairlex

meta-llama/Llama-3.1-405B-Instruct-evals

jon-tow/okapi_mmlu

pakphum/winograd_th

scb10x/thai_exam

liked a Space about 1 year ago

122

Open-LLM performances are plateauing, let’s make the leaderboard steep again

🏔

Update leaderboard for fair model evaluation

liked a dataset about 1 year ago

m-a-p/Matrix

Viewer • Updated Feb 25 • 6.43B • 4.75k • 166