
Christopher PRO

chkla

https://linktr.ee/chkla

AI & ML interests

🚀 NLP and Computational Social Science

chkla's activity

updated a model about 1 month ago

chkla/Qwen2.5-1.5B-16bit-Instruct-Mentions

Text Generation • Updated Feb 26 • 4

published a model about 1 month ago

chkla/Qwen2.5-1.5B-16bit-Instruct-Mentions

Text Generation • Updated Feb 26 • 4

updated a model about 1 month ago

chkla/Qwen2.5-1.5B-16bit-Mentions

Text Generation • Updated Feb 26 • 11

published a model about 1 month ago

chkla/Qwen2.5-1.5B-16bit-Mentions

Text Generation • Updated Feb 26 • 11

published a dataset about 2 months ago

chkla/distset-test

Viewer • Updated Feb 16 • 195 • 191

updated a dataset about 2 months ago

chkla/distset-test

Viewer • Updated Feb 16 • 195 • 191

liked a Space 5 months ago

307

Aya Models

🌍

Interact with the Aya family of models.

liked 2 models 5 months ago

CohereForAI/aya-expanse-8b

Text Generation • Updated Mar 2 • 34.9k • 357

CohereForAI/aya-expanse-32b

Text Generation • Updated 14 days ago • 10.2k • 246

updated a Space 6 months ago

Narratives Annotation

✍

updated 2 datasets 7 months ago

chkla/quiz-politics-germany-education

Viewer • Updated Sep 1, 2024 • 981 • 55 • 1

chkla/polsci-exams-mcq

Viewer • Updated Aug 31, 2024 • 200 • 21

liked a Space 9 months ago

Mmlu Translation Progress

🔥

liked a model 9 months ago

partypress/partypress-monolingual-germany

Text Classification • Updated Nov 9, 2023 • 45 • 3

updated a Space 10 months ago

Parlbert Topic German Test

🏃

liked a dataset 10 months ago

allenai/WildChat

Viewer • Updated Oct 17, 2024 • 529k • 1.64k • 140

updated a model 12 months ago

chkla/parlbert-topic-german

Text Classification • Updated Apr 8, 2024 • 2.21k • 12

reacted to thomwolf's post with ❤️ about 1 year ago

Post

5286

A Little guide to building Large Language Models in 2024

This is a post-recording of a 75-minute lecture I gave two weeks ago on how to train an LLM from scratch in 2024. I tried to keep it short and comprehensive, focusing on concepts that are crucial for training a good LLM but often hidden in tech reports.

In the lecture, I introduce the students to all the important concepts, tools, and techniques for training a high-performing LLM:
* finding, preparing and evaluating web scale data
* understanding model parallelism and efficient training
* fine-tuning/aligning models
* fast inference

There are of course many things and details missing that I should have added, so don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular, I think I'll add more focus on how to filter topics well and extensively, and maybe more practical anecdotes and details.

Now that I've recorded it, I've been thinking this could be part 1 of a two-part series, with a second, fully hands-on video on how to run all these steps with some libraries and recipes we've recently released at HF around LLM training (which could easily be adapted to other frameworks anyway):
* datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrove
* nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
* lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval

Here is the link to watch the lecture on YouTube: https://www.youtube.com/watch?v=2-SPH9hIKT8
And here is the link to the Google Slides: https://docs.google.com/presentation/d/1IkzESdOwdmwvPxIELYJi8--K3EZ98_cL6c5ZcLKSyVg/edit#slide=id.p

Enjoy! I'm happy to hear feedback on it and on what to add, correct, or extend in a second part.

2 replies