
neuralink

AI & ML interests

nanotron @ hf

Recent Activity

upvoted an article 11 days ago

The Transformers Library: standardizing model definitions

liked a model 27 days ago

Qwen/Qwen3-235B-A22B

upvoted an article about 2 months ago

You could have designed state of the art positional encoding



neuralink's activity

upvoted an article 11 days ago

Article

The Transformers Library: standardizing model definitions

and 3 others • 13 days ago • 102

liked a model 27 days ago

Qwen/Qwen3-235B-A22B

Text Generation • Updated 6 days ago • 204k • 902

upvoted 2 articles about 2 months ago

Article

You could have designed state of the art positional encoding

Nov 25, 2024 • 276

Article

Welcome Llama 4 Maverick & Scout on Hugging Face!

and 6 others • Apr 5 • 144

liked a dataset about 2 months ago

nanotron/ultrascale-playbook-data

Updated Mar 12 • 185 • 5

upvoted a paper about 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 185

upvoted an article 3 months ago

Article

Open R1: Update #3

and 9 others • Mar 11 • 291

liked a Space 3 months ago

Predict Memory

🧮

Calculate memory usage from model configurations

upvoted an article 3 months ago

Article

LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone!

and 1 other • Mar 7 • 59

New activity in nanotron/ultrascale-playbook 3 months ago

Make hash section working

#89 opened 3 months ago by mishig

upvoted an article 3 months ago

Article

Open-source DeepResearch – Freeing our search agents

and 4 others • Feb 4 • 1.25k

liked a Space 3 months ago

655

Open Deep-Research

🏆

OpenAI's Deep Research, but open

New activity in nanotron/ultrascale-playbook 3 months ago

More ressources

#73 opened 3 months ago by eliebak

liked a Space 3 months ago

2.62k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

New activity in nanotron/ultrascale-playbook 3 months ago

xrsrke/link_nanotron_fp8_appexdix

#21 opened 3 months ago by neuralink

xrsrke/fix_width_height_for_fp8_graph

#46 opened 3 months ago by neuralink

updated a Space 3 months ago

2.62k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

upvoted 2 articles 4 months ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

and 2 others • Jan 28 • 861

Article

Open-R1: Update #1

and 7 others • Feb 2 • 305

upvoted a paper 4 months ago

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Paper • 2409.15241 • Published Sep 23, 2024 • 1