Elie Bakouch PRO

eliebak

AI & ML interests

Training LLMs @ 🤗

Recent Activity

liked a model about 12 hours ago

openbmb/MiniCPM4.1-8B

new activity about 15 hours ago

Kwai-Klear/Klear-46B-A2.5B-Base: tech report link broken

liked a model 1 day ago

Kwai-Klear/Klear-46B-A2.5B-Instruct

New activity in Kwai-Klear/Klear-46B-A2.5B-Base about 15 hours ago

tech report link broken

#1 opened 1 day ago by

commented a paper 4 days ago

Fantastic Pretraining Optimizers and Where to Find Them

Paper • 2509.02046 • Published 5 days ago • 10

New activity in xai-org/grok-2 14 days ago

add rope_type:yarn

#7 opened 14 days ago by

New activity in huggingface/InferenceSupport 17 days ago

deepseek-ai/DeepSeek-V3.1

#4282 opened 17 days ago by

ByteDance-Seed/Seed-OSS-36B-Instruct

#4275 opened 17 days ago by

commented a paper 19 days ago

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Paper • 2508.10975 • Published 23 days ago • 57

commented a paper 23 days ago

$μ$-Parametrization for Mixture of Experts

Paper • 2508.09752 • Published 24 days ago • 9

New activity in HuggingFaceTB/SmolLM3-3B about 1 month ago

SmolLM3 RL results

#33 opened about 1 month ago by

commented 2 papers about 1 month ago

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

Paper • 2507.19427 • Published Jul 25 • 18

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 294

New activity in HuggingFaceTB/SmolLM3-3B-Base about 1 month ago

Release Intermediate Checkpoints?

#2 opened about 2 months ago by xuanxiang-chatting

New activity in HuggingFaceTB/SmolLM3-3B about 1 month ago

Multi-head latent attention (MLA) instead of Grouped query attention (GQA)

#18 opened about 2 months ago by

Add This Model To the French Understanding Leaderboard By the French Government

#28 opened about 2 months ago by

New activity in HuggingFaceTB/SmolLM3-3B about 2 months ago

test1

#21 opened about 2 months ago by

Where is the SmolLM3-1B and 2B and 0.6B?

#19 opened about 2 months ago by

chat template: tool call fix?

#16 opened about 2 months ago by

Possible mistake in the chat template?

#14 opened about 2 months ago by

Evaluation metrics

#20 opened about 2 months ago by BounharAbdelaziz

New activity in HuggingFaceTB/SmolLM3-3B 2 months ago

Language list

#5 opened 2 months ago by

Update README.md

#4 opened 2 months ago by