Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 3 items • 199
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 226
Article TerjamaBench: A Cultural Benchmark for English-Darija Machine Translation by imomayiz • 25
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8, 2025 • 249
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7, 2025 • 48
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 89
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 93
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 57
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 113
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 models • 15 items • Updated Dec 6, 2024 • 564
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 479
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 43
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction Paper • 2410.04932 • Published Oct 7, 2024 • 9