Smolvencoder

Enterprise

non-profit

Activity Feed

AI & ML interests

None defined yet.

SmolvencoderOrg's activity

andito

authored a paper about 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 184

andito

authored a paper 2 months ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 108

manu

authored 2 papers 3 months ago

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published Mar 7 • 79

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published Feb 19 • 34

andito

posted an update 3 months ago

Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs

- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated on transformers from Day 0!

Efficient multimodal models are here to stay!!🔥
Check out their blog! https://huggingface.co/blog/aya-vision

andito

authored a paper 4 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 230

andito

posted an update 4 months ago

𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝘁𝗵𝗲 𝘄𝗼𝗿𝗹𝗱'𝘀 𝘀𝗺𝗮𝗹𝗹𝗲𝘀𝘁 𝘃𝗶𝘀𝗶𝗼𝗻 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝗼𝗱𝗲𝗹!

We’re thrilled to share 𝗦𝗺𝗼𝗹𝗩𝗟𝗠 (256M & 500M)—the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory—you can fine-tune it on your laptop and run it on your toaster!

Why It’s Game-Changing:
- 𝗢𝘂𝘁𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝘀 𝗟𝗮𝗿𝗴𝗲𝗿 𝗠𝗼𝗱𝗲𝗹𝘀: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. Over 300x reduction!
- 𝗠𝗶𝗴𝗵𝘁𝘆 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆: The 256M version delivers 80% of our 2.2B model’s performance, and the 500M version hits 90%.
- 𝗟𝗶𝗴𝗵𝘁𝗻𝗶𝗻𝗴-𝗙𝗮𝘀𝘁 𝗦𝗲𝗮𝗿𝗰𝗵: SmolVLM integrates with ColPali for state-of-the-art retrieval speeds, on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.

What’s New Under the Hood:
- 𝗡𝗲𝘄 𝗩𝗶𝘀𝗶𝗼𝗻 𝗘𝗻𝗰𝗼𝗱𝗲𝗿: Smaller overall size (400M -> 93M), but with higher resolution.
- 𝗛𝗶𝗴𝗵𝗲𝗿 𝗣𝗶𝘅𝗲𝗹𝘀/𝗧𝗼𝗸𝗲𝗻: 4096 vs. 1820—more efficient image processing.
- 𝗦𝗺𝗮𝗿𝘁 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Faster training and a performance boost.

Check our blog: https://huggingface.co/blog/smolervlm
The models: HuggingFaceTB/smolvlm-256m-and-500m-6791fafc5bb0ab8acc960fb0
The demo: HuggingFaceTB/SmolVLM-256M-Demo

andito

posted an update 6 months ago

SmolVLM speeding locally on a laptop thanks to mlx-vlm and @Gradio! Try it with two lines:
pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit

Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma!

andito

posted an update 6 months ago

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb

manu

authored a paper 8 months ago

EuroLLM: Multilingual Language Models for Europe

Paper • 2409.16235 • Published Sep 24, 2024 • 26

andito

posted an update 8 months ago

Hugging Face presents FineVideo 🎥! Unlocking the next generation of video understanding 🚀

🤯 3400 hours of annotated Creative Commons videos with rich character descriptions, scene splits, mood, and content descriptions per scene, as well as QA pairs.
🔥 @mfarre processed over 2M videos from YouTube-CC to make this incredibly powerful selection.

Very psyched to fine-tune idefics on this dataset. ⚡️
Explore the videos: HuggingFaceFV/FineVideo-Explorer

andito

authored 2 papers 9 months ago

GACELA -- A generative adversarial context encoder for long audio inpainting

Paper • 2005.05032 • Published May 11, 2020

Adversarial Generation of Time-Frequency Features with application in audio synthesis

Paper • 1902.04072 • Published Feb 11, 2019

andito

posted an update 9 months ago

🚀 Introducing Hugging Face's Multilingual Speech-to-Speech! 🎤
💬 Our modular, cross-platform pipeline to run GPT-4o-like experiences on device can now seamlessly switch languages mid-conversation with an imperceptible 100ms delay.

🌟 Building on an amazing early reception with 2600 stars on GitHub 🌟
🚀 We are expanding the library to support multiple languages
🔥 Try it out with a flag: --language fr
🤯 Or don't set the flag and let the system detect the language

💡 What feature should we add next?

andito

authored a paper 9 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 131

manu

authored a paper 11 months ago

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27, 2024 • 48

manu

authored 3 papers about 1 year ago

manu

posted an update over 1 year ago

These past months, I've been busy baking a special sort of Croissant 🥐 with an awesome team!

🥐 CroissantLLM is a truly bilingual language model trained on 3 trillion tokens of French and English data. In its size category (<2B), it is the best model in French, but it also rivals the best monolingual English models!

💾 To train it, we collected, filtered and cleaned huge quantities of permissively licensed French data, across various domains (legal, administrative, cultural, scientific), and different text modalities (speech transcriptions, movie subtitles, encyclopedias, forums, webpages)...

⚖️ Assessing LLM performance is not easy, especially outside of English, and to this end we crafted a novel evaluation benchmark, FrenchBench, aiming to assess reasoning, factual knowledge, and linguistic capabilities of models in French!

🔎 The best current LLMs are hidden behind a shroud of mystery, trained with undisclosed training data mixes or strategies. We go the opposite way, releasing all of the project's artefacts (model checkpoints, data, training details, evaluation benchmarks...). We meet 81% of the Stanford FMTI transparency criteria, far ahead of even most open initiatives!

🧪 Beyond a powerful industrial resource, our transparent initiative is a stepping stone for many scientific questions! How does teaching a model two languages instead of one split its monolingual ability? Does training on so much French help the model integrate French-centric knowledge and cultural biases? How does the model memorize the training data?

There is much more to say; for those interested, I recommend checking out:

🗞️ The blogpost: https://huggingface.co/blog/manu/croissant-llm-blog
📖 The 45-page report with lots of gems: https://arxiv.org/abs/2402.00786
🤖 Models, Data, Demo: croissantllm

Team members 3