Alvaro Bartolome

alvarobartt

AI & ML interests

machine learning @huggingface


Organizations

Hugging Face, Spaces-explorers, Hackathon Somos NLP 2023: Los LLMs hablan Español, SomosNLP, Hugging Test Lab, Open-Source AI Meetup, Hugging Face H4, Argilla, Blog-explorers, ZeroGPU Explorers, gg-hf, MLX Community, Argilla Explorers, distilabel-internal-testing, Data Is Better Together, ORPO Explorers, Social Post Explorers, Hugging Face Discord Community, LLHF, SLLHF, Hugging Quants, blhf, Argilla Warehouse, nltpt, IOPO Experiments, Google Cloud 🤝🏻 Hugging Face, Huggingface HUGS, Data Is Better Together Contributor, Open R1

alvarobartt's activity

reacted to merve's post with 👀🤗 4 days ago
Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- the tiniest VLMs, coming in 256M and 500M, along with their retrieval models ColSmol for multimodal RAG 💗
- UI-TARS: new models by ByteDance to unlock agentic GUI control 🤯, coming in 2B, 7B, and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B long-context MoE model
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging multimodal benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 671B reasoning models by DeepSeek, plus six distilled dense models, on par with o1, with an MIT license! 🤯
- Qwen2.5-Math-PRM: new math process reward models by Qwen, in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models, along with their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, a new model for 3D asset generation from images
replied to merve's post 4 days ago

Loving these, please never stop, these are so useful 🐐

reacted to merve's post with 🔥 4 days ago
replied to burtenshaw's post 12 days ago
reacted to mlabonne's post with 🤗🔥 12 days ago
🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
reacted to AdinaY's post with 🚀🔥 2 months ago
🌊 The wave of reasoning models from the Chinese community has arrived!

🚀 Marco-o1 by AIDC, Alibaba
👉 AIDC-AI/Marco-o1

✨ QwQ by Qwen, Alibaba
👉 Qwen/qwq-674762b79b75eac01735070a

🌟 Skywork-o1 by Kunlun Tech
👉 Skywork/skywork-o1-open-67453df58e12f6c3934738d0

🔥 Xkev/Llama-3.2V-11B-cot by PKU Yuan group
👉 Xkev/Llama-3.2V-11B-cot

💡 DeepSeek-R1-Lite-Preview by DeepSeek AI
👉 https://chat.deepseek.com/

🔍 InternThinker Preview by Shanghai AI Lab
👉 https://sso.openxlab.org.cn/login?redirect=https://internlm-chat.intern-ai.org.cn/&clientId=ebmrvod6yo0nlzaek1yp

📘 k0-math by Moonshot AI
🚀 https://kimi.moonshot.cn/ (coming soon!)

Who's next? 👀
zh-ai-community/reasoning-models-67409fb3aa1ed78f10087cd7
reacted to andito's post with 🔥👀 2 months ago
SmolVLM speeding locally on a laptop thanks to mlx-vlm and @Gradio! Try it with two lines:
pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit

Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma!
reacted to danielhanchen's post with 🔥 2 months ago
reacted to clem's post with 🔥 3 months ago
This is no Woodstock AI, but it will be fun nonetheless, haha. I’ll be hosting a live workshop with team members next week about the Enterprise Hugging Face Hub.

1,000 spots available, first-come first-served, with some surprises during the stream!

You can register and add to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM
posted an update 5 months ago
🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI.

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post at https://huggingface.co/blog/llama31-on-vertex-ai
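For a rough sense of what the deployment looks like with the google-cloud-aiplatform SDK, here's a minimal sketch; the project, region, and DLC image URI below are illustrative placeholders rather than the exact values from the post:

from google.cloud import aiplatform

# Hypothetical project and region
aiplatform.init(project="my-gcp-project", location="us-central1")

# Register the model, using the Hugging Face DLC for TGI as the serving container
# (the image URI is a placeholder; pick the actual DLC release from the blog post)
model = aiplatform.Model.upload(
    display_name="llama-3-1-405b-instruct-fp8",
    serving_container_image_uri="us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121",
    serving_container_environment_variables={
        "MODEL_ID": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
        "NUM_SHARD": "8",  # shard the model across the 8 GPUs
    },
)

# Deploy on an A3 instance with 8 x H100 GPUs
endpoint = model.deploy(
    machine_type="a3-highgpu-8g",
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
)
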
reacted to merve's post with 🔥 6 months ago
Idefics3-Llama is out! 💥💥
Model: HuggingFaceM4/Idefics3-8B-Llama3
Demo: HuggingFaceM4/idefics3

It's a multimodal model based on Llama 3.1 that accepts an arbitrary number of images interleaved with text, with a huge context window (10k tokens!) ✨

Supported by Hugging Face transformers 🤗
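As a minimal usage sketch with transformers (assuming a recent version with Idefics3 support; the image path and prompt are placeholders):

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# One image interleaved with text; the model accepts arbitrarily many
image = Image.open("example.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
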
reacted to dvilasuero's post with 🤗🚀🔥 8 months ago
Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: becoming a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, running the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger, a larger team, but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!
reacted to tomaarsen's post with 🔥❤️ 8 months ago
‼️ Sentence Transformers v3.0 is out! You can now train and finetune embedding models with multi-GPU training, bf16 support, loss logging, callbacks & much more. I also released 50+ datasets to train on.

1️⃣ Training Refactor
Embedding models can now be trained using an extensive trainer with a lot of powerful features:
- Multi-GPU training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
- bf16 training support; loss logging
- Evaluation datasets + evaluation loss
- Improved callback support + an excellent Weights & Biases integration
- Gradient checkpointing, gradient accumulation
- Improved model card generation
- Resuming from a training checkpoint without performance loss
- Hyperparameter Optimization
and much more!
Read my detailed blogpost to learn about the components that make up this new training approach: https://huggingface.co/blog/train-sentence-transformers
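As a minimal sketch of the new trainer (the base model, dataset, and hyperparameters below are illustrative):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")
# (anchor, positive) pairs; any pair dataset works with this loss
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]")
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="mpnet-base-all-nli",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    bf16=True,  # the new bf16 support
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()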

2️⃣ Similarity Score
Not sure how to compare embeddings? Don't worry, you can now use model.similarity(embeddings1, embeddings2) and you'll get your similarity scores immediately. Model authors can specify their desired similarity score, so you don't have to worry about it anymore!
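For example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings1 = model.encode(["The weather is lovely today."])
embeddings2 = model.encode(["It's so sunny outside!", "He drove to the stadium."])
# A 1 x 2 tensor of scores; cosine similarity by default
print(model.similarity(embeddings1, embeddings2))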

3️⃣ Additional Kwargs
Sentence Transformers relies on various Transformers instances (AutoModel, AutoTokenizer, AutoConfig), but it used to be hard to pass useful keyword arguments through to these (like 'torch_dtype=torch.bfloat16' to load a model at lower precision for a ~2x inference speedup). This is now easy!
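For instance, to load the underlying model in bfloat16:

import torch
from sentence_transformers import SentenceTransformer

# model_kwargs is forwarded to the underlying AutoModel.from_pretrained call
model = SentenceTransformer("all-MiniLM-L6-v2", model_kwargs={"torch_dtype": torch.bfloat16})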

4️⃣ Hyperparameter Optimization
Sentence Transformers now ships with HPO, allowing you to effectively choose your hyperparameters for your data and task.

5️⃣ Dataset Release
To help you out with finetuning models, I've released 50+ ready-to-go datasets that can be used with training or finetuning embedding models: sentence-transformers/embedding-model-datasets-6644d7a3673a511914aa7552

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.0.0