Say hello to hf: a faster, friendlier Hugging Face CLI ✨
We are glad to announce a long-awaited quality-of-life improvement: the Hugging Face CLI has been officially renamed from huggingface-cli to hf!
So... why this change?
Typing huggingface-cli constantly gets old fast. More importantly, the CLI’s command structure became messy as new features were added over time (upload, download, cache management, repo management, etc.). Renaming the CLI is a chance to reorganize commands into a clearer, more consistent format.
We decided not to reinvent the wheel and instead follow a well-known CLI pattern: hf <resource> <action>. Isn't hf auth login easier to type and remember?
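To make the `hf <resource> <action>` pattern concrete, here is a minimal Python sketch that assembles such command lines. The helper `hf_command` is hypothetical (it is not part of any Hugging Face library). Apart from `hf auth login`, which is mentioned above, the resource/action pairs are assumed examples; run `hf --help` for the actual list.

```python
import shlex
from typing import List


def hf_command(resource: str, action: str, *args: str) -> List[str]:
    """Build an argv list following the `hf <resource> <action>` pattern.

    Hypothetical helper for illustration only; it does not execute anything.
    """
    if not resource or not action:
        raise ValueError("both a resource and an action are required")
    return ["hf", resource, action, *args]


# `hf auth login` comes from the announcement; the other pairs are assumed examples.
examples = [
    hf_command("auth", "login"),
    hf_command("auth", "whoami"),
    hf_command("cache", "scan"),
]

for argv in examples:
    # shlex.join quotes each argument so the line can be pasted into a shell
    print(shlex.join(argv))
```

If the `hf` executable is installed, any of these lists could be passed to `subprocess.run` to execute the command.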
You can now find Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 in the Hugging Face Collection in Azure ML or Azure AI Foundry, along with 10k other Hugging Face models 🤗🤗
ZML just released a technical preview of their new Inference Engine: LLMD.
- Just a 2.4GB container, which means fast startup times and efficient autoscaling
- Cross-platform GPU support: works on both NVIDIA and AMD GPUs
- Written in Zig
I just tried it out and deployed it on Hugging Face Inference Endpoints and wrote a quick guide 👇 You can try it in like 5 minutes!
We just released native support for @SGLang and @vllm-project in Inference Endpoints 🔥
Inference Endpoints is becoming the central place to deploy high-performance inference engines, with the managed infrastructure to run them.
Instead of spending weeks configuring infrastructure, managing servers, and debugging deployment issues, you can focus on what matters most: your AI model and your users 🙌
🎉 New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2
We're excited to welcome Parakeet TDT 0.6B V2—a state-of-the-art English speech-to-text model—to the Azure Foundry Model Catalog.
What is it?
A powerful ASR model built on the FastConformer-TDT architecture, offering:
🕒 Word-level timestamps
✍️ Automatic punctuation & capitalization
🔊 Strong performance across noisy and real-world audio
It runs with NeMo, NVIDIA’s optimized inference engine.
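To show what word-level timestamps make possible, here is a small Python sketch that groups timed words into caption lines, splitting on long pauses. The input shape (dicts with `word`, `start`, and `end` in seconds), the `max_gap` threshold, and the sample data are all assumptions for illustration; the post does not describe the model's actual output format.

```python
from typing import Dict, List, Union

Word = Dict[str, Union[str, float]]


def format_chunk(chunk: List[Word]) -> str:
    """Render one caption line as '[start - end] text'."""
    start, end = float(chunk[0]["start"]), float(chunk[-1]["end"])
    text = " ".join(str(w["word"]) for w in chunk)
    return f"[{start:.2f}s - {end:.2f}s] {text}"


def group_words(words: List[Word], max_gap: float = 0.5) -> List[str]:
    """Join consecutive words into caption lines, splitting on long pauses."""
    lines: List[str] = []
    current: List[Word] = []
    for w in words:
        # Start a new line when the pause since the previous word exceeds max_gap
        if current and float(w["start"]) - float(current[-1]["end"]) > max_gap:
            lines.append(format_chunk(current))
            current = []
        current.append(w)
    if current:
        lines.append(format_chunk(current))
    return lines


# Made-up sample data, not real model output
sample = [
    {"word": "Hello", "start": 0.00, "end": 0.40},
    {"word": "there.", "start": 0.45, "end": 0.80},
    {"word": "Welcome", "start": 1.60, "end": 2.00},
]

for line in group_words(sample):
    print(line)
```

The same grouping idea could feed a subtitle file or a clickable transcript once adapted to the real output structure.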
Want to give it a try? 🎧 You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying. If it fits your needs, deploy easily from the Hugging Face Hub or Azure ML Studio with secure, scalable infrastructure!
📘 Learn more by following this guide written by @alvarobartt
In case you missed it, Hugging Face expanded its collaboration with Azure a few weeks ago with a curated catalog of 10,000 models, accessible from Azure AI Foundry and Azure ML!
@alvarobartt has been cooking these last few days to prepare the one and only documentation you need if you want to deploy Hugging Face models on Azure. It comes with an FAQ, great guides, and examples on how to deploy VLMs, LLMs, and smolagents, with more to come very soon.
We need your feedback: come help us and let us know what else you want to see, which model we should add to the collection, which model task we should prioritize adding, what else we should build a tutorial for. You’re just an issue away on our GitHub repo!
AMD summer hackathons are here! A chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris - Station F - July 5-6
🇮🇳 Mumbai - July 12-13
🇮🇳 Bengaluru - July 19-20
Hugging Face and GPU Mode will be on site, and on July 6 in Paris, @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing host speakers, and more: for details, head to https://lu.ma/fmvdjmur!
Build your first chatbot with a Hugging Face Spaces frontend and a Gaudi-powered backend with @bconsolvo! He will teach you how to build an LLM-powered chatbot using Streamlit and Hugging Face Spaces, integrating a model endpoint hosted on an Intel® Gaudi® accelerator.
New policy blogpost! The EU is speaking a lot about sovereignty. A cornerstone of digital sovereignty is and has to be open source. As AI becomes more central to everything from public services to national security, the ability to govern, adapt, and understand these systems is no longer optional. Sovereign control over data, infrastructure, technology, and regulation is vital, and open source AI provides the foundation.
In my latest blog post, I explore how open source:
✅ Enables democratic oversight
✅ Reduces dependency on foreign platforms
✅ Supports regional innovation and infrastructure
✅ Advances regulatory and technological sovereignty
🛠 From small transparent models like OLMo2 to tools like Hugging Face Transformers or Sarvam-M for Indian languages, open source efforts are already powering sovereign AI ecosystems worldwide.
🔎 Read more about how open source AI is reshaping autonomy, innovation, and trust in the digital age:
👉 https://huggingface.co/blog/frimelle/sovereignty-and-open-source with @yjernite
Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 💥
as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub
now that we are certain the backend can scale even with big models like Llama 4/Qwen 3, we're moving to the next phase: inviting impactful orgs and users on the hub over. as you are a big part of the open source ML community, we would love to onboard you next and create some excitement about it in the community too!
in terms of actual steps - it should be as simple as having one of the org admins join hf.co/join/xet - we'll take care of the rest.