🎉 New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2
We're excited to welcome Parakeet TDT 0.6B V2, a state-of-the-art English speech-to-text model, to the Azure AI Foundry model catalog.
What is it?
A powerful ASR model built on the FastConformer-TDT architecture, offering:
🕒 Word-level timestamps
✍️ Automatic punctuation & capitalization
🔊 Strong performance across noisy and real-world audio
It runs on NeMo, NVIDIA's open-source toolkit for building and serving speech models.
Want to give it a try? 🎧 You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying. If it fits your needs, deploy it easily from the Hugging Face Hub or Azure ML Studio on secure, scalable infrastructure!
📘 Learn more by following this guide written by @alvarobartt
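Once the model is deployed as an online endpoint, invoking it is a plain HTTPS call. Here's a minimal sketch, assuming an endpoint that accepts base64-encoded audio in an `inputs` field; the URL, key, and payload schema are placeholders, so check the guide for the exact contract your deployment exposes:

```python
import base64
import json
import urllib.request

def build_request(audio_bytes: bytes) -> bytes:
    """Wrap raw audio as a base64 JSON payload (assumed schema)."""
    return json.dumps(
        {"inputs": base64.b64encode(audio_bytes).decode("ascii")}
    ).encode("utf-8")

def transcribe(endpoint_url: str, api_key: str, audio_path: str) -> dict:
    """POST the audio file to the (hypothetical) scoring endpoint."""
    with open(audio_path, "rb") as f:
        body = build_request(f.read())
    req = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage with placeholder values -- substitute your real endpoint and key:
# result = transcribe("https://<endpoint>.inference.ml.azure.com/score",
#                     "<api-key>", "sample.wav")
```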
In case you missed it, Hugging Face expanded its collaboration with Azure a few weeks ago with a curated catalog of 10,000 models, accessible from Azure AI Foundry and Azure ML!
@alvarobartt has been cooking these last few days to prepare the one and only documentation you need to deploy Hugging Face models on Azure. It comes with an FAQ, great guides, and examples on how to deploy VLMs, LLMs, smolagents, and more to come very soon.
We need your feedback: come help us and let us know what else you want to see, which models we should add to the collection, which model tasks we should prioritize, and what else we should build a tutorial for. You're just an issue away on our GitHub repo!
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing guest speakers, and more: for full details, head to https://lu.ma/fmvdjmur!
Build your first chatbot with a Hugging Face Spaces frontend and a Gaudi-powered backend with @bconsolvo! He will teach you how to build an LLM-powered chatbot using Streamlit and Hugging Face Spaces, integrating a model endpoint hosted on an Intel® Gaudi® accelerator.
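The glue between a Streamlit frontend and a hosted model endpoint is small: keep the chat history, flatten it into a prompt, and POST it. Here's a minimal sketch of that backend call, assuming a TGI-style `/generate` route; the URL and prompt template are illustrative, and the workshop covers the real setup:

```python
import json
import urllib.request

def format_prompt(history: list) -> str:
    """Flatten (role, content) chat turns into a plain prompt string."""
    lines = [f"{role.capitalize()}: {content}" for role, content in history]
    lines.append("Assistant:")  # cue the model to answer next
    return "\n".join(lines)

def ask(endpoint_url: str, history: list, max_new_tokens: int = 256) -> str:
    """Send the conversation to a TGI-style /generate endpoint."""
    payload = {
        "inputs": format_prompt(history),
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    req = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]

# In the Streamlit app you would append each user message to the history,
# call ask(), and render the reply, e.g.:
# reply = ask("https://<your-endpoint>/generate", [("user", "Hello!")])
```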
If you are using AWS, give it a read. It is a living document showcasing how to deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.
We're working hard to enable every scenario, whether you want to deploy to Inference Endpoints, SageMaker, or EC2, with GPUs or with Trainium & Inferentia.
We have full support for the distilled models; DeepSeek-R1 support is coming soon! I'll keep you posted.
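For the SageMaker path with the TGI container, most of the work is picking the right environment variables for the model size. A minimal sketch of that configuration, where the distilled model ID and instance sizing are illustrative and the env vars follow the Hugging Face TGI DLC convention:

```python
def tgi_env(model_id: str, num_gpus: int,
            max_input: int = 4096, max_total: int = 8192) -> dict:
    """Environment for the Hugging Face TGI container on SageMaker."""
    return {
        "HF_MODEL_ID": model_id,        # model pulled from the Hub at startup
        "SM_NUM_GPUS": str(num_gpus),   # tensor-parallel degree
        "MAX_INPUT_LENGTH": str(max_input),
        "MAX_TOTAL_TOKENS": str(max_total),
    }

# Deploying then looks roughly like this (sketch; requires the sagemaker SDK):
# from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
# model = HuggingFaceModel(
#     role=role,
#     image_uri=get_huggingface_llm_image_uri("huggingface"),
#     env=tgi_env("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", num_gpus=1),
# )
# model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
```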
It’s the 2nd of December, and here’s your Cyber Monday present 🎁!
We’re cutting prices on Hugging Face Inference Endpoints and Spaces!
Our friends at Google Cloud are treating us to a 40% price cut on GCP NVIDIA A100 GPUs for the next 3️⃣ months, with further reductions of 20 to 50% across all instances.
If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs, in less than 10 minutes! Thank you @wietse-venema-demo!
I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud to transform the way you build AI with open models on the platform!
With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:
⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!
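Once a TGI or TEI DLC is serving on GKE or Vertex AI, any HTTP client can hit it. A minimal sketch against TGI's `/generate` and TEI's `/embed` routes, where the service host and port are placeholders for your deployment:

```python
import json
import urllib.request

def tgi_payload(prompt: str, max_new_tokens: int = 64) -> dict:
    """Request body for TGI's /generate route."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def tei_payload(texts: list) -> dict:
    """Request body for TEI's /embed route."""
    return {"inputs": texts}

def post_json(url: str, body: dict):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Against a deployed container (placeholder host):
# post_json("http://<service-ip>:8080/generate", tgi_payload("Hello"))
# post_json("http://<service-ip>:8080/embed", tei_payload(["Hello"]))
```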