AI & ML interests

Accelerating DL

Recent Activity

badaoui updated a Space about 2 hours ago
optimum/neuron-export
badaoui updated a Space about 2 months ago
optimum/neuron-export

pagezyhf posted an update 5 days ago
We've improved the Deploy button on Hugging Face model pages for Microsoft Azure:

1/ no more long waits before seeing model support status

2/ ready-to-use CLI and Python snippets

3/ redirection to Azure AI Foundry rather than Azure ML

✋ If you see any bugs or have feedback, open an issue on our repo:
https://github.com/huggingface/Microsoft-Azure
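For a sense of what such a Python snippet looks like, here is a minimal sketch using the azure-ai-inference client; the endpoint URL, key, and deployment name are placeholders, not the actual snippet generated by the button.

```python
# Hypothetical sketch of calling a model deployed to Azure AI Foundry.
# The endpoint URL, key, and deployment name below are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder
)

response = client.complete(
    model="<your-deployment-name>",  # placeholder
    messages=[{"role": "user", "content": "Hello from Hugging Face on Azure!"}],
)
print(response.choices[0].message.content)
```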
badaoui posted an update 13 days ago
Is there a "one-size-fits-all" recipe for quantizing Large Language Models? 🤔

As part of my ongoing work in mixed-precision quantization, I've been exploring this question by measuring layer-by-layer sensitivity. The goal is to see if we can find universal rules for which layers can be quantized aggressively without impacting performance. The results are fascinating and reveal two key insights:

1๏ธโƒฃ Sensitivity profiles are like architectural "fingerprints." Models from the same family share strikingly similar sensitivity patterns. As you can see in the charts below for the Gemma and SmolLM families, the ranking and relative sensitivity of the layers remain remarkably consistent. This suggests that the underlying architecture is a primary driver of a model's quantization behavior.

2๏ธโƒฃ A "universal" mixed-precision quantization strategy is challenging. While models within a family are similar, these "fingerprints" change dramatically when comparing different architectures like LLaMA, Qwen, and StableLM. This highlights the difficulty in creating a generalized mixed-precision configuration that works optimally across all model families.

However, there is one near-universal truth we uncovered: the mlp.down_proj layer consistently emerges as one of the most sensitive components across all models studied.
This finding strongly resonates with the work in "The Super Weight in Large Language Models" (by Mengxia Yu et al.). The paper identifies that functionally critical parameters, or "super weights," are concentrated in these down_proj layers. Our empirical results provide clear validation for this theory, showing these layers are highly intolerant to precision loss.

In short, every architecture has a unique sensitivity profile, a fingerprint shaped not only by its core design but also by its specific training dataset and optimization approach, yet some components remain universally critical!
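If you want to probe this yourself, here's a minimal sketch of the layer-by-layer sensitivity idea, assuming a PyTorch model and an evaluate_perplexity() helper you supply; it illustrates the general technique, not the exact methodology behind these results.

```python
# Sketch: fake-quantize one nn.Linear at a time and measure the perplexity
# delta against the full-precision baseline. evaluate_perplexity() is a
# placeholder for your own calibration-set evaluation.
import torch
import torch.nn as nn

def fake_quantize_int4(weight: torch.Tensor) -> torch.Tensor:
    # Symmetric per-row round-trip through 4-bit integer levels.
    scale = weight.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7
    return (weight / scale).round().clamp(-8, 7) * scale

@torch.no_grad()
def layer_sensitivity(model: nn.Module, evaluate_perplexity) -> dict:
    baseline = evaluate_perplexity(model)
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            original = module.weight.data.clone()
            module.weight.data = fake_quantize_int4(original)
            scores[name] = evaluate_perplexity(model) - baseline
            module.weight.data = original  # restore full precision
    return scores  # larger delta = more sensitive (e.g. mlp.down_proj)
```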
What are your thoughts?
pagezyhf posted an update 20 days ago
Deploy GPT OSS models with Hugging Face on Azure AI!

We're thrilled to enable OpenAI GPT OSS models in the Azure AI Model Catalog so Azure users can try them securely on the day of their release.

In our official launch blogpost, there's a section on how to deploy the model to your Azure AI Hub. Get started today!

https://huggingface.co/blog/welcome-openai-gpt-oss#azure
pagezyhf posted an update 20 days ago
We now have the newest OpenAI models available on the Dell Enterprise Hub!

We built the Dell Enterprise Hub to provide access to the latest and greatest models from the Hugging Face community to our on-prem customers. We're happy to give secure access to this amazing contribution from OpenAI on the day of its launch!

https://dell.huggingface.co/
IlyasMoutawwakil posted an update 26 days ago
🚀 Optimum: The Last v1 Release 🚀
Optimum v1.27 marks the final major release in the v1 series. As we close this chapter, we're laying the groundwork for a more modular and community-driven future:
- Optimum v2: A lightweight core package for porting Transformers, Diffusers, or Sentence-Transformers to specialized AI hardware/software/accelerators.
- Optimum-ONNX: A dedicated package where the ONNX/ONNX Runtime ecosystem lives and evolves, faster-moving and decoupled from the Optimum core.

🎯 Why this matters:
- A clearer governance path for ONNX, fostering stronger community collaboration and improved developer experience.
- Faster innovation in a more modular, open-source environment.

💡 What this means:
- More transparency, broader participation, and faster development driven by the community and key actors in the ONNX ecosystem (PyTorch, Microsoft, Joshua Lochner 👀, ...)
- A cleaner, more maintainable core Optimum, focused on extending HF libraries to specialized AI hardware/software/accelerator tooling and used by our partners (Intel Corporation, Amazon Web Services (AWS), AMD, NVIDIA, FuriosaAI, ...)

๐Ÿ› ๏ธ Major updates I worked on in this release:
โœ… Added support for Transformers v4.53 and SmolLM3 in ONNX/ONNXRuntime.
โœ… Solved batched inference/generation for all supported decoder model architectures (LLMs).
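As a quick illustration of that batched generation path, here's a minimal sketch using Optimum's ONNX Runtime integration; the model ID is just one example of a supported decoder architecture.

```python
# Sketch: export a decoder model to ONNX and run batched generation
# with ONNX Runtime via Optimum. The model ID is an example, not prescriptive.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

# Batched inputs need padding; reuse EOS if no pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(
    ["ONNX Runtime is", "Quantization works by"],
    return_tensors="pt",
    padding=True,
)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```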

✨ Big shoutout to @echarlaix for leading the refactoring work that cleanly separated ONNX exporter logic and enabled the creation of Optimum-ONNX.

๐Ÿ“ Release Notes: https://lnkd.in/gXtE_qji
๐Ÿ“ฆ Optimum : https://lnkd.in/ecAezNT6
๐ŸŽ Optimum-ONNX: https://lnkd.in/gzjyAjSi
#Optimum #ONNX #OpenSource #HuggingFace #Transformers #Diffusers
pagezyhf posted an update about 1 month ago
🟪 Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 is now available in Microsoft Azure for one-click deployment! 🚀

Check out their blogpost: https://qwenlm.github.io/blog/qwen3/

You can now find it in the Hugging Face Collection in Azure ML or Azure AI Foundry, along with 10k other Hugging Face models 🤗🤗
Qwen/Qwen3-235B-A22B-Instruct-2507-FP8

Bear with us for the non-quantized version.
pagezyhf posted an update about 1 month ago
🎉 New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2

We're excited to welcome Parakeet TDT 0.6B V2, a state-of-the-art English speech-to-text model, to the Azure Foundry Model Catalog.

What is it?

A powerful ASR model built on the FastConformer-TDT architecture, offering:
🕒 Word-level timestamps
✍️ Automatic punctuation & capitalization
🔊 Strong performance across noisy and real-world audio

It runs with NeMo, NVIDIA's optimized toolkit for speech AI.
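If you'd rather test it locally first, a minimal sketch with NeMo looks like this, assuming nemo_toolkit[asr] is installed; "sample.wav" is a placeholder for your own audio file.

```python
# Sketch: load Parakeet TDT 0.6B V2 from the Hugging Face Hub with NeMo
# and transcribe a local audio file. "sample.wav" is a placeholder path.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)

# transcribe() takes a list of audio file paths (16 kHz mono works best).
hypotheses = asr_model.transcribe(["sample.wav"])
print(hypotheses[0].text)
```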

Want to give it a try? 🎧 You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying. If it fits your needs, deploy easily from the Hugging Face Hub or Azure ML Studio with secure, scalable infrastructure!

📘 Learn more by following this guide written by @alvarobartt

https://huggingface.co/docs/microsoft-azure/azure-ai/examples/deploy-nvidia-parakeet-asr
pagezyhf posted an update about 2 months ago
If you want to dive into how the HF team worked with @seungrokj at @AMD to optimize kernels on MI300, give our latest blog a read!

It's great educational material for anyone curious about the world of low-level ML optimization.

https://huggingface.co/blog/mi300kernels
pagezyhf posted an update about 2 months ago
In case you missed it, Hugging Face expanded its collaboration with Azure a few weeks ago with a curated catalog of 10,000 models, accessible from Azure AI Foundry and Azure ML!

@alvarobartt has been cooking these last few days to prepare the one and only documentation you need to deploy Hugging Face models on Azure. It comes with an FAQ, great guides, and examples on how to deploy VLMs, LLMs, smolagents, and more to come very soon.

We need your feedback: come help us and let us know what else you want to see, which models we should add to the collection, which model tasks we should prioritize, and what else we should build tutorials for. You're just an issue away on our GitHub repo!

https://huggingface.co/docs/microsoft-azure/index
jeffboudier posted an update 2 months ago
AMD summer hackathons are here!
A chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris - Station F - July 5-6
🇮🇳 Mumbai - July 12-13
🇮🇳 Bengaluru - July 19-20

Hugging Face and GPU Mode will be on site, and on July 6 in Paris, @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.

Register to Paris event: https://lu.ma/fmvdjmur?tk=KeAbiP
All dates: https://lu.ma/calendar/cal-3sxhD5FdxWsMDIz
pagezyhf posted an update 2 months ago
Hackathons in Paris on July 5th and 6th!

Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.

Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.

Prizes, amazing speakers, and more... for all the details, head to https://lu.ma/fmvdjmur!
pagezyhf posted an update 2 months ago
Webinar Alert

Build your first chatbot with a Hugging Face Spaces frontend and a Gaudi-powered backend with @bconsolvo! He will teach you how to build an LLM-powered chatbot using Streamlit and Hugging Face Spaces, integrating a model endpoint hosted on an Intel® Gaudi® accelerator.

Beginners are welcome!

https://web.cvent.com/event/70e11f23-7c52-4994-a918-96fa9d5e935f/summary
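As a taste of the pattern, here's a hypothetical minimal sketch of a Streamlit app calling a hosted text-generation endpoint; the endpoint URL is a placeholder, not the webinar's actual code.

```python
# Hypothetical sketch: Streamlit chat frontend calling a hosted
# text-generation endpoint. ENDPOINT_URL is a placeholder.
import streamlit as st
from huggingface_hub import InferenceClient

ENDPOINT_URL = "https://your-gaudi-endpoint.example.com"  # placeholder
client = InferenceClient(model=ENDPOINT_URL)

st.title("My first chatbot")
prompt = st.text_input("Ask me anything")
if prompt:
    reply = client.text_generation(prompt, max_new_tokens=200)
    st.write(reply)
```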

jeffboudier posted an update 3 months ago
Today we launched Training Cluster as a Service to make the new DGX Cloud Lepton supercloud easily accessible to AI researchers.

Hugging Face will collaborate with NVIDIA to provision and set up GPU training clusters to make them available for the duration of training runs.

Hugging Face organizations can sign up here: https://huggingface.co/training-cluster
jeffboudier posted an update 3 months ago
Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!

Blog post has all the details: https://huggingface.co/blog/dell-ai-applications
regisss posted an update 3 months ago
jeffboudier posted an update 3 months ago
Transcribing 1 hour of audio for less than $0.01 🤯

@mfuntowicz cooked up 8x faster Whisper speech recognition: whisper-large-v3-turbo transcribes at 100x real time on a $0.80/hr L4 GPU!

How they did it: https://huggingface.co/blog/fast-whisper-endpoints

1-click deploy with HF Inference Endpoints: https://endpoints.huggingface.co/new?repository=openai%2Fwhisper-large-v3-turbo&vendor=aws&region=us-east&accelerator=gpu&instance_id=aws-us-east-1-nvidia-l4-x1&task=automatic-speech-recognition&no_suggested_compute=true
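Once deployed, querying the endpoint is a one-liner; here's a minimal sketch with huggingface_hub, where the endpoint URL and audio path are placeholders.

```python
# Sketch: transcribe audio against a deployed Inference Endpoint.
# The endpoint URL and audio file path are placeholders.
from huggingface_hub import InferenceClient

client = InferenceClient(model="https://your-endpoint.endpoints.huggingface.cloud")
result = client.automatic_speech_recognition("meeting.wav")
print(result.text)
```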