xMAD.ai
DFloat11 — The Drop-in Successor to bf16 🏆
30 % smaller · 0 % accuracy loss · ~50 % cheaper to serve
Why DFloat11 instead of bf16?
| | bf16 (current standard) | DFloat11 (new standard) |
|---|---|---|
| Bits / weight | 16 | ≈ 11 (variable) |
| Accuracy | Exact | Exact (lossless) |
| VRAM needed | 100 % | ≈ 70 % |
| Workflow | Native in `transformers` | Same APIs: load, fine-tune, serve |
| Licence | — | Apache-2.0 reference impl |
What’s in it for OSS devs & researchers?
- **Run bigger models on the same card.** A 24 GB GPU jumps from 8 B → 12 B params without code tweaks.
- **Perfect reproducibility.** DF11 is mathematically lossless: identical logits across machines, great for shared baselines and papers.
- **No new tooling to learn.** Works out of the box with `transformers` + `gguf`; fully LoRA/QLoRA-compatible (see the sketch after this list).
- **Open & hackable.** Reference C++/CUDA codec under Apache-2.0 → https://github.com/LeanModels/DFloat11 (PRs welcome!).
- **Greener inference.** Less memory traffic means lower power draw, helpful on laptops, edge devices, and big clusters alike.
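To make the LoRA compatibility claim concrete, here is a minimal fine-tuning sketch using the `peft` library. The model ID and target module names are placeholders for illustration; adapt them to whichever checkpoint you actually load.

```python
# Minimal LoRA sketch via peft (placeholder model ID; any Llama-style
# checkpoint loaded through transformers works the same way).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train
```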
How does it work?
DFloat11 keeps the sign and mantissa bits unchanged and Huffman-compresses the mostly-empty exponent bits of bf16 weights. bf16 stores 1 sign, 8 exponent, and 7 mantissa bits per weight; the exponents of trained weights cluster in a narrow range, so their entropy is only about 3 bits, giving roughly 1 + 7 + 3 ≈ 11 bits instead of 16, with no information discarded.
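The sketch below illustrates that arithmetic empirically. It is not the DFloat11 codec itself, just a measurement of the exponent entropy of Gaussian-like weights, which is the per-symbol cost a Huffman code approaches.

```python
# Illustrative sketch, not the DFloat11 codec: measure the entropy of the
# bf16 exponent field for Gaussian-like weights to see why ~11 bits suffice.
import numpy as np

# Simulated weight tensor; real LLM weight matrices are similarly concentrated.
w = np.random.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# bf16 is the top 16 bits of float32: 1 sign | 8 exponent | 7 mantissa.
bits16 = (w.view(np.uint32) >> 16).astype(np.uint16)
exponent = (bits16 >> 7) & 0xFF  # the 8-bit exponent field

# Empirical exponent entropy; a Huffman code gets within 1 bit of this.
counts = np.bincount(exponent, minlength=256)
p = counts[counts > 0] / exponent.size
exp_entropy = float(-(p * np.log2(p)).sum())

# Sign (1 bit) and mantissa (7 bits) are stored verbatim.
print(f"exponent entropy : {exp_entropy:.2f} bits")
print(f"avg bits / weight: {1 + 7 + exp_entropy:.2f} (vs 16 for bf16)")
```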
Try the first DFloat11 model sets
🔗 Deep-dive blog: “Pied Piper is Here with DFloat11” – diagrams, math, real-world redaction case study.
Lossless • Same `transformers` workflow • Fine-tune via LoRA/FeatherTune • Open spec
Benchmarks & formal proofs: see the arXiv preprint.
The xMADified Family
From xMAD.ai
Welcome to the official Hugging Face organization for xMADified models from xMAD.ai!
The repositories below contain popular open-source models, xMADified from 16-bit floats to 4-bit integers with our NeurIPS 2024 methods and xMAD.ai proprietary technology.
These models are fine-tunable on the same reduced hardware (4× less GPU memory) in just three clicks.
Watch our product demo here
CLICK HERE TO JOIN BETA for:
- No-code deployment
- Proprietary Dataset Management
- On-Premise Fine-tuning
- Endpoint Scaling
- System Health Monitoring
- Seamless API Integration
and more!
The hardware requirements (GPU memory needed both to run and to fine-tune each model) are listed in the table below:
| Model | GPU Memory Requirement (Before → After) |
|---|---|
| Llama-3.1-405B-Instruct-xMADai-INT4 | 800 GB (16× H100) → 250 GB (8× V100) |
| Llama-3.1-Nemotron-70B-Instruct-xMADai-INT4 | 140 GB (4× L40S) → 40 GB (1× L40S) |
| Llama-3.1-8B-Instruct-xMADai-INT4 | 16 GB → 7 GB (any laptop GPU) |
| Llama-3.2-3B-Instruct-xMADai-INT4 | 6.5 GB → 3.5 GB (any laptop GPU) |
| Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 GB → 2 GB (any laptop GPU) |
| Mistral-Small-Instruct-2409-xMADai-INT4 | 44 GB → 12 GB (T4) |
| Mistral-Large-Instruct-2407-xMADai-INT4 | 250 GB → 65 GB (1× A100) |
| gemma-2-9b-it-xMADai-INT4 | 18.5 GB → 8 GB (any laptop GPU) |
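The xMADified checkpoints are published as standard Hugging Face repositories, so loading one looks like loading any other `transformers` model. The sketch below is a rough starting point only; check the individual model card for exact install steps and any 4-bit quantization backend the checkpoint requires.

```python
# Minimal loading sketch (see the model card for required extra packages;
# 4-bit checkpoints typically need a quantization backend installed too).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xmadai/Llama-3.2-3B-Instruct-xMADai-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is 4-bit quantization?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```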