xMAD.ai
DFloat11 — The Drop-in Successor to bf16 🏆
30 % smaller · 0 % accuracy loss · ~50 % cheaper to serve
Why DFloat11 instead of bf16?
| | bf16 (current standard) | DFloat11 (new standard) |
|---|---|---|
| Bits / weight | 16 | ≈ 11 (variable) |
| Accuracy | Exact | Exact (lossless) |
| VRAM needed | 100 % | ≈ 70 % |
| Workflow | Native in `transformers` | Same APIs: load, fine-tune, serve |
| Licence | — | Apache-2.0 reference impl |
What’s in it for OSS devs & researchers?
- **Run bigger models on the same card.** A 24 GB GPU jumps from 8 B → 12 B params without code tweaks.
- **Perfect reproducibility.** DF11 is mathematically lossless: identical logits across machines, great for shared baselines and papers.
- **No new tooling to learn.** Works out of the box with `transformers` + `gguf`; fully LoRA/QLoRA-compatible (see the sketch after this list).
- **Open & hackable.** Reference C++/CUDA codec under Apache-2.0 → https://github.com/LeanModels/DFloat11 (PRs welcome!).
- **Greener inference.** Less memory traffic means lower power draw, helpful on laptops, edge devices, and big clusters alike.
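To make the LoRA compatibility claim concrete, here is a minimal fine-tuning sketch using the `peft` library. The model ID and target module names are placeholders for illustration; adapt them to whichever checkpoint you actually load.

```python
# Minimal LoRA sketch via peft (placeholder model ID; any Llama-style
# checkpoint loaded through transformers works the same way).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train
```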
How does it work?
DFloat11 keeps the sign and mantissa bits unchanged and Huffman-compresses the mostly-empty exponent bits of bf16 weights. bf16 stores 1 sign, 8 exponent, and 7 mantissa bits per weight; the exponents of trained weights cluster in a narrow range, so their entropy is only about 3 bits, giving roughly 1 + 7 + 3 ≈ 11 bits instead of 16, with no information discarded.
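The sketch below illustrates that arithmetic empirically. It is not the DFloat11 codec itself, just a measurement of the exponent entropy of Gaussian-like weights, which is the per-symbol cost a Huffman code approaches.

```python
# Illustrative sketch, not the DFloat11 codec: measure the entropy of the
# bf16 exponent field for Gaussian-like weights to see why ~11 bits suffice.
import numpy as np

# Simulated weight tensor; real LLM weight matrices are similarly concentrated.
w = np.random.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# bf16 is the top 16 bits of float32: 1 sign | 8 exponent | 7 mantissa.
bits16 = (w.view(np.uint32) >> 16).astype(np.uint16)
exponent = (bits16 >> 7) & 0xFF  # the 8-bit exponent field

# Empirical exponent entropy; a Huffman code gets within 1 bit of this.
counts = np.bincount(exponent, minlength=256)
p = counts[counts > 0] / exponent.size
exp_entropy = float(-(p * np.log2(p)).sum())

# Sign (1 bit) and mantissa (7 bits) are stored verbatim.
print(f"exponent entropy : {exp_entropy:.2f} bits")
print(f"avg bits / weight: {1 + 7 + exp_entropy:.2f} (vs 16 for bf16)")
```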
Try the first DFloat11 model sets
🔗 Deep-dive blog: “Pied Piper is Here with DFloat11” – diagrams, math, real-world redaction case study.
Lossless • Same `transformers` workflow • Fine-tune via LoRA/FeatherTune • Open spec
Benchmarks & formal proofs: see the arXiv preprint.
The xMADified Family
From xMAD.ai
Welcome to the official Hugging Face organization for xMADified models from xMAD.ai!
The repositories below contain popular open-source models, xMADified from 16-bit floats to 4-bit integers with our NeurIPS 2024 methods and xMAD.ai proprietary technology.
These models are fine-tunable on the same reduced hardware (4× less GPU memory) in just three clicks.
Watch our product demo here
CLICK HERE TO JOIN BETA for:
- No-code deployment
- Proprietary Dataset Management
- On-Premise Fine-tuning
- Endpoint Scaling
- System Health Monitoring
- Seamless API Integration
and more!
The hardware requirements (GPU memory needed both to run and to fine-tune each model) are listed in the table below:
| Model | GPU Memory Requirement (Before → After) |
|---|---|
| Llama-3.1-405B-Instruct-xMADai-INT4 | 800 GB (16× H100) → 250 GB (8× V100) |
| Llama-3.1-Nemotron-70B-Instruct-xMADai-INT4 | 140 GB (4× L40S) → 40 GB (1× L40S) |
| Llama-3.1-8B-Instruct-xMADai-INT4 | 16 GB → 7 GB (any laptop GPU) |
| Llama-3.2-3B-Instruct-xMADai-INT4 | 6.5 GB → 3.5 GB (any laptop GPU) |
| Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 GB → 2 GB (any laptop GPU) |
| Mistral-Small-Instruct-2409-xMADai-INT4 | 44 GB → 12 GB (T4) |
| Mistral-Large-Instruct-2407-xMADai-INT4 | 250 GB → 65 GB (1× A100) |
| gemma-2-9b-it-xMADai-INT4 | 18.5 GB → 8 GB (any laptop GPU) |
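The xMADified checkpoints are published as standard Hugging Face repositories, so loading one looks like loading any other `transformers` model. The sketch below is a rough starting point only; check the individual model card for exact install steps and any 4-bit quantization backend the checkpoint requires.

```python
# Minimal loading sketch (see the model card for required extra packages;
# 4-bit checkpoints typically need a quantization backend installed too).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xmadai/Llama-3.2-3B-Instruct-xMADai-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is 4-bit quantization?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```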