xMAD.ai


DFloat11 — The Drop-in Successor to bf16 🏆

30 % smaller · 0 % accuracy loss · ~50 % cheaper to serve

Why DFloat11 instead of bf16?

|               | bf16 (current standard) | DFloat11 (new standard)           |
|---------------|-------------------------|-----------------------------------|
| Bits / weight | 16                      | ≈ 11 (variable)                   |
| Accuracy      | Exact                   | Exact (lossless)                  |
| VRAM needed   | 100%                    | ≈ 70%                             |
| Workflow      | Native in transformers  | Same APIs – load, fine-tune, serve |
| Licence       |                         | Apache-2.0 reference impl         |

What’s in it for OSS devs & researchers?

  • Run bigger models on the same card
    A 24 GB GPU jumps from 8 B → 12 B params without code tweaks.

  • Perfect reproducibility
    DF11 is mathematically lossless—identical logits across machines, great for shared baselines and papers.

  • No new tooling to learn
    Works out-of-the-box with transformers + gguf; fully LoRA/QLoRA-compatible (see the loading sketch just after this list).

  • Open & hackable
    Reference C++/CUDA codec under Apache-2.0 → https://github.com/LeanModels/DFloat11 (PRs welcome!).

  • Greener inference
    Less memory traffic = lower power draw—helpful on laptops, edge devices, and big clusters alike.
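
To ground the "no new tooling" point above, here is a minimal loading sketch. It assumes the `dfloat11` package from the GitHub repo linked above exposes a `DFloat11Model.from_pretrained` wrapper (check the repo README for the actual entry point), and the repo id is illustrative; the tokenizer side is unchanged transformers:

```python
# Minimal sketch, not verbatim from the repo: load a DF11 checkpoint and
# generate. `DFloat11Model` and the repo id are assumptions based on the
# project's described workflow; the tokenizer path is plain transformers.
import torch
from transformers import AutoTokenizer
from dfloat11 import DFloat11Model  # assumed entry point; pip install dfloat11

repo = "DFloat11/Llama-3.1-8B-Instruct-DF11"  # illustrative repo id

tokenizer = AutoTokenizer.from_pretrained(repo)
model = DFloat11Model.from_pretrained(repo, device_map="auto")

inputs = tokenizer("DFloat11 in one sentence:", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```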

How does it work?

DFloat11 keeps the sign and mantissa bits unchanged and Huffman-compresses the exponent bits of bf16 weights. In trained models the 8 exponent bits cluster in a narrow range, so their entropy is far below 8 bits and an entropy code stores them much more compactly. Average cost: ≈ 11 bits per weight instead of 16, with no information discarded.
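
A toy way to see this is to measure how little information the exponent field actually carries. The sketch below uses synthetic Gaussian weights (an assumption; the ≈ 11 bits figure above comes from real checkpoints, whose exponents are typically even more concentrated) and estimates the per-weight cost after entropy-coding the exponents:

```python
import numpy as np

# Toy demo: the Shannon entropy of the bf16 exponent distribution is roughly
# what a Huffman code pays per exponent. Synthetic Gaussian weights stand in
# for a real checkpoint, so the exact number will differ from the paper's.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# bf16 is the top 16 bits of float32: 1 sign bit, 8 exponent, 7 mantissa.
bits16 = w.view(np.uint32) >> 16
exponent = ((bits16 >> 7) & 0xFF).astype(np.int64)

counts = np.bincount(exponent, minlength=256)
p = counts[counts > 0] / counts.sum()
entropy = -(p * np.log2(p)).sum()

# Sign + mantissa stay at 8 bits; the exponent shrinks from 8 to ~entropy.
print(f"exponent entropy: {entropy:.2f} bits -> ~{8 + entropy:.1f} bits/weight")
```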

Try the first DFloat11 model sets

🔗 Deep-dive blog: “Pied Piper is Here with DFloat11” – diagrams, math, real-world redaction case study.

Lossless • Same transformers workflow • Fine-tune via LoRA/FeatherTune • Open spec

Benchmarks & formal proofs: see the arXiv preprint.


The xMADified Family

From xMAD.ai

Welcome to the official Hugging Face organization for xMADified models from xMAD.ai!

The repositories below contain popular open-source models, xMADified from 16-bit floats to 4-bit integers with our NeurIPS 2024 methods and xMAD.ai proprietary technology.

These models are fine-tunable on the same reduced hardware (4× less) in just three clicks.

Watch our product demo here

CLICK HERE TO JOIN BETA for:

  • No-code deployment
  • Proprietary Dataset Management
  • On-Premise Fine-tuning
  • Endpoint Scaling
  • System Health Monitoring
  • Seamless API Integration

and more!

The memory and hardware requirements (GPU memory needed to run and fine-tune each model) are listed in the table below:

| Model | GPU Memory Requirement (Before → After) |
|-------|------------------------------------------|
| Llama-3.1-405B-Instruct-xMADai-INT4 | 800 GB (16 H100s) → 250 GB (8 V100s) |
| Llama-3.1-Nemotron-70B-Instruct-xMADai-INT4 | 140 GB (4 L40S) → 40 GB (1 L40S) |
| Llama-3.1-8B-Instruct-xMADai-INT4 | 16 GB → 7 GB (any laptop GPU) |
| Llama-3.2-3B-Instruct-xMADai-INT4 | 6.5 GB → 3.5 GB (any laptop GPU) |
| Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 GB → 2 GB (any laptop GPU) |
| Mistral-Small-Instruct-2409-xMADai-INT4 | 44 GB → 12 GB (T4) |
| Mistral-Large-Instruct-2407-xMADai-INT4 | 250 GB → 65 GB (1 A100) |
| gemma-2-9b-it-xMADai-INT4 | 18.5 GB → 8 GB (any laptop GPU) |
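
For reference, a minimal loading sketch for one of the models above. The `xmadai/` namespace in the repo id is an assumption (only the model names appear in the table), and it assumes the INT4 checkpoints load through the standard transformers path; check each model card for the exact instructions and any required quantization backend:

```python
# Sketch: load an xMADified 4-bit checkpoint with plain transformers.
# The "xmadai/" org namespace is an assumption; the model name comes from the
# table above. The model card may list an additional required backend
# (e.g. a GPTQ-style kernel package).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "xmadai/Llama-3.1-8B-Instruct-xMADai-INT4"  # assumed namespace

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Explain 4-bit quantization briefly:", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```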

xMAD.ai LinkedIn
