Introduction

SDAR (Synergy of Diffusion and AutoRegression) is a new large language model that integrates autoregressive (AR) and discrete diffusion modeling strategies. It combines the efficient training paradigm of AR models with the highly parallel inference capability of diffusion models, while delivering performance on par with SOTA open-source AR models. SDAR also sets a new benchmark as the most powerful diffusion language model to date.

Performance of SDAR-1.7B-Chat on various benchmarks

Evaluation settings:

  • MMLU: 5-shot
  • Math500: 0-shot
  • GSM8K: 0-shot
  • HumanEval: 0-shot
  • Sanitized_MBPP: 0-shot
  • IFEval: 0-shot
  • MathBench: 0-shot
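
| Model | MMLU | Math500 | GSM8K | HumanEval | Sanitized_MBPP | IFEval | MathBench |
|---|---|---|---|---|---|---|---|
| SDAR-1.7B-Chat | 62.9 | 63.2 | 80.06 | 61.59 | 61.09 | 43.44 | 63.55 |
| SDAR-4B-Chat | | | | | | | |
| SDAR-8B-Chat | | | | | | | |
| SDAR-30B-A3B-Chat | | | | | | | |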

Note: The 4B, 8B, and 30B models are coming soon. Performance results for these models will be released in the near future.
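
For readers who want to work with these numbers programmatically, here is a minimal sketch that records the evaluation settings and the SDAR-1.7B-Chat scores listed above. The names `EVAL_SHOTS`, `SDAR_1_7B_CHAT_SCORES`, `macro_average`, and `describe` are illustrative assumptions, not part of any SDAR release, and the unweighted mean is only a convenience summary, not a metric reported by the authors.

```python
# Illustrative only: these names are hypothetical and not part of any SDAR release.

# Number of in-context examples (shots) used for each benchmark.
EVAL_SHOTS = {
    "MMLU": 5,
    "Math500": 0,
    "GSM8K": 0,
    "HumanEval": 0,
    "Sanitized_MBPP": 0,
    "IFEval": 0,
    "MathBench": 0,
}

# Scores reported for SDAR-1.7B-Chat in the table above.
SDAR_1_7B_CHAT_SCORES = {
    "MMLU": 62.9,
    "Math500": 63.2,
    "GSM8K": 80.06,
    "HumanEval": 61.59,
    "Sanitized_MBPP": 61.09,
    "IFEval": 43.44,
    "MathBench": 63.55,
}


def macro_average(scores):
    """Unweighted mean across benchmarks (a convenience summary, not a reported metric)."""
    return sum(scores.values()) / len(scores)


def describe(benchmark):
    """Format one benchmark as 'name (k-shot): score'."""
    shots = EVAL_SHOTS[benchmark]
    return f"{benchmark} ({shots}-shot): {SDAR_1_7B_CHAT_SCORES[benchmark]}"


if __name__ == "__main__":
    for name in EVAL_SHOTS:
        print(describe(name))
    print(f"Unweighted mean: {macro_average(SDAR_1_7B_CHAT_SCORES):.2f}")
```

Keeping the shot counts next to the scores makes it harder to compare results across models that were evaluated under different settings by accident.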

Inference

The inference code will be released soon.

Highlights

  • Performance: SDAR-1.7B-Chat achieves state-of-the-art results among diffusion language models, on par with SOTA open-source AR models.
  • Efficiency: SDAR provides over 2× faster inference than same-size AR models while maintaining comparable performance.

Model size: 2.03B params (Safetensors), tensor type: BF16