Instructions for using SmallDoge/Doge-20M-MoE with libraries, inference providers, notebooks, and local apps.
- Libraries
  - Transformers
How to use SmallDoge/Doge-20M-MoE with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SmallDoge/Doge-20M-MoE", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-MoE", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-MoE", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - vLLM
How to use SmallDoge/Doge-20M-MoE with vLLM:
Install from pip and serve the model; a Python client sketch follows the curl example below.
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SmallDoge/Doge-20M-MoE"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SmallDoge/Doge-20M-MoE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
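The same OpenAI-compatible endpoint can also be called from Python. The snippet below is a minimal sketch using the `openai` client package (an extra dependency, installed with `pip install openai`); it assumes the vLLM server started above is listening on localhost:8000. The same pattern works for the SGLang server described next by changing the base URL to port 30000.

```python
# Minimal sketch: call the OpenAI-compatible vLLM endpoint from Python.
# Assumes the server started above ("vllm serve SmallDoge/Doge-20M-MoE")
# is listening on localhost:8000 and that the `openai` package is installed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not require a real API key by default
)

response = client.chat.completions.create(
    model="SmallDoge/Doge-20M-MoE",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```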
  - SGLang
How to use SmallDoge/Doge-20M-MoE with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "SmallDoge/Doge-20M-MoE" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SmallDoge/Doge-20M-MoE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "SmallDoge/Doge-20M-MoE" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SmallDoge/Doge-20M-MoE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
  - Docker Model Runner
How to use SmallDoge/Doge-20M-MoE with Docker Model Runner:
```shell
docker model run hf.co/SmallDoge/Doge-20M-MoE
```
# Doge 20M MoE
Doge uses Dynamic Mask Attention for sequence transformation and can use either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts for state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, and the Cross Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model is trained by the SmallDoge community. A paper with the detailed algorithm and model architecture is coming soon; all training details and code are available in the small-doge repository.
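A quick way to see these architectural choices is to load the model configuration and inspect its hyperparameters. The sketch below only loads the config; the exact attribute names for the attention and expert settings are defined by the custom modeling code shipped with the checkpoint, so printing the whole object is the safest way to discover them.

```python
# Minimal sketch: inspect the Doge-20M-MoE configuration.
# trust_remote_code=True is required because the architecture is defined
# by custom modeling code hosted alongside the checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("SmallDoge/Doge-20M-MoE", trust_remote_code=True)

# Prints every hyperparameter (hidden size, number of layers, expert settings, ...).
print(config)
```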
## Uses
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-MoE")
>>> model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-MoE", trust_remote_code=True)
>>> inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
>>> out = model.generate(**inputs, max_new_tokens=100)
>>> print(tokenizer.batch_decode(out))
```
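For interactive use, generation can also be streamed token by token. The following is a small sketch using `transformers.TextStreamer` on top of the model and tokenizer loaded above; the prompt and generation length are arbitrary choices.

```python
# Minimal sketch: stream tokens to stdout as they are generated.
# Reuses the `tokenizer` and `model` loaded in the snippet above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
model.generate(**inputs, max_new_tokens=100, streamer=streamer)
```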
## Model Details
We build Doge by pre-training on Smollm-Corpus. If you want to continue pre-training this model, you can find the unconverged checkpoint here; a rough sketch of what continued pre-training could look like follows the table below. These models have not been fine-tuned for instruction following; the instruction-tuned model is here.
Pre-Training:
| Model | Training Data | Steps | Context Length | Tokens | LR | Batch Size | Precision | RTX 4090 GPU hours |
|---|---|---|---|---|---|---|---|---|
| Doge-20M | smollm-corpus | 8k | 2048 | 4B | 8e-3 | 0.5M | bfloat16 | 14 |
| Doge-20M-MoE | smollm-corpus | 8k | 2048 | 8B | 8e-3 | 0.5M | bfloat16 | 25 |
| Doge-60M | smollm-corpus | 16k | 2048 | 16B | 6e-3 | 1M | bfloat16 | 128 |
| Doge-120M-MoE | smollm-corpus | 16k | 2048 | 32B | 6e-3 | 1M | bfloat16 | 268 |
| Doge-160M | smollm-corpus | 24k | 2048 | 32B | 4e-3 | 1.5M | bfloat16 | 522 |
| Doge-320M | smollm-corpus | 32k | 2048 | 64B | 2e-3 | 2M | bfloat16 | 1856 |
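As referenced above, here is a rough sketch of what continued pre-training could look like with the Hugging Face `Trainer`. It is not the recipe from the small-doge repository: the checkpoint name is a placeholder for the unconverged checkpoint, the corpus is a stand-in, and only the 2048-token context length and 8e-3 learning rate are taken from the Doge-20M-MoE row of the table.

```python
# Rough sketch of continued pre-training; NOT the official recipe from the
# small-doge repository. Checkpoint, corpus, and batch settings below are
# illustrative placeholders; only max_length=2048 and learning_rate=8e-3
# follow the Doge-20M-MoE row of the table above.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "SmallDoge/Doge-20M-MoE"  # placeholder: swap in the unconverged checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus: in practice you would use Smollm-Corpus, as in the table above.
corpus = Dataset.from_dict({"text": ["Doge is a dynamic ultra-fast small language model."] * 1024})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="doge-20m-moe-continued",
    per_device_train_batch_size=8,   # illustrative; 0.5M-token batches need accumulation
    gradient_accumulation_steps=8,
    learning_rate=8e-3,              # from the Doge-20M-MoE row above
    num_train_epochs=1,
    bf16=True,                       # bfloat16, as in the table (needs supporting hardware)
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```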
Evaluation:
| Model | MMLU | TriviaQA | ARC | PIQA | HellaSwag | OBQA | Winogrande | tokens / s on i7-11 CPU |
|---|---|---|---|---|---|---|---|---|
| Doge-20M | 25.4 | 0.03 | 29.8 | 58.4 | 27.3 | 25.6 | 50.2 | 142 |
| Doge-20M-MoE | 26.5 | 0.2 | 30.9 | 59.0 | 28.9 | 28.4 | 51.2 | 132 |
| Doge-60M | 26.4 | 0.2 | 37.9 | 61.4 | 31.5 | 28.0 | 50.8 | 62 |
| Doge-120M-MoE | 28.2 | 0.4 | 40.2 | 63.2 | 36.3 | 31.6 | 51.3 | 58 |
| Doge-160M | 29.2 | 4.8 | 44.4 | 70.1 | 43.4 | 34.4 | 52.2 | 28 |
| Doge-320M | 35.6 | 9.4 | 55.4 | 73.9 | 52.7 | 37.9 | 59.3 | 16 |
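The tokens/s column can be sanity-checked on your own machine. Below is a rough sketch of one way to measure single-stream CPU decoding throughput; the prompt, generation length, and single-run timing are arbitrary choices, so absolute numbers will differ from the table depending on hardware and measurement method.

```python
# Rough sketch: measure single-stream decoding throughput on CPU.
# The prompt and generation length are arbitrary; results vary by machine
# and will not exactly reproduce the table above.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-MoE")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-MoE", trust_remote_code=True)
model.eval()

inputs = tokenizer("Hey how are you doing?", return_tensors="pt")

# Warm-up run so one-time setup cost does not skew the timing.
model.generate(**inputs, max_new_tokens=16)

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

generated = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{generated / elapsed:.1f} tokens/s")
```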
Procedure:
Environment:
- Image: nvcr.io/nvidia/pytorch:24.12-py3
- Hardware: 1x NVIDIA RTX 4090
- Software: Transformers
## Citation
```bibtex
@misc{smalldoges,
  title={SmallDoges: A Family of Dynamic UltraFast Small Language Models},
  author={Jingze, Shi and Yifan, Wu and Bingheng, Wu and Yuyu, Luo},
  year={2025},
  month={March},
  url={https://github.com/SmallDoges/small-doge}
}
```