Instructions to use ByteDance/Ouro-2.6B with libraries and local apps.

Libraries

How to use ByteDance/Ouro-2.6B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ByteDance/Ouro-2.6B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ByteDance/Ouro-2.6B", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use ByteDance/Ouro-2.6B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ByteDance/Ouro-2.6B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance/Ouro-2.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
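The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are made up for illustration, and the base URL assumes the server started above is listening on its default port 8000.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed reply."""
    request = urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Same body as the curl example above.
payload = build_chat_request("ByteDance/Ouro-2.6B", "What is the capital of France?")
print(json.dumps(payload, indent=2))

# With a server running locally (assumed address), the call would look like:
# reply = send_chat_request("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

For the SGLang server shown further down, the only change would be the port in the base URL (30000 instead of 8000).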

Use Docker

docker model run hf.co/ByteDance/Ouro-2.6B

SGLang

How to use ByteDance/Ouro-2.6B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ByteDance/Ouro-2.6B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance/Ouro-2.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ByteDance/Ouro-2.6B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance/Ouro-2.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ByteDance/Ouro-2.6B with Docker Model Runner:
```
docker model run hf.co/ByteDance/Ouro-2.6B
```

Lower evaluation results

by MianchuWang - opened Dec 9, 2025

Discussion

MianchuWang

Dec 9, 2025

Dear Authors,

Thank you for your contribution to this research direction. I'm currently trying to reproduce the GSM8K results reported for Ouro 1.4B R4 and Ouro 2.6B R4, but I'm encountering some difficulties.

I ran the following evaluation code:

import lm_eval
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ByteDance/Ouro-1.4B,trust_remote_code=True,dtype=float32",
    tasks=["gsm8k_cot"],
    num_fewshot=3,
    batch_size=1,
    limit=50,
    device="cuda:0",
)

With this setup, I obtain ~0.5 accuracy for Ouro 1.4B and ~0.6 for Ouro 2.6B. May I ask whether there is anything incorrect in my configuration, or whether I am missing any additional steps required to replicate the reported results?

Thank you for your time and guidance.

KristianS7

Mar 9

Hi @MianchuWang,

It may be too late for you, but for future reference: the main issue in your config is limit=50. Evaluating on only 50 samples introduces high variance. You need to remove the limit and run on the full dataset to get stable results.
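To put a rough number on that variance, here is a small sketch that treats each GSM8K question as an independent pass/fail trial and computes the binomial standard error of the accuracy. The accuracies 0.5 and 0.6 are the ones reported above; the full GSM8K test split has 1,319 problems. The independence assumption and the normal-approximation interval are simplifications, and `binomial_stderr` is just an illustrative helper name.

```python
import math


def binomial_stderr(accuracy: float, n_samples: int) -> float:
    """Standard error of an accuracy estimated from n independent pass/fail samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


GSM8K_TEST_SIZE = 1319  # number of problems in the GSM8K test split

for acc in (0.5, 0.6):
    for n in (50, GSM8K_TEST_SIZE):
        se = binomial_stderr(acc, n)
        # 1.96 * stderr is the half-width of an approximate 95% interval.
        print(f"accuracy={acc:.1f}  n={n:>4}  stderr={se:.3f}  95% half-width={1.96 * se:.3f}")
```

With 50 samples the standard error is about 0.07, so an approximate 95% interval spans roughly ±0.14 around the measured accuracy. On the full test split it shrinks to under ±0.03.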

Additionally, make sure no chat template is applied to the prompts (as you already did), and report exact match under the flexible-extract filter.

With the full dataset and raw text formatting, I can reproduce all paper results with both vLLM and HF backends using the standard lm_eval settings.

Versions:

vllm: 0.16.0
transformers: 4.57.6
lm-eval: 0.4.11
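For future readers, the advice above could be turned into a script along the following lines. This is a sketch, not the exact command used for the reproduction. It reuses the `model_args` from the original post, drops `limit`, and leaves `num_fewshot` unset so the task's own default applies, on the assumption that this is what "standard lm_eval settings" means. The helper names `extract_metric` and `run_full_gsm8k` are hypothetical, and the `"exact_match,flexible-extract"` key layout is how lm-eval 0.4.x names filtered metrics in its results dict; check it against your installed version.

```python
def extract_metric(results: dict, task: str, metric: str = "exact_match",
                   filter_name: str = "flexible-extract"):
    """Return (score, stderr) for one metric/filter pair from an lm-eval results dict.

    Assumes the lm-eval 0.4.x layout, where keys look like "<metric>,<filter>".
    """
    task_scores = results["results"][task]
    score = task_scores[f"{metric},{filter_name}"]
    stderr = task_scores.get(f"{metric}_stderr,{filter_name}")
    return score, stderr


def run_full_gsm8k(pretrained: str = "ByteDance/Ouro-1.4B") -> dict:
    """Run gsm8k_cot on the full test split with raw-text prompts (needs a GPU)."""
    import lm_eval  # imported lazily so extract_metric works without lm-eval installed

    return lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={pretrained},trust_remote_code=True,dtype=float32",
        tasks=["gsm8k_cot"],
        batch_size=1,
        device="cuda:0",
        apply_chat_template=False,  # raw text prompts, no chat template
        # no `limit`: evaluate every test example
        # no `num_fewshot`: fall back to the task's default (an assumption, see above)
    )


# Usage (downloads the model and runs the full evaluation):
# results = run_full_gsm8k()
# score, stderr = extract_metric(results, "gsm8k_cot")
# print(f"gsm8k_cot exact_match (flexible-extract): {score:.3f} +/- {stderr:.3f}")
```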
