|
--- |
|
language: |
|
- en |
|
- fr |
|
- de |
|
- es |
|
- pt |
|
- it |
|
- ja |
|
- ko |
|
- ru |
|
- zh |
|
- ar |
|
- fa |
|
- id |
|
- ms |
|
- ne |
|
- pl |
|
- ro |
|
- sr |
|
- sv |
|
- tr |
|
- uk |
|
- vi |
|
- hi |
|
- bn |
|
license: apache-2.0 |
|
library_name: vllm |
|
inference: false |
|
base_model: |
|
- mistralai/Devstral-Small-2505
|
extra_gated_description: >- |
|
If you want to learn more about how we process your personal data, please read |
|
our <a href="https://mistral.ai/terms/">Privacy Policy</a>. |
|
pipeline_tag: text-generation
|
--- |
|
|
|
# Devstral-Small-2505 |
|
|
|
Devstral is an agentic LLM for software engineering tasks built under a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-Bench, which positions it as the #1 open-source model on this [benchmark](#benchmark-results).
|
|
|
It is finetuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503), so it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning.
|
|
|
For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community. |
|
|
|
Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral). |
|
|
|
|
|
## Key Features: |
|
- **Agentic coding**: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents. |
|
- **Lightweight**: With its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
|
- **Apache 2.0 License**: Open license allowing usage and modification for both commercial and non-commercial purposes. |
|
- **Context Window**: A 128k context window. |
|
- **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size. |
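
To get a feel for how much of that context window a prompt uses, you can count tokens with the same `mistral-common` tokenizer used later in this card; a minimal sketch, assuming `mistral-common` and `huggingface_hub` are installed:

```python
# Minimal sketch: count prompt tokens against the 128k context window.
# Assumes `mistral-common` and `huggingface_hub` are installed and you
# have access to the model repo.
from huggingface_hub import hf_hub_download
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tekken_file = hf_hub_download(repo_id="mistralai/Devstral-Small-2505", filename="tekken.json")
tokenizer = MistralTokenizer.from_file(tekken_file)

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="Summarize this repository.")])
)
print(f"Prompt uses {len(tokenized.tokens)} tokens of the ~128k-token context window.")
```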
|
|
|
|
|
|
|
## Benchmark Results |
|
|
|
### SWE-Bench |
|
|
|
Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming prior open-source SoTA by 6%. |
|
|
|
| Model | Scaffold | SWE-Bench Verified (%) | |
|
|------------------|--------------------|------------------------| |
|
| Devstral | OpenHands Scaffold | **46.8** | |
|
| GPT-4.1-mini | OpenAI Scaffold | 23.6 | |
|
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6 | |
|
| SWE-smith-LM 32B | SWE-agent Scaffold | 40.2 | |
|
|
|
|
|
When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 235B-A22B.
|
|
|
 |
|
|
|
## Usage |
|
|
|
We recommend using Devstral with the [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) scaffold.
|
You can use it either through our API or by running it locally.
|
|
|
### API |
|
Follow these [instructions](https://docs.mistral.ai/getting-started/quickstart/#account-setup) to create a Mistral account and get an API key. |
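
Once you have a key, you can optionally sanity-check it against the API before launching OpenHands; a minimal sketch, assuming the hosted model is exposed as `devstral-small-2505` on Mistral's chat completions endpoint:

```python
# Minimal sketch: sanity-check your API key before launching OpenHands.
# Assumes the hosted model name is "devstral-small-2505" and that
# MISTRAL_API_KEY is set in the environment.
import os
import requests

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "devstral-small-2505",
        "messages": [{"role": "user", "content": "Say hello."}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```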
|
|
|
Then run these commands to start the OpenHands docker container. |
|
```bash |
|
export MISTRAL_API_KEY=<MY_KEY> |
|
|
|
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik |
|
|
|
mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2505","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands-state/settings.json |
|
|
|
docker run -it --rm --pull=always \ |
|
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \ |
|
-e LOG_ALL_EVENTS=true \ |
|
-v /var/run/docker.sock:/var/run/docker.sock \ |
|
-v ~/.openhands-state:/.openhands-state \ |
|
-p 3000:3000 \ |
|
--add-host host.docker.internal:host-gateway \ |
|
--name openhands-app \ |
|
docker.all-hands.dev/all-hands-ai/openhands:0.39 |
|
``` |
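
If you prefer to create the settings file from Python rather than the shell one-liner above, a minimal equivalent sketch (same path and fields) is:

```python
# Minimal sketch: write ~/.openhands-state/settings.json from Python
# instead of the shell one-liner above. Assumes MISTRAL_API_KEY is set.
import json
import os
from pathlib import Path

state_dir = Path.home() / ".openhands-state"
state_dir.mkdir(parents=True, exist_ok=True)

settings = {
    "language": "en",
    "agent": "CodeActAgent",
    "max_iterations": None,
    "security_analyzer": None,
    "confirmation_mode": False,
    "llm_model": "mistral/devstral-small-2505",
    "llm_api_key": os.environ["MISTRAL_API_KEY"],
    "remote_runtime_resource_factor": None,
    "github_token": None,
    "enable_default_condenser": True,
}

(state_dir / "settings.json").write_text(json.dumps(settings))
```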
|
|
|
### Local inference |
|
|
|
The model can also be deployed with the following libraries: |
|
- [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm-recommended) |
|
- [`mistral-inference`](https://github.com/mistralai/mistral-inference): See [here](#mistral-inference) |
|
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers) |
|
- [`LMStudio`](https://lmstudio.ai/): See [here](#lmstudio) |
|
- [`llama.cpp`](https://github.com/ggml-org/llama.cpp): See [here](#llama.cpp) |
|
- [`ollama`](https://github.com/ollama/ollama): See [here](#ollama) |
|
|
|
|
|
### OpenHands (recommended) |
|
|
|
#### Launch a server to deploy Devstral-Small-2505 |
|
|
|
Make sure you have launched an OpenAI-compatible server, e.g. vLLM or Ollama, as described in the sections below. Then, you can use OpenHands to interact with `Devstral-Small-2505`.
|
|
|
For this tutorial, we spun up a vLLM server with the following command:
|
```bash |
|
vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2 |
|
``` |
|
|
|
The server address should be in the following format: `http://<your-server-url>:8000/v1` |
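
Before pointing OpenHands at it, you can check that the server answers on its OpenAI-compatible API; a minimal sketch (replace `<your-server-url>` with your host):

```python
# Minimal sketch: check that the vLLM server answers on its
# OpenAI-compatible API before connecting OpenHands.
import requests

response = requests.get(
    "http://<your-server-url>:8000/v1/models",
    headers={"Authorization": "Bearer token"},
)
response.raise_for_status()
print([m["id"] for m in response.json()["data"]])  # should list the Devstral model
```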
|
|
|
#### Launch OpenHands |
|
|
|
You can follow the OpenHands installation instructions [here](https://docs.all-hands.dev/modules/usage/installation).
|
|
|
The easiest way to launch OpenHands is to use the Docker image: |
|
```bash |
|
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik |
|
|
|
docker run -it --rm --pull=always \ |
|
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \ |
|
-e LOG_ALL_EVENTS=true \ |
|
-v /var/run/docker.sock:/var/run/docker.sock \ |
|
-v ~/.openhands-state:/.openhands-state \ |
|
-p 3000:3000 \ |
|
--add-host host.docker.internal:host-gateway \ |
|
--name openhands-app \ |
|
docker.all-hands.dev/all-hands-ai/openhands:0.38 |
|
``` |
|
|
|
|
|
Then, you can access the OpenHands UI at `http://localhost:3000`. |
|
|
|
#### Connect to the server |
|
|
|
When accessing the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier. |
|
|
|
Fill in the following fields:
|
- **Custom Model**: `openai/mistralai/Devstral-Small-2505` |
|
- **Base URL**: `http://<your-server-url>:8000/v1` |
|
- **API Key**: `token` (or any other token you used to launch the server, if any)
|
|
|
#### Use OpenHands powered by Devstral |
|
|
|
You are now ready to use Devstral Small inside OpenHands by **starting a new conversation**. Let's build a To-Do list app.
|
|
|
<details> |
|
<summary>To-Do list app</summary>
|
|
|
1. Let's ask Devstral to generate the app with the following prompt: |
|
|
|
```txt |
|
Build a To-Do list app with the following requirements: |
|
- Built using FastAPI and React. |
|
- Make it a one page app that: |
|
- Allows to add a task. |
|
- Allows to delete a task. |
|
- Allows to mark a task as done. |
|
- Displays the list of tasks. |
|
- Store the tasks in a SQLite database. |
|
``` |
|
|
|
 |
|
|
|
|
|
2. Let's see the result |
|
|
|
You should see the agent construct the app and be able to explore the code it generated. |
|
|
|
If it doesn't deploy the app automatically, ask Devstral to do so or deploy it manually, then open the frontend deployment URL to see the app.
|
|
|
 |
|
 |
|
|
|
|
|
3. Iterate |
|
|
|
Now that you have a first result you can iterate on it by asking your agent to improve it. For example, in the app generated we could click on a task to mark it checked but having a checkbox would improve UX. You could also ask it to add a feature to edit a task, or to add a feature to filter the tasks by status. |
|
|
|
Enjoy building with Devstral Small and OpenHands! |
|
|
|
</details> |
|
|
|
|
|
### vLLM (recommended) |
|
|
|
We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm) |
|
to implement production-ready inference pipelines. |
|
|
|
**_Installation_** |
|
|
|
Make sure you install [`vLLM >= 0.8.5`](https://github.com/vllm-project/vllm/releases/tag/v0.8.5): |
|
|
|
``` |
|
pip install vllm --upgrade |
|
``` |
|
|
|
Doing so should automatically install [`mistral_common >= 1.5.5`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.5). |
|
|
|
To check: |
|
``` |
|
python -c "import mistral_common; print(mistral_common.__version__)" |
|
``` |
|
|
|
You can also use a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or the one on [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
|
|
|
#### Server |
|
|
|
We recommend using Devstral in a server/client setting.
|
|
|
1. Spin up a server: |
|
|
|
``` |
|
vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2 |
|
``` |
|
|
|
|
|
2. To query the server, you can use a simple Python snippet:
|
|
|
```py |
|
import requests |
|
import json |
|
from huggingface_hub import hf_hub_download |
|
|
|
|
|
url = "http://<your-server-url>:8000/v1/chat/completions" |
|
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"} |
|
|
|
model = "mistralai/Devstral-Small-2505" |
|
|
|
def load_system_prompt(repo_id: str, filename: str) -> str: |
|
file_path = hf_hub_download(repo_id=repo_id, filename=filename) |
|
with open(file_path, "r") as file: |
|
system_prompt = file.read() |
|
return system_prompt |
|
|
|
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt") |
|
|
|
messages = [ |
|
{"role": "system", "content": SYSTEM_PROMPT}, |
|
{ |
|
"role": "user", |
|
"content": [ |
|
{ |
|
"type": "text", |
|
"text": "<your-command>", |
|
}, |
|
], |
|
}, |
|
] |
|
|
|
data = {"model": model, "messages": messages, "temperature": 0.15} |
|
|
|
response = requests.post(url, headers=headers, data=json.dumps(data)) |
|
print(response.json()["choices"][0]["message"]["content"]) |
|
``` |
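
Because the endpoint is OpenAI-compatible, you can also use the `openai` Python client instead of raw `requests`; a minimal sketch, assuming the `openai` package is installed and the server from step 1 is running:

```python
# Minimal sketch: the same request via the OpenAI-compatible client.
# Assumes `pip install openai`.
from huggingface_hub import hf_hub_download
from openai import OpenAI

model = "mistralai/Devstral-Small-2505"
SYSTEM_PROMPT = open(hf_hub_download(repo_id=model, filename="SYSTEM_PROMPT.txt")).read()

client = OpenAI(base_url="http://<your-server-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "<your-command>"},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)
```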
|
|
|
### Mistral-inference |
|
|
|
We recommend using mistral-inference to quickly try out / "vibe-check" Devstral. |
|
|
|
#### Install |
|
|
|
Make sure to have `mistral_inference >= 1.6.0` installed.
|
|
|
```bash |
|
pip install mistral_inference --upgrade |
|
``` |
|
|
|
#### Download |
|
|
|
```python |
|
from huggingface_hub import snapshot_download |
|
from pathlib import Path |
|
|
|
mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral') |
|
mistral_models_path.mkdir(parents=True, exist_ok=True) |
|
|
|
snapshot_download(repo_id="mistralai/Devstral-Small-2505", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path) |
|
``` |
|
|
|
#### Python |
|
|
|
You can run the model using the following command: |
|
|
|
```bash |
|
mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300 |
|
``` |
|
|
|
You can then prompt it with anything you'd like. |
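
If you would rather drive `mistral-inference` from Python than through the `mistral-chat` CLI, here is a minimal sketch, assuming the download step above and the usual `mistral_inference` generation API:

```python
# Minimal sketch: run Devstral through mistral-inference from Python.
# Assumes the weights were downloaded to ~/mistral_models/Devstral as above.
from pathlib import Path

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer

mistral_models_path = Path.home() / "mistral_models" / "Devstral"

tokenizer = MistralTokenizer.from_file(str(mistral_models_path / "tekken.json"))
model = Transformer.from_folder(str(mistral_models_path))

request = ChatCompletionRequest(messages=[UserMessage(content="<your-command>")])
tokens = tokenizer.encode_chat_completion(request).tokens

out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=300,
    temperature=0.15,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.decode(out_tokens[0]))
```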
|
|
|
### Transformers |
|
|
|
To make the best use of our model with transformers, make sure you have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.5.5` to use our tokenizer.
|
|
|
```bash |
|
pip install mistral-common --upgrade |
|
``` |
|
|
|
Then load our tokenizer along with the model and generate: |
|
|
|
```python |
|
import torch |
|
|
|
from mistral_common.protocol.instruct.messages import ( |
|
SystemMessage, UserMessage |
|
) |
|
from mistral_common.protocol.instruct.request import ChatCompletionRequest |
|
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer |
|
from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy |
|
from huggingface_hub import hf_hub_download |
|
from transformers import AutoModelForCausalLM |
|
|
|
def load_system_prompt(repo_id: str, filename: str) -> str: |
|
file_path = hf_hub_download(repo_id=repo_id, filename=filename) |
|
with open(file_path, "r") as file: |
|
system_prompt = file.read() |
|
return system_prompt |
|
|
|
model_id = "mistralai/Devstral-Small-2505" |
|
tekken_file = hf_hub_download(repo_id=model_id, filename="tekken.json") |
|
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt") |
|
|
|
tokenizer = MistralTokenizer.from_file(tekken_file) |
|
|
|
model = AutoModelForCausalLM.from_pretrained(model_id) |
|
|
|
tokenized = tokenizer.encode_chat_completion( |
|
ChatCompletionRequest( |
|
messages=[ |
|
SystemMessage(content=SYSTEM_PROMPT), |
|
UserMessage(content="<your-command>"), |
|
], |
|
) |
|
) |
|
|
|
output = model.generate( |
|
input_ids=torch.tensor([tokenized.tokens]), |
|
max_new_tokens=1000, |
|
)[0] |
|
|
|
decoded_output = tokenizer.decode(output[len(tokenized.tokens):]) |
|
print(decoded_output) |
|
``` |
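
On a GPU you will likely want to load the weights in bfloat16 and shard them across available devices instead of the default CPU load; a minimal variation on the snippet above, assuming `accelerate` is installed:

```python
# Minimal sketch: load the model in bfloat16 across available GPUs.
# Assumes `pip install accelerate`; otherwise keep the default load above.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Devstral-Small-2505",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Keep the generation call from the snippet above, but move the inputs
# to the model's device first:
# input_ids = torch.tensor([tokenized.tokens]).to(model.device)
```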
|
|
|
### LMStudio |
|
Download the weights from huggingface: |
|
|
|
``` |
|
pip install -U "huggingface_hub[cli]" |
|
huggingface-cli download \ |
|
"mistralai/Devstral-Small-2505_gguf" \ |
|
--include "devstralQ4_K_M.gguf" \ |
|
--local-dir "mistralai/Devstral-Small-2505_gguf/" |
|
``` |
|
|
|
You can serve the model locally with [LMStudio](https://lmstudio.ai/). |
|
* Download [LM Studio](https://lmstudio.ai/) and install it |
|
* Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`
|
* In a bash terminal, run `lms import devstralQ4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2505_gguf`) |
|
* Open the LM Studio application, click the terminal icon to get into the developer tab, then click “Select a model to load” and select Devstral Q4 K M. Toggle the status button to start the model, and in settings toggle “Serve on Local Network” on.
|
* On the right tab, you will see the API identifier, which should be `devstralq4_k_m`, and an API address under API Usage. Make a note of this address; we will use it in the next step (you can first verify the server with the sketch below).
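
Before wiring it into OpenHands, you can check that LM Studio is serving the model; a minimal sketch, assuming the API address noted above (the port can differ between setups):

```python
# Minimal sketch: verify the LM Studio server before connecting OpenHands.
# Replace the base URL with the API address shown under "API Usage";
# the port may differ on your machine.
import requests

base_url = "http://<lmstudio-api-address>/v1"

models = requests.get(f"{base_url}/models").json()
print(models)  # should include the devstralq4_k_m identifier
```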
|
|
|
#### Launch OpenHands

You can now interact with the model served from LM Studio using OpenHands. Start the OpenHands server with Docker:
|
|
|
```bash |
|
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik |
|
docker run -it --rm --pull=always \ |
|
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \ |
|
-e LOG_ALL_EVENTS=true \ |
|
-v /var/run/docker.sock:/var/run/docker.sock \ |
|
-v ~/.openhands-state:/.openhands-state \ |
|
-p 3000:3000 \ |
|
--add-host host.docker.internal:host-gateway \ |
|
--name openhands-app \ |
|
docker.all-hands.dev/all-hands-ai/openhands:0.38 |
|
``` |
|
|
|
Click “see advanced settings” on the second line.

In the new tab, toggle the advanced mode on. Set the custom model to `mistral/devstralq4_k_m` and the Base URL to the API address noted in the LM Studio step above. Set the API Key to `dummy`. Click “Save Changes”.
|
|
|
### llama.cpp |
|
|
|
Download the weights from huggingface: |
|
|
|
``` |
|
pip install -U "huggingface_hub[cli]" |
|
huggingface-cli download \ |
|
"mistralai/Devstral-Small-2505_gguf" \ |
|
--include "devstralQ4_K_M.gguf" \ |
|
--local-dir "mistralai/Devstral-Small-2505_gguf/" |
|
``` |
|
|
|
Then run Devstral using the llama.cpp CLI. |
|
|
|
```bash |
|
./llama-cli -m mistralai/Devstral-Small-2505_gguf/devstralQ4_K_M.gguf -cnv
|
``` |
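
If you prefer to call the model from Python instead of the CLI, the downloaded GGUF can also be loaded with the `llama-cpp-python` bindings (a separate package, not covered elsewhere in this card); a minimal sketch:

```python
# Minimal sketch: load the GGUF with the llama-cpp-python bindings
# (`pip install llama-cpp-python`), an alternative to the llama-cli CLI.
from llama_cpp import Llama

llm = Llama(
    model_path="mistralai/Devstral-Small-2505_gguf/devstralQ4_K_M.gguf",
    n_ctx=8192,  # raise towards 128k if you have the memory for it
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "<your-command>"}],
    temperature=0.15,
)
print(result["choices"][0]["message"]["content"])
```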
|
|
|
### Ollama |
|
|
|
You can run Devstral using the [Ollama](https://ollama.ai/) CLI. |
|
|
|
```bash |
|
ollama run devstral |
|
``` |
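
If you want to call the Ollama-served model programmatically (for example to point OpenHands or a script at it), Ollama also exposes an OpenAI-compatible endpoint; a minimal sketch, assuming the default local port 11434:

```python
# Minimal sketch: query Devstral served by Ollama over its
# OpenAI-compatible API. Assumes `ollama run devstral` is already running
# and Ollama is listening on the default port 11434.
import requests

response = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "devstral",
        "messages": [{"role": "user", "content": "<your-command>"}],
        "temperature": 0.15,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```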