Update README.md

---
license: apache-2.0
base_model:
- mistralai/Devstral-Small-2507
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
pipeline_tag: text-generation
tags:
- merge
library_name: transformers
---

(uploading...)

<h2>Mistral-Devstral-2507-CODER-Brainstorm20x-34B</h2>

This repo contains the full-precision source code, in "safetensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats.
The source code can also be used directly.

This model contains Brainstorm 20x, combined with Mistral's 24B Coder (instruct model):

https://huggingface.co/mistralai/Devstral-Small-2507

Information on the 24B Mistral model is below, followed by the Brainstorm 20x adapter (by DavidAU) and then a complete help
section for running LLM / AI models.
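
As a rough sketch of the "generate GGUFs" workflow mentioned above (paths, file names and the chosen quant are placeholders; this assumes a local llama.cpp checkout and that this repo has already been downloaded locally):

```bash
# Hedged sketch: convert the local safetensors source to GGUF with llama.cpp, then quantize.
# <local-model-dir> is wherever you downloaded this repo; adjust output names and quant type as needed.
python convert_hf_to_gguf.py <local-model-dir> \
    --outtype f16 \
    --outfile devstral-coder-brainstorm20x-f16.gguf

./llama-quantize devstral-coder-brainstorm20x-f16.gguf \
    devstral-coder-brainstorm20x-Q4_K_M.gguf Q4_K_M
```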

---

# Devstral Small 1.1

Devstral is an agentic LLM for software engineering tasks built under a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, editing multiple files and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open source model on this [benchmark](#benchmark-results).

It is finetuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503), therefore it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only and before fine-tuning from `Mistral-Small-3.1` the vision encoder was removed.

For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral-2507).

**Updates compared to [`Devstral Small 1.0`](https://huggingface.co/mistralai/Devstral-Small-2505):**
- Improved performance, please refer to the [benchmark results](#benchmark-results).
- `Devstral Small 1.1` is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments.
- Supports [Mistral's function calling format](https://mistralai.github.io/mistral-common/usage/tools/).


## Key Features:

- **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.


## Benchmark Results

### SWE-Bench

Devstral Small 1.1 achieves a score of **53.6%** on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second-best state-of-the-art model by +11.4%.

| Model              | Agentic Scaffold   | SWE-Bench Verified (%) |
|--------------------|--------------------|------------------------|
| Devstral Small 1.1 | OpenHands Scaffold | **53.6**               |
| Devstral Small 1.0 | OpenHands Scaffold | *46.8*                 |
| GPT-4.1-mini       | OpenAI Scaffold    | 23.6                   |
| Claude 3.5 Haiku   | Anthropic Scaffold | 40.6                   |
| SWE-smith-LM 32B   | SWE-agent Scaffold | 40.2                   |
| Skywork SWE        | OpenHands Scaffold | 38.0                   |
| DeepSWE            | R2E-Gym Scaffold   | 42.2                   |

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 232B-A22B.

)

## Usage

We recommend using Devstral with the [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) scaffold.
You can use it either through our API or by running locally.

### API
Follow these [instructions](https://docs.mistral.ai/getting-started/quickstart/#account-setup) to create a Mistral account and get an API key.

Then run these commands to start the OpenHands docker container.

```bash
export MISTRAL_API_KEY=<MY_KEY>

mkdir -p ~/.openhands && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2507","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands/settings.json

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.48
```
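
Before wiring the key into OpenHands, you can sanity-check it with a direct call to Mistral's OpenAI-compatible chat endpoint (a hedged sketch: the hosted model name `devstral-small-2507` is an assumption here; check the docs linked above for the exact identifier):

```bash
# Hedged sketch: quick sanity check of the API key with a single chat completion.
# The hosted model name "devstral-small-2507" is an assumption; verify it in Mistral's docs.
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
        "model": "devstral-small-2507",
        "messages": [{"role": "user", "content": "Reply with OK if you can read this."}],
        "temperature": 0.15
      }'
```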

### Local inference

The model can also be deployed with the following libraries:
- [`ollama`](https://github.com/ollama/ollama): See [here](#ollama)


#### vLLM (recommended)

<details>
<summary>Expand</summary>

We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.

**_Installation_**

Make sure you install [`vLLM >= 0.9.1`](https://github.com/vllm-project/vllm/releases/tag/v0.9.1):

```
pip install vllm --upgrade
```

Also make sure to have installed [`mistral_common >= 1.7.0`](https://github.com/mistralai/mistral-common/releases/tag/v1.7.0).

```
pip install mistral-common --upgrade
```

To check:
```
python -c "import mistral_common; print(mistral_common.__version__)"
```

You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).

**_Launch server_**

We recommend that you use Devstral in a server/client setting.

1. Spin up a server:

```
vllm serve mistralai/Devstral-Small-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```

2. Query it from a client, for example with a simple Python snippet:

```python
import json

import requests
from huggingface_hub import hf_hub_download

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Devstral-Small-2507"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "<your-coding-request>"},  # placeholder user request
]

data = {"model": model, "messages": messages, "temperature": 0.15}

# Devstral Small 1.1 supports tool calling. If you want to use tools, follow this:
# tools = [  # Define tools for vLLM
#     {
#         "type": "function",
#         "function": {
#             "name": "git_clone",
#             "description": "Clone a git repository",
#             "parameters": {
#                 "type": "object",
#                 "properties": {
#                     "url": {
#                         "type": "string",
#                         "description": "The url of the git repository",
#                     },
#                 },
#                 "required": ["url"],
#             },
#         },
#     }
# ]
# data = {"model": model, "messages": messages, "temperature": 0.15, "tools": tools}  # Pass tools to payload.

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
```

</details>
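
Equivalently, since vLLM exposes an OpenAI-compatible endpoint, the same request can be made with the official `openai` Python client (a sketch; the base URL and the dummy `token` key mirror the server launched above):

```python
# Hedged sketch: same chat request through the OpenAI-compatible client instead of raw requests.
from openai import OpenAI

client = OpenAI(base_url="http://<your-server-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Devstral-Small-2507",
    messages=[{"role": "user", "content": "Write a one-line Python palindrome check."}],
    temperature=0.15,
)
print(response.choices[0].message.content)
```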


#### Mistral-inference

<details>
<summary>Expand</summary>

We recommend using mistral-inference to quickly try out / "vibe-check" Devstral.

**_Installation_**

Make sure to have mistral_inference >= 1.6.0 installed.

```
pip install mistral_inference --upgrade
```

**_Download_**

```python
from pathlib import Path

from huggingface_hub import snapshot_download

mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Devstral-Small-2507", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
```

**_Chat_**

You can run the model using the following command:

```bash
mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300
```

You can then prompt it with anything you'd like.

</details>
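
If you prefer to script it rather than use the interactive CLI, the same checkpoint can be driven from Python with `mistral_inference` and `mistral_common` (a sketch, assuming the download step above was run as-is; the example prompt and generation settings are illustrative):

```python
# Hedged sketch: programmatic generation with mistral-inference instead of the mistral-chat CLI.
from pathlib import Path

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer

models_path = Path.home() / "mistral_models" / "Devstral"

tokenizer = MistralTokenizer.from_file(str(models_path / "tekken.json"))
model = Transformer.from_folder(str(models_path))

request = ChatCompletionRequest(
    messages=[UserMessage(content="Write a shell one-liner that counts lines of Python code in a repo.")]
)
tokens = tokenizer.encode_chat_completion(request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.15,
                         eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
print(tokenizer.decode(out_tokens[0]))
```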


#### Transformers

<details>
<summary>Expand</summary>

To make the best use of our model with transformers make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.7.0` to use our tokenizer.

```bash
pip install mistral-common --upgrade
```

```python
# ... (the opening of this snippet is elided here; a complete sketch follows this section)
        system_prompt = file.read()
    return system_prompt

model_id = "mistralai/Devstral-Small-2507"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

tokenizer = MistralTokenizer.from_hf_hub(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

tokenized = tokenizer.encode_chat_completion(
    # ...
)
# ...
print(decoded_output)
```

</details>
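
For reference, here is a complete, self-contained variant of that snippet (a sketch: the example user request, dtype/device handling and generation settings are illustrative additions, not taken from the original card):

```python
# Hedged sketch: end-to-end generation with transformers + mistral-common.
# The user message and generation settings are illustrative placeholders.
import torch
from huggingface_hub import hf_hub_download
from mistral_common.protocol.instruct.messages import SystemMessage, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from transformers import AutoModelForCausalLM


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        return file.read()


model_id = "mistralai/Devstral-Small-2507"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

tokenizer = MistralTokenizer.from_hf_hub(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            SystemMessage(content=SYSTEM_PROMPT),
            UserMessage(content="Refactor this to be iterative: def fact(n): return 1 if n <= 1 else n * fact(n - 1)"),
        ]
    )
)

input_ids = torch.tensor([tokenized.tokens], device=model.device)
output = model.generate(input_ids, max_new_tokens=512, temperature=0.15, do_sample=True)[0]
decoded_output = tokenizer.decode(output[len(tokenized.tokens):].tolist())
print(decoded_output)
```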


#### LM Studio

<details>
<summary>Expand</summary>

Download the weights from either:
- LM Studio GGUF repository (recommended): https://huggingface.co/lmstudio-community/Devstral-Small-2507-GGUF
- our GGUF repository: https://huggingface.co/mistralai/Devstral-Small-2507_gguf

```
pip install -U "huggingface_hub[cli]"
# or use mistralai/Devstral-Small-2507_gguf
huggingface-cli download \
    "lmstudio-community/Devstral-Small-2507-GGUF" \
    --include "Devstral-Small-2507-Q4_K_M.gguf" \
    --local-dir "Devstral-Small-2507_gguf/"
```

You can serve the model locally with [LMStudio](https://lmstudio.ai/).
* Download [LM Studio](https://lmstudio.ai/) and install it
* Install the `lms` CLI with `~/.lmstudio/bin/lms bootstrap`
* In a bash terminal, run `lms import Devstral-Small-2507-Q4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `Devstral-Small-2507_gguf`)
* Open the LM Studio application, click the terminal icon to get into the developer tab, click "Select a model to load" and select `Devstral Small 2507`. Toggle the status button to start the model, and in settings toggle "Serve on Local Network" on.
* On the right tab, you will see an API identifier, which should be `devstral-small-2507`, and an API address under API Usage. Keep note of this address; it is used for OpenHands or Cline.

</details>
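
To confirm the endpoint is up before pointing an agent at it, you can hit LM Studio's OpenAI-compatible API directly (a sketch; LM Studio listens on port 1234 by default, so substitute the address you noted above):

```bash
# Hedged sketch: query the LM Studio server through its OpenAI-compatible endpoint.
# Replace localhost:1234 with the API address shown in LM Studio's API Usage panel.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "devstral-small-2507",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "temperature": 0.15
      }'
```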


#### llama.cpp

<details>
<summary>Expand</summary>

Download the weights from huggingface:

```
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
    "mistralai/Devstral-Small-2507_gguf" \
    --include "Devstral-Small-2507-Q4_K_M.gguf" \
    --local-dir "mistralai/Devstral-Small-2507_gguf/"
```

Then run Devstral using the llama.cpp server.

```bash
./llama-server -m mistralai/Devstral-Small-2507_gguf/Devstral-Small-2507-Q4_K_M.gguf -c 0  # -c sets the context size; 0 means the model's default, here 128k.
```

</details>
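
The server also speaks the OpenAI chat-completions protocol, so a quick test looks like this (a sketch; llama-server defaults to port 8080, adjust if you passed `--port`):

```bash
# Hedged sketch: test the llama.cpp server through its OpenAI-compatible endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Write a C function that reverses a string in place."}],
        "temperature": 0.15
      }'
```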


### OpenHands (recommended)

#### Launch a server to deploy Devstral Small 1.1

Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then, you can use OpenHands to interact with `Devstral Small 1.1`.

For this tutorial we spun up a vLLM server with the command:
```bash
vllm serve mistralai/Devstral-Small-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```

The server address should be in the following format: `http://<your-server-url>:8000/v1`

#### Launch OpenHands

You can follow the installation instructions for OpenHands [here](https://docs.all-hands.dev/modules/usage/installation).

The easiest way to launch OpenHands is to use the Docker image:
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.48
```

Then, you can access the OpenHands UI at `http://localhost:3000`.

#### Connect to the server

When accessing the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier.

Fill in the following fields:
- **Custom Model**: `openai/mistralai/Devstral-Small-2507`
- **Base URL**: `http://<your-server-url>:8000/v1`
- **API Key**: `token` (or any other token you used to launch the server, if any)

<details>
<summary>See settings</summary>

)

</details>


### Cline

#### Launch a server to deploy Devstral Small 1.1

Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then, you can use Cline to interact with `Devstral Small 1.1`.

For this tutorial we spun up a vLLM server with the command:
```bash
vllm serve mistralai/Devstral-Small-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```

The server address should be in the following format: `http://<your-server-url>:8000/v1`

#### Launch Cline

You can follow the installation instructions for Cline [here](https://docs.cline.bot/getting-started/installing-cline). Then you can configure the server address in the settings.

<details>
<summary>See settings</summary>

)

</details>

See more here:

https://huggingface.co/mistralai/Devstral-Small-2507

---

---

<B>Brainstorm 20x</B>

The BRAINSTORM process was developed by David_AU.

What is "Brainstorm" ?

The reasoning center of an LLM is taken apart, reassembled, and expanded.

In this case for this model: 20 times.

Then these centers are individually calibrated. These "centers" also interact with each other.
This introduces subtle changes into the reasoning process.
|