DavidAU committed on
Commit 506fc61 · verified · 1 Parent(s): b206640

Update README.md

Files changed (1)
  1. README.md +208 -157
README.md CHANGED
@@ -1,9 +1,32 @@
1
  ---
2
  license: apache-2.0
3
  base_model:
4
- - mistralai/Devstral-Small-2505
5
  language:
6
  - en
7
  pipeline_tag: text-generation
8
  tags:
9
  - merge
@@ -21,14 +44,16 @@ tags:
21
  library_name: transformers
22
  ---
23
 
24
- <h2>Mistral-Devstral-2505-CODER-Brainstorm40x-44B</h2>
 
 
25
 
26
  This repo contains the full-precision source code, in "safetensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats.
27
  The source code can also be used directly.
28
 
29
  This model contains Brainstorm 40x, combined with Mistral's 24B Coder (instruct model):
30
 
31
- https://huggingface.co/mistralai/Devstral-Small-2505
32
 
33
  Information on the 24B Mistral model is below, followed by the Brainstorm 40x adapter (by DavidAU) and then a complete help
34
  section for running LLM / AI models.
@@ -54,15 +79,20 @@ For simpler coding problems, lower quants will work well; but for complex/multi-
54
 
55
  ---
56
 
57
- # Devstral Small 1.0
58
 
59
- Devstral is an agentic LLM for software engineering tasks built under a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, editing multiple files and power software engineering agents. The model achieves remarkable performance on SWE-bench which positionates it as the #1 open source model on this [benchmark](#benchmark-results).
60
 
61
  It is fine-tuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503), so it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only; the vision encoder was removed before fine-tuning from `Mistral-Small-3.1`.
62
 
63
  For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
64
 
65
- Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).
66
 
67
 
68
  ## Key Features:
@@ -73,29 +103,31 @@ Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).
73
  - **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.
74
 
75
 
76
-
77
  ## Benchmark Results
78
 
79
  ### SWE-Bench
80
 
81
- Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming prior open-source SoTA by 6%.
82
 
83
- | Model | Scaffold | SWE-Bench Verified (%) |
84
- |------------------|--------------------|------------------------|
85
- | Devstral | OpenHands Scaffold | **46.8** |
86
- | GPT-4.1-mini | OpenAI Scaffold | 23.6 |
87
- | Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
88
- | SWE-smith-LM 32B | SWE-agent Scaffold | 40.2 |
89
 
90
 
91
  When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 235B-A22B.
92
 
93
- ![SWE Benchmark](assets/swe_bench.png)
94
 
95
  ## Usage
96
 
97
  We recommend using Devstral with the [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) scaffold.
98
- You can use it either through our API or by running locally.
99
 
100
  ### API
101
  Follow these [instructions](https://docs.mistral.ai/getting-started/quickstart/#account-setup) to create a Mistral account and get an API key.
@@ -104,19 +136,19 @@ Then run these commands to start the OpenHands docker container.
104
  ```bash
105
  export MISTRAL_API_KEY=<MY_KEY>
106
 
107
- docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik
108
 
109
- mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2505","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands-state/settings.json
110
 
111
  docker run -it --rm --pull=always \
112
- -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
113
  -e LOG_ALL_EVENTS=true \
114
  -v /var/run/docker.sock:/var/run/docker.sock \
115
- -v ~/.openhands-state:/.openhands-state \
116
  -p 3000:3000 \
117
  --add-host host.docker.internal:host-gateway \
118
  --name openhands-app \
119
- docker.all-hands.dev/all-hands-ai/openhands:0.39
120
  ```
121
 
122
  ### Local inference
@@ -130,106 +162,27 @@ The model can also be deployed with the following libraries:
130
  - [`ollama`](https://github.com/ollama/ollama): See [here](#ollama)
131
 
132
 
133
- ### OpenHands (recommended)
134
-
135
- #### Launch a server to deploy Devstral Small 1.0
136
-
137
- Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then, you can use OpenHands to interact with `Devstral Small 1.0`.
138
-
139
- In the case of the tutorial we spineed up a vLLM server running the command:
140
- ```bash
141
- vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
142
- ```
143
-
144
- The server address should be in the following format: `http://<your-server-url>:8000/v1`
145
-
146
- #### Launch OpenHands
147
-
148
- You can follow installation of OpenHands [here](https://docs.all-hands.dev/modules/usage/installation).
149
-
150
- The easiest way to launch OpenHands is to use the Docker image:
151
- ```bash
152
- docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
153
-
154
- docker run -it --rm --pull=always \
155
- -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
156
- -e LOG_ALL_EVENTS=true \
157
- -v /var/run/docker.sock:/var/run/docker.sock \
158
- -v ~/.openhands-state:/.openhands-state \
159
- -p 3000:3000 \
160
- --add-host host.docker.internal:host-gateway \
161
- --name openhands-app \
162
- docker.all-hands.dev/all-hands-ai/openhands:0.38
163
- ```
164
-
165
-
166
- Then, you can access the OpenHands UI at `http://localhost:3000`.
167
-
168
- #### Connect to the server
169
-
170
- When accessing the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier.
171
-
172
- Fill the following fields:
173
- - **Custom Model**: `openai/mistralai/Devstral-Small-2505`
174
- - **Base URL**: `http://<your-server-url>:8000/v1`
175
- - **API Key**: `token` (or any other token you used to launch the server if any)
176
-
177
- #### Use OpenHands powered by Devstral
178
-
179
- Now you're good to use Devstral Small inside OpenHands by **starting a new conversation**. Let's build a To-Do list app.
180
 
181
  <details>
182
- <summary>To-Do list app</summary
183
-
184
- 1. Let's ask Devstral to generate the app with the following prompt:
185
-
186
- ```txt
187
- Build a To-Do list app with the following requirements:
188
- - Built using FastAPI and React.
189
- - Make it a one page app that:
190
- - Allows to add a task.
191
- - Allows to delete a task.
192
- - Allows to mark a task as done.
193
- - Displays the list of tasks.
194
- - Store the tasks in a SQLite database.
195
- ```
196
-
197
- ![Agent prompting](assets/tuto_open_hands/agent_prompting.png)
198
-
199
-
200
- 2. Let's see the result
201
-
202
- You should see the agent construct the app and be able to explore the code it generated.
203
-
204
- If it doesn't do it automatically, ask Devstral to deploy the app or do it manually, and then go the front URL deployment to see the app.
205
-
206
- ![Agent working](assets/tuto_open_hands/agent_working.png)
207
- ![App UI](assets/tuto_open_hands/app_ui.png)
208
-
209
-
210
- 3. Iterate
211
-
212
- Now that you have a first result you can iterate on it by asking your agent to improve it. For example, in the app generated we could click on a task to mark it checked but having a checkbox would improve UX. You could also ask it to add a feature to edit a task, or to add a feature to filter the tasks by status.
213
-
214
- Enjoy building with Devstral Small and OpenHands!
215
-
216
- </details>
217
-
218
-
219
- ### vLLM (recommended)
220
 
221
  We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
222
  to implement production-ready inference pipelines.
223
 
224
  **_Installation_**
225
 
226
- Make sure you install [`vLLM >= 0.8.5`](https://github.com/vllm-project/vllm/releases/tag/v0.8.5):
227
 
228
  ```
229
  pip install vllm --upgrade
230
  ```
231
 
232
- Doing so should automatically install [`mistral_common >= 1.5.5`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.5).
233
 
234
  To check:
235
  ```
@@ -238,14 +191,14 @@ python -c "import mistral_common; print(mistral_common.__version__)"
238
 
239
  You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
240
 
241
- #### Server
242
 
243
  We recommend that you use Devstral in a server/client setting.
244
 
245
  1. Spin up a server:
246
 
247
  ```
248
- vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
249
  ```
250
 
251
 
@@ -260,7 +213,7 @@ from huggingface_hub import hf_hub_download
260
  url = "http://<your-server-url>:8000/v1/chat/completions"
261
  headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
262
 
263
- model = "mistralai/Devstral-Small-2505"
264
 
265
  def load_system_prompt(repo_id: str, filename: str) -> str:
266
  file_path = hf_hub_download(repo_id=repo_id, filename=filename)
@@ -285,15 +238,42 @@ messages = [
285
 
286
  data = {"model": model, "messages": messages, "temperature": 0.15}
287
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
288
  response = requests.post(url, headers=headers, data=json.dumps(data))
289
  print(response.json()["choices"][0]["message"]["content"])
290
  ```
 
 
291
 
292
- ### Mistral-inference
293
 
294
  We recommend using mistral-inference to quickly try out / "vibe-check" Devstral.
295
 
296
- #### Install
297
 
298
  Make sure to have mistral_inference >= 1.6.0 installed.
299
 
@@ -301,7 +281,7 @@ Make sure to have mistral_inference >= 1.6.0 installed.
301
  pip install mistral_inference --upgrade
302
  ```
303
 
304
- #### Download
305
 
306
  ```python
307
  from huggingface_hub import snapshot_download
@@ -310,10 +290,10 @@ from pathlib import Path
310
  mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
311
  mistral_models_path.mkdir(parents=True, exist_ok=True)
312
 
313
- snapshot_download(repo_id="mistralai/Devstral-Small-2505", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
314
  ```
315
 
316
- #### Python
317
 
318
  You can run the model using the following command:
319
 
@@ -323,9 +303,15 @@ mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300
323
 
324
  You can then prompt it with anything you'd like.
325
 
326
- ### Transformers
 
327
 
328
- To make the best use of our model with transformers make sure to have [installed](https://github.com/mistralai/mistral-common) ` mistral-common >= 1.5.5` to use our tokenizer.
329
 
330
  ```bash
331
  pip install mistral-common --upgrade
@@ -350,12 +336,11 @@ def load_system_prompt(repo_id: str, filename: str) -> str:
350
  system_prompt = file.read()
351
  return system_prompt
352
 
353
- model_id = "mistralai/Devstral-Small-2505"
354
- tekken_file = hf_hub_download(repo_id=model_id, filename="tekken.json")
355
  SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")
356
 
357
- tokenizer = MistralTokenizer.from_file(tekken_file)
358
 
 
359
  model = AutoModelForCausalLM.from_pretrained(model_id)
360
 
361
  tokenized = tokenizer.encode_chat_completion(
@@ -376,73 +361,139 @@ decoded_output = tokenizer.decode(output[len(tokenized.tokens):])
376
  print(decoded_output)
377
  ```
378
 
379
- ### LMStudio
380
- Download the weights from huggingface:
381
 
382
  ```
383
  pip install -U "huggingface_hub[cli]"
384
  huggingface-cli download \
385
- "mistralai/Devstral-Small-2505_gguf" \
386
- --include "devstralQ4_K_M.gguf" \
387
- --local-dir "mistralai/Devstral-Small-2505_gguf/"
388
  ```
389
 
390
  You can serve the model locally with [LMStudio](https://lmstudio.ai/).
391
  * Download [LM Studio](https://lmstudio.ai/) and install it
392
  * Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`
393
- * In a bash terminal, run `lms import devstralQ4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2505_gguf`)
394
- * Open the LMStudio application, click the terminal icon to get into the developer tab. Click select a model to load and select Devstral Q4 K M. Toggle the status button to start the model, in setting toggle Serve on Local Network to be on.
395
- * On the right tab, you will see an API identifier which should be devstralq4_k_m and an api address under API Usage. Keep note of this address, we will use it in the next step.
396
 
397
- Launch Openhands
398
- You can now interact with the model served from LM Studio with openhands. Start the openhands server with the docker
399
 
400
- ```bash
401
- docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
402
- docker run -it --rm --pull=always \
403
- -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
404
- -e LOG_ALL_EVENTS=true \
405
- -v /var/run/docker.sock:/var/run/docker.sock \
406
- -v ~/.openhands-state:/.openhands-state \
407
- -p 3000:3000 \
408
- --add-host host.docker.internal:host-gateway \
409
- --name openhands-app \
410
- docker.all-hands.dev/all-hands-ai/openhands:0.38
411
- ```
412
 
413
- Click “see advanced setting” on the second line.
414
- In the new tab, toggle advanced to on. Set the custom model to be mistral/devstralq4_k_m and Base URL the api address we get from the last step in LM Studio. Set API Key to dummy. Click save changes.
415
 
416
- ### llama.cpp
 
417
 
418
  Download the weights from huggingface:
419
 
420
  ```
421
  pip install -U "huggingface_hub[cli]"
422
  huggingface-cli download \
423
- "mistralai/Devstral-Small-2505_gguf" \
424
- --include "devstralQ4_K_M.gguf" \
425
- --local-dir "mistralai/Devstral-Small-2505_gguf/"
426
  ```
427
 
428
- Then run Devstral using the llama.cpp CLI.
429
 
430
  ```bash
431
- ./llama-cli -m Devstral-Small-2505_gguf/devstralQ4_K_M.gguf -cnv
432
  ```
433
 
434
- ### Ollama
 
 
 
 
 
435
 
436
- You can run Devstral using the [Ollama](https://ollama.ai/) CLI.
437
 
 
438
  ```bash
439
- ollama run devstral
440
  ```
441

442
 
443
  See more here:
444
 
445
- https://huggingface.co/mistralai/Devstral-Small-2505
446
 
447
  ---
448
 
@@ -450,7 +501,7 @@ https://huggingface.co/mistralai/Devstral-Small-2505
450
 
451
  ---
452
 
453
- <B>Brainstorm 40x</B>
454
 
455
  The BRAINSTORM process was developed by David_AU.
456
 
@@ -463,7 +514,7 @@ What is "Brainstorm" ?
463
 
464
  The reasoning center of an LLM is taken apart, reassembled, and expanded.
465
 
466
- In this case for this model: 40 times
467
 
468
  Then these centers are individually calibrated. These "centers" also interact with each other.
469
  This introduces subtle changes into the reasoning process.
 
1
  ---
2
  license: apache-2.0
3
  base_model:
4
+ - mistralai/Devstral-Small-2507
5
  language:
6
  - en
7
+ - fr
8
+ - de
9
+ - es
10
+ - pt
11
+ - it
12
+ - ja
13
+ - ko
14
+ - ru
15
+ - zh
16
+ - ar
17
+ - fa
18
+ - id
19
+ - ms
20
+ - ne
21
+ - pl
22
+ - ro
23
+ - sr
24
+ - sv
25
+ - tr
26
+ - uk
27
+ - vi
28
+ - hi
29
+ - bn
30
  pipeline_tag: text-generation
31
  tags:
32
  - merge
 
44
  library_name: transformers
45
  ---
46
 
47
+ (uploading...)
48
+
49
+ <h2>Mistral-Devstral-2507-CODER-Brainstorm20x-34B</h2>
50
 
51
  This repo contains the full-precision source code, in "safetensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats.
52
  The source code can also be used directly.
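If you want to use the source weights directly, a minimal Transformers sketch follows. It assumes the repo id matches the model title above and that the repo ships a Transformers-compatible tokenizer config (otherwise use the MistralTokenizer flow shown in the Transformers section further down); adjust dtype and device mapping to your hardware.

```python
# Minimal sketch: load the full-precision safetensors directly with Transformers.
# The repo id below is assumed from the model title; adjust if the actual repo name differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "DavidAU/Mistral-Devstral-2507-CODER-Brainstorm20x-34B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # ~34B parameters: roughly 68 GB in bf16, so multi-GPU or offload may be needed
    device_map="auto",
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, temperature=0.15, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```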
53
 
54
  This model contains Brainstorm 20x, combined with Mistral's 24B Coder (instruct model):
55
 
56
+ https://huggingface.co/mistralai/Devstral-Small-2507
57
 
58
  Information on the 24B Mistral model is below, followed by the Brainstorm 20x adapter (by DavidAU) and then a complete help
59
  section for running LLM / AI models.
 
79
 
80
  ---
81
 
82
+ # Devstral Small 1.1
83
 
84
+ Devstral is an agentic LLM for software engineering tasks built under a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, edit multiple files and power software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this [benchmark](#benchmark-results).
85
 
86
  It is fine-tuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503), so it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only; the vision encoder was removed before fine-tuning from `Mistral-Small-3.1`.
87
 
88
  For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
89
 
90
+ Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral-2507).
91
+
92
+ **Updates compared to [`Devstral Small 1.0`](https://huggingface.co/mistralai/Devstral-Small-2505):**
93
+ - Improved performance; please refer to the [benchmark results](#benchmark-results).
94
+ - `Devstral Small 1.1` is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments.
95
+ - Supports [Mistral's function calling format](https://mistralai.github.io/mistral-common/usage/tools/).
96
 
97
 
98
  ## Key Features:
 
103
  - **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.
104
 
105
 
 
106
  ## Benchmark Results
107
 
108
  ### SWE-Bench
109
 
110
+ Devstral Small 1.1 achieves a score of **53.6%** on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second-best state-of-the-art model by +11.4%.
111
 
112
+ | Model | Agentic Scaffold | SWE-Bench Verified (%) |
113
+ |--------------------|--------------------|------------------------|
114
+ | Devstral Small 1.1 | OpenHands Scaffold | **53.6** |
115
+ | Devstral Small 1.0 | OpenHands Scaffold | *46.8* |
116
+ | GPT-4.1-mini | OpenAI Scaffold | 23.6 |
117
+ | Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
118
+ | SWE-smith-LM 32B | SWE-agent Scaffold | 40.2 |
119
+ | Skywork SWE | OpenHands Scaffold | 38.0 |
120
+ | DeepSWE | R2E-Gym Scaffold | 42.2 |
121
 
122
 
123
  When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 235B-A22B.
124
 
125
+ ![SWE Benchmark](assets/swe_benchmark.png)
126
 
127
  ## Usage
128
 
129
  We recommend using Devstral with the [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) scaffold.
130
+ You can use it either through our API or by running it locally.
131
 
132
  ### API
133
  Follow these [instructions](https://docs.mistral.ai/getting-started/quickstart/#account-setup) to create a Mistral account and get an API key.
 
136
  ```bash
137
  export MISTRAL_API_KEY=<MY_KEY>
138
 
139
+ mkdir -p ~/.openhands && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2507","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands/settings.json
140
 
141
+ docker pull docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik
142
 
143
  docker run -it --rm --pull=always \
144
+ -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik \
145
  -e LOG_ALL_EVENTS=true \
146
  -v /var/run/docker.sock:/var/run/docker.sock \
147
+ -v ~/.openhands:/.openhands \
148
  -p 3000:3000 \
149
  --add-host host.docker.internal:host-gateway \
150
  --name openhands-app \
151
+ docker.all-hands.dev/all-hands-ai/openhands:0.48
152
  ```
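Before launching the container, you can optionally sanity-check the API key with a direct chat-completions request. This is a minimal sketch against Mistral's public OpenAI-compatible endpoint; the model id is assumed to match the `llm_model` used in the settings above, without the `mistral/` prefix.

```python
# Minimal sketch: verify MISTRAL_API_KEY works before wiring it into OpenHands.
import os
import requests

api_key = os.environ["MISTRAL_API_KEY"]
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    json={
        "model": "devstral-small-2507",  # assumed to match the llm_model in settings.json above
        "messages": [{"role": "user", "content": "Reply with OK."}],
        "temperature": 0.15,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```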
153
 
154
  ### Local inference
 
162
  - [`ollama`](https://github.com/ollama/ollama): See [here](#ollama)
163
 
164
 
165
+ #### vLLM (recommended)
166
 
167
  <details>
168
+ <summary>Expand</summary>
169
 
170
  We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
171
  to implement production-ready inference pipelines.
172
 
173
  **_Installation_**
174
 
175
+ Make sure you install [`vLLM >= 0.9.1`](https://github.com/vllm-project/vllm/releases/tag/v0.9.1):
176
 
177
  ```
178
  pip install vllm --upgrade
179
  ```
180
 
181
+ Also make sure you have installed [`mistral_common >= 1.7.0`](https://github.com/mistralai/mistral-common/releases/tag/v1.7.0).
182
+
183
+ ```
184
+ pip install mistral-common --upgrade
185
+ ```
186
 
187
  To check:
188
  ```
 
191
 
192
  You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
193
 
194
+ **_Launch server_**
195
 
196
  We recommend that you use Devstral in a server/client setting.
197
 
198
  1. Spin up a server:
199
 
200
  ```
201
+ vllm serve mistralai/Devstral-Small-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
202
  ```
203
 
204
 
 
213
  url = "http://<your-server-url>:8000/v1/chat/completions"
214
  headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
215
 
216
+ model = "mistralai/Devstral-Small-2507"
217
 
218
  def load_system_prompt(repo_id: str, filename: str) -> str:
219
  file_path = hf_hub_download(repo_id=repo_id, filename=filename)
 
238
 
239
  data = {"model": model, "messages": messages, "temperature": 0.15}
240
 
241
+ # Devstral Small 1.1 supports tool calling. If you want to use tools, follow this:
242
+ # tools = [ # Define tools for vLLM
243
+ # {
244
+ # "type": "function",
245
+ # "function": {
246
+ # "name": "git_clone",
247
+ # "description": "Clone a git repository",
248
+ # "parameters": {
249
+ # "type": "object",
250
+ # "properties": {
251
+ # "url": {
252
+ # "type": "string",
253
+ # "description": "The url of the git repository",
254
+ # },
255
+ # },
256
+ # "required": ["url"],
257
+ # },
258
+ # },
259
+ # }
260
+ # ]
261
+ # data = {"model": model, "messages": messages, "temperature": 0.15, "tools": tools} # Pass tools to payload.
262
+
263
  response = requests.post(url, headers=headers, data=json.dumps(data))
264
  print(response.json()["choices"][0]["message"]["content"])
265
  ```
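If you do pass `tools` as in the commented block above, the reply may contain a tool call instead of plain content. The sketch below continues from `response` above and assumes the standard OpenAI-compatible response shape that vLLM returns, where `arguments` arrives as a JSON string.

```python
# Minimal sketch: read a tool call from the OpenAI-compatible response (continuing from `response` above).
import json

message = response.json()["choices"][0]["message"]
if message.get("tool_calls"):
    call = message["tool_calls"][0]
    name = call["function"]["name"]                    # e.g. "git_clone"
    args = json.loads(call["function"]["arguments"])   # e.g. {"url": "https://..."}
    print(f"Model requested tool {name} with arguments {args}")
else:
    print(message["content"])
```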
266
+ </details>
267
+
268
 
269
+ #### Mistral-inference
270
+
271
+ <details>
272
+ <summary>Expand</summary>
273
 
274
  We recommend using mistral-inference to quickly try out / "vibe-check" Devstral.
275
 
276
+ **_Installation_**
277
 
278
  Make sure to have mistral_inference >= 1.6.0 installed.
279
 
 
281
  pip install mistral_inference --upgrade
282
  ```
283
 
284
+ **_Download_**
285
 
286
  ```python
287
  from huggingface_hub import snapshot_download
 
290
  mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
291
  mistral_models_path.mkdir(parents=True, exist_ok=True)
292
 
293
+ snapshot_download(repo_id="mistralai/Devstral-Small-2507", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
294
  ```
295
 
296
+ **_Chat_**
297
 
298
  You can run the model using the following command:
299
 
 
303
 
304
  You can then prompt it with anything you'd like.
305
 
306
+ </details>
307
+
308
 
309
+ #### Transformers
310
+
311
+ <details>
312
+ <summary>Expand</summary>
313
+
314
+ To make the best use of our model with Transformers, make sure you have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.7.0` to use our tokenizer.
315
 
316
  ```bash
317
  pip install mistral-common --upgrade
 
336
  system_prompt = file.read()
337
  return system_prompt
338
 
339
+ model_id = "mistralai/Devstral-Small-2507"
 
340
  SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")
341
 
 
342
 
343
+ tokenizer = MistralTokenizer.from_hf_hub(model_id)
344
  model = AutoModelForCausalLM.from_pretrained(model_id)
345
 
346
  tokenized = tokenizer.encode_chat_completion(
 
361
  print(decoded_output)
362
  ```
363
 
364
+ </details>
365
+
366
+
367
+ #### LM Studio
368
+
369
+ <details>
370
+ <summary>Expand</summary>
371
+
372
+ Download the weights from either:
373
+ - LM Studio GGUF repository (recommended): https://huggingface.co/lmstudio-community/Devstral-Small-2507-GGUF
374
+ - our GGUF repository: https://huggingface.co/mistralai/Devstral-Small-2507_gguf
375
 
376
  ```
377
  pip install -U "huggingface_hub[cli]"
378
  huggingface-cli download \
379
+ "lmstudio-community/Devstral-Small-2507-GGUF" \
380
+ --include "Devstral-Small-2507-Q4_K_M.gguf" \
381
+ --local-dir "Devstral-Small-2507_gguf/"
382
  ```
383
 
384
  You can serve the model locally with [LMStudio](https://lmstudio.ai/).
385
  * Download [LM Studio](https://lmstudio.ai/) and install it
386
  * Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`
387
+ * In a bash terminal, run `lms import Devstral-Small-2507-Q4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `Devstral-Small-2507_gguf`)
388
+ * Open the LM Studio application, click the terminal icon to get into the developer tab, click "Select a model to load" and select `Devstral Small 2507`. Toggle the status button to start the model, and in settings toggle "Serve on Local Network" on.
389
+ * On the right tab, you will see an API identifier, which should be `devstral-small-2507`, and an API address under API Usage. Keep note of this address; it is used for OpenHands or Cline (a quick connectivity check is sketched below).
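Once the model is running, you can sanity-check the endpoint directly. This is a minimal sketch; substitute the API address LM Studio shows under API Usage (the host/port below is only a placeholder) and the identifier if yours differs.

```python
# Minimal sketch: query the LM Studio OpenAI-compatible endpoint noted above.
import requests

base_url = "http://localhost:1234/v1"  # placeholder: replace with the API address shown by LM Studio
resp = requests.post(
    f"{base_url}/chat/completions",
    json={
        "model": "devstral-small-2507",  # the API identifier noted above
        "messages": [{"role": "user", "content": "Write a haiku about code review."}],
        "temperature": 0.15,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```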
390
 
391
+ </details>
 
392
 
 
 
 
 
 
 
 
 
 
 
 
 
393
 
394
+ #### llama.cpp
 
395
 
396
+ <details>
397
+ <summary>Expand</summary>
398
 
399
  Download the weights from huggingface:
400
 
401
  ```
402
  pip install -U "huggingface_hub[cli]"
403
  huggingface-cli download \
404
+ "mistralai/Devstral-Small-2507_gguf" \
405
+ --include "Devstral-Small-2507-Q4_K_M.gguf" \
406
+ --local-dir "mistralai/Devstral-Small-2507_gguf/"
407
  ```
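Equivalently, you can fetch the single GGUF file from Python with `huggingface_hub` (a small sketch mirroring the CLI command above):

```python
# Minimal sketch: download the Q4_K_M GGUF with huggingface_hub instead of the CLI.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="mistralai/Devstral-Small-2507_gguf",
    filename="Devstral-Small-2507-Q4_K_M.gguf",
    local_dir="mistralai/Devstral-Small-2507_gguf/",
)
print(gguf_path)
```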
408
 
409
+ Then run Devstral using the llama.cpp server.
410
 
411
  ```bash
412
+ ./llama-server -m mistralai/Devstral-Small-2507_gguf/Devstral-Small-2507-Q4_K_M.gguf -c 0 # -c sets the context size; 0 means the model's default, here 128k.
413
  ```
414
 
415
+ </details>
416
+
417
+
418
+ ### OpenHands (recommended)
419
+
420
+ #### Launch a server to deploy Devstral Small 1.1
421
 
422
+ Make sure you have launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then, you can use OpenHands to interact with `Devstral Small 1.1`.
423
 
424
+ For this tutorial, we spun up a vLLM server with the following command:
425
  ```bash
426
+ vllm serve mistralai/Devstral-Small-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
427
  ```
428
 
429
+ The server address should be in the following format: `http://<your-server-url>:8000/v1`
430
+
431
+ #### Launch OpenHands
432
+
433
+ You can follow the OpenHands installation instructions [here](https://docs.all-hands.dev/modules/usage/installation).
434
+
435
+ The easiest way to launch OpenHands is to use the Docker image:
436
+ ```bash
437
+ docker pull docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik
438
+
439
+ docker run -it --rm --pull=always \
440
+ -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.48-nikolaik \
441
+ -e LOG_ALL_EVENTS=true \
442
+ -v /var/run/docker.sock:/var/run/docker.sock \
443
+ -v ~/.openhands:/.openhands \
444
+ -p 3000:3000 \
445
+ --add-host host.docker.internal:host-gateway \
446
+ --name openhands-app \
447
+ docker.all-hands.dev/all-hands-ai/openhands:0.48
448
+ ```
449
+
450
+ Then, you can access the OpenHands UI at `http://localhost:3000`.
451
+
452
+ #### Connect to the server
453
+
454
+ When accessing the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier.
455
+
456
+ Fill in the following fields (a quick check of the Base URL is sketched after the list):
457
+ - **Custom Model**: `openai/mistralai/Devstral-Small-2507`
458
+ - **Base URL**: `http://<your-server-url>:8000/v1`
459
+ - **API Key**: `token` (or any other token you used when launching the server, if any)
460
+
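Before saving, you can confirm that the Base URL actually points at your server. This is a minimal sketch using the OpenAI-compatible model listing that vLLM exposes; the served model name should appear in the output.

```python
# Minimal sketch: confirm the Base URL is reachable and lists the served model.
import requests

base_url = "http://<your-server-url>:8000/v1"  # same Base URL as entered in OpenHands
resp = requests.get(f"{base_url}/models", headers={"Authorization": "Bearer token"})
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])  # expect "mistralai/Devstral-Small-2507"
```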
461
+ <details>
462
+ <summary>See settings</summary>
463
+
464
+ ![OpenHands Settings](assets/open_hands_config.png)
465
+
466
+ </details>
467
+
468
+
469
+ ### Cline
470
+
471
+ #### Launch a server to deploy Devstral Small 1.1
472
+
473
+ Make sure you have launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then, you can use Cline to interact with `Devstral Small 1.1`.
474
+
475
+ For this tutorial, we spun up a vLLM server with the following command:
476
+ ```bash
477
+ vllm serve mistralai/Devstral-Small-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
478
+ ```
479
+
480
+ The server address should be in the following format: `http://<your-server-url>:8000/v1`
481
+
482
+ #### Launch Cline
483
+
484
+ You can follow the Cline installation instructions [here](https://docs.cline.bot/getting-started/installing-cline). Then configure the server address in the settings.
485
+
486
+ <details>
487
+ <summary>See settings</summary>
488
+
489
+ ![Cline Settings](assets/cline_config.png)
490
+
491
+ </details>
492
+
493
 
494
  See more here:
495
 
496
+ https://huggingface.co/mistralai/Devstral-Small-2507
497
 
498
  ---
499
 
 
501
 
502
  ---
503
 
504
+ <B>Brainstorm 20x</B>
505
 
506
  The BRAINSTORM process was developed by David_AU.
507
 
 
514
 
515
  The reasoning center of an LLM is taken apart, reassembled, and expanded.
516
 
517
+ In this case, for this model: 20 times.
518
 
519
  Then these centers are individually calibrated. These "centers" also interact with each other.
520
  This introduces subtle changes into the reasoning process.