danielhanchen committed (verified)
Commit 0ee9e18 · Parent(s): 1555035

Create README.md

Files changed (1): README.md (+547, -0, new file)
---
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
library_name: vllm
inference: false
base_model:
- mistralai/Devstral-Small-2505
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
pipeline_tag: text2text-generation
---

# Model Card for mistralai/Devstral-Small-2505

Devstral is an agentic LLM for software engineering tasks, built through a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-Bench, which positions it as the #1 open-source model on this [benchmark](#benchmark-results).

It is finetuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503) and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning.

For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).

## Key Features:
- **Agentic coding**: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- **Lightweight**: With a compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
- **Apache 2.0 License**: Open license allowing usage and modification for both commercial and non-commercial purposes.
- **Context Window**: A 128k context window.
- **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.

## Benchmark Results

### SWE-Bench

Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming the prior open-source SoTA by 6%.

| Model            | Scaffold           | SWE-Bench Verified (%) |
|------------------|--------------------|------------------------|
| Devstral         | OpenHands Scaffold | **46.8**               |
| GPT-4.1-mini     | OpenAI Scaffold    | 23.6                   |
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6                   |
| SWE-smith-LM 32B | SWE-agent Scaffold | 40.2                   |

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as DeepSeek-V3-0324 and Qwen3 235B-A22B.

![SWE Benchmark](assets/swe_bench.png)

## Usage

We recommend using Devstral with the [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) scaffold.
You can use it either through our API or by running it locally.

### API
Follow these [instructions](https://docs.mistral.ai/getting-started/quickstart/#account-setup) to create a Mistral account and get an API key.

Then run these commands to start the OpenHands docker container.
```bash
export MISTRAL_API_KEY=<MY_KEY>

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik

mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2505","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands-state/settings.json

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.39
```

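If you want to sanity-check the API key before starting the container, you can hit Mistral's chat completions endpoint directly. This is a minimal check, assuming the public `https://api.mistral.ai/v1/chat/completions` endpoint and the `devstral-small-2505` model id (the `llm_model` configured above without its `mistral/` prefix); adjust the id if your account lists it differently.

```bash
# Optional: verify the key and model id respond before launching OpenHands
curl -s https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "devstral-small-2505",
        "messages": [{"role": "user", "content": "Reply with one word: ready"}]
      }'
```
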
### Local inference

You can also run the model locally. This can be done with LMStudio or one of the other providers listed below:
- [`LMStudio (recommended for quantized model)`](https://lmstudio.ai/): See [here](#lmstudio)
- [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm)
- [`ollama`](https://github.com/ollama/ollama): See [here](#ollama)
- [`mistral-inference`](https://github.com/mistralai/mistral-inference): See [here](#mistral-inference)
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)

**Launch OpenHands**

Once the model is served locally (e.g. from LM Studio), you can interact with it through OpenHands. Start the OpenHands server with Docker:

```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.38
```

The server will start at http://0.0.0.0:3000. Open it in your browser and you will see an AI Provider Configuration tab.
You can then start a new conversation with the agent by clicking the plus sign in the left bar.

### OpenHands (recommended)

#### Launch a server to deploy Devstral-Small-2505

Make sure you have launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then, you can use OpenHands to interact with `Devstral-Small-2505`.

For this tutorial, we spun up a vLLM server with the following command:
```bash
vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```

The server address should be in the following format: `http://<your-server-url>:8000/v1`

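Optionally, you can confirm the server is reachable before pointing OpenHands at it. vLLM's OpenAI-compatible server exposes a model-listing route, so a quick check could look like this (a sketch assuming the default port 8000 and no `--api-key` configured):

```bash
# Should return a JSON payload listing "mistralai/Devstral-Small-2505"
curl -s http://<your-server-url>:8000/v1/models
```
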
#### Launch OpenHands

You can follow the installation instructions for OpenHands [here](https://docs.all-hands.dev/modules/usage/installation).

The easiest way to launch OpenHands is to use the Docker image:
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.38
```

Then, you can access the OpenHands UI at `http://localhost:3000`.

#### Connect to the server

When accessing the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier.

Fill in the following fields:
- **Custom Model**: `openai/mistralai/Devstral-Small-2505`
- **Base URL**: `http://<your-server-url>:8000/v1`
- **API Key**: `token` (or any other token you used to launch the server, if any)

#### Use OpenHands powered by Devstral

Now you're ready to use Devstral Small inside OpenHands by **starting a new conversation**. Let's build a To-Do list app.

<details>
<summary>To-Do list app</summary>

1. Let's ask Devstral to generate the app with the following prompt:

```txt
Build a To-Do list app with the following requirements:
- Built using FastAPI and React.
- Make it a one page app that:
   - Allows to add a task.
   - Allows to delete a task.
   - Allows to mark a task as done.
   - Displays the list of tasks.
- Store the tasks in a SQLite database.
```

![Agent prompting](assets/tuto_open_hands/agent_prompting.png)

2. Let's see the result

You should see the agent construct the app and be able to explore the code it generated.

If it doesn't do it automatically, ask Devstral to deploy the app or do it manually, and then go to the frontend deployment URL to see the app.

![Agent working](assets/tuto_open_hands/agent_working.png)
![App UI](assets/tuto_open_hands/app_ui.png)

3. Iterate

Now that you have a first result, you can iterate on it by asking your agent to improve it. For example, in the generated app we could click on a task to mark it checked, but having a checkbox would improve UX. You could also ask it to add a feature to edit a task, or to filter the tasks by status.

Enjoy building with Devstral Small and OpenHands!

</details>

### LMStudio (recommended for quantized model)
Download the weights from Hugging Face:

```
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
    "mistralai/Devstral-Small-2505_gguf" \
    --include "devstralQ4_K_M.gguf" \
    --local-dir "mistralai/Devstral-Small-2505_gguf/"
```

You can serve the model locally with [LMStudio](https://lmstudio.ai/).
* Download [LM Studio](https://lmstudio.ai/) and install it
* Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`
* In a bash terminal, run `lms import devstralQ4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2505_gguf`)
* Open the LMStudio application, click the terminal icon to get into the developer tab, click "Select a model to load" and select Devstral Q4 K M. Toggle the status button to start the model, and in settings toggle "Serve on Local Network" to on.
* On the right tab, you will see an API identifier, which should be `devstralq4_k_m`, and an API address under API Usage. Keep note of this address; we will use it in the next step.

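Optionally, you can check that LM Studio's OpenAI-compatible server is reachable before wiring it into OpenHands. The address placeholder below is not from the original card; use the API address LM Studio reports under API Usage.

```bash
# The returned model list should include the devstralq4_k_m identifier
curl -s http://<lmstudio-api-address>/v1/models
```
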
**Launch OpenHands**

You can now interact with the model served from LM Studio through OpenHands. Start the OpenHands server with Docker:

```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.38
```

Click “see advanced settings” on the second line.
In the new tab, toggle advanced to on. Set the custom model to `mistral/devstralq4_k_m` and the Base URL to the API address we got from the last step in LM Studio. Set the API Key to `dummy`. Click “Save changes”.

### vLLM (recommended)

We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.

**_Installation_**

Make sure you install [`vLLM >= 0.8.5`](https://github.com/vllm-project/vllm/releases/tag/v0.8.5):

```
pip install vllm --upgrade
```

Doing so should automatically install [`mistral_common >= 1.5.4`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.4).

To check:
```
python -c "import mistral_common; print(mistral_common.__version__)"
```

You can also use a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).

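If you go the Docker route instead of `pip`, a typical invocation of the `vllm/vllm-openai` image is sketched below. The GPU flags, cache mount, and serve options are assumptions based on standard vLLM Docker usage and the serve command in the next step; adapt them to your hardware.

```bash
docker run --gpus all --ipc=host -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:latest \
    --model mistralai/Devstral-Small-2505 \
    --tokenizer_mode mistral --config_format mistral --load_format mistral \
    --tool-call-parser mistral --enable-auto-tool-choice \
    --tensor-parallel-size 2
```
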
#### Server

We recommend that you use Devstral in a server/client setting.

1. Spin up a server:

```
vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```

2. To query the server, you can use a simple Python snippet:

```py
import requests
import json
from huggingface_hub import hf_hub_download


url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Devstral-Small-2505"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Write a function that computes fibonacci in Python.",
            },
        ],
    },
]

data = {"model": model, "messages": messages, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
```

<details>
<summary>Output</summary>

Certainly! The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, usually starting with 0 and 1. Here's a simple Python function to compute the Fibonacci sequence:

### Iterative Approach
This approach uses a loop to compute the Fibonacci number iteratively.

```python
def fibonacci(n):
    if n <= 0:
        return "Input should be a positive integer."
    elif n == 1:
        return 0
    elif n == 2:
        return 1

    a, b = 0, 1
    for _ in range(2, n):
        a, b = b, a + b
    return b

# Example usage:
print(fibonacci(10))  # Output: 34
```

### Recursive Approach
This approach uses recursion to compute the Fibonacci number. Note that this is less efficient for large `n` due to repeated calculations.

```python
def fibonacci_recursive(n):
    if n <= 0:
        return "Input should be a positive integer."
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2)

# Example usage:
print(fibonacci_recursive(10))  # Output: 34
```

### Memoization Approach
This approach uses memoization to store previously computed Fibonacci numbers, making it more efficient than the simple recursive approach.

```python
def fibonacci_memo(n, memo={}):
    if n <= 0:
        return "Input should be a positive integer."
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    elif n in memo:
        return memo[n]

    memo[n] = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
    return memo[n]

# Example usage:
print(fibonacci_memo(10))  # Output: 34
```

### Dynamic Programming Approach
This approach uses an array to store the Fibonacci numbers up to `n`.

```python
def fibonacci_dp(n):
    if n <= 0:
        return "Input should be a positive integer."
    elif n == 1:
        return 0
    elif n == 2:
        return 1

    fib = [0, 1] + [0] * (n - 2)
    for i in range(2, n):
        fib[i] = fib[i - 1] + fib[i - 2]
    return fib[n - 1]

# Example usage:
print(fibonacci_dp(10))  # Output: 34
```

You can choose any of these approaches based on your needs. The iterative and dynamic programming approaches are generally more efficient for larger values of `n`.

</details>

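As an alternative to raw `requests`, the same server can be queried with the OpenAI Python client, since vLLM speaks the OpenAI-compatible API. This is a sketch under the same assumptions as the snippet above (server URL placeholder, `SYSTEM_PROMPT.txt` fetched from the model repo), not part of the original instructions:

```py
from huggingface_hub import hf_hub_download
from openai import OpenAI  # pip install openai

model = "mistralai/Devstral-Small-2505"
client = OpenAI(base_url="http://<your-server-url>:8000/v1", api_key="token")

# Reuse the system prompt shipped with the model repository
with open(hf_hub_download(repo_id=model, filename="SYSTEM_PROMPT.txt"), "r") as f:
    system_prompt = f.read()

response = client.chat.completions.create(
    model=model,
    temperature=0.15,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Write a function that computes fibonacci in Python."},
    ],
)
print(response.choices[0].message.content)
```
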
### Mistral-inference

We recommend using mistral-inference to quickly try out / "vibe-check" Devstral.

#### Install

Make sure to have `mistral_inference >= 1.6.0` installed.

```bash
pip install mistral_inference --upgrade
```

#### Download

```python
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Devstral-Small-2505", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
```

#### Python

You can run the model using the following command:

```bash
mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300
```

If you prompt it with "Write me a unique and efficient function that computes fibonacci in Python", the model should generate something along the following lines:

<details>
<summary>Output</summary>

Certainly! A common and efficient way to compute Fibonacci numbers is by using memoization to store previously computed values. This avoids redundant calculations and significantly improves performance. Below is a Python function that uses memoization to compute Fibonacci numbers efficiently:

```python
def fibonacci(n, memo=None):
    if memo is None:
        memo = {}

    if n in memo:
        return memo[n]

    if n <= 1:
        return n

    memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
    return memo[n]

# Example usage:
n = 10
print(f"Fibonacci number at position {n} is {fibonacci(n)}")
```

### Explanation:

1. **Base Case**: If `n` is 0 or 1, the function returns `n` because the Fibonacci sequence starts with 0 and 1.
2. **Memoization**: The function uses a dictionary `memo` to store the results of previously computed Fibonacci numbers.
3. **Recursive Case**: For other values of `n`, the function recursively computes the Fibonacci number by summing the results of `fibonacci(n - 1, memo)` and `fibonacci(n - 2, memo)`.

</details>

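If you prefer driving `mistral-inference` from Python instead of the `mistral-chat` CLI, the sketch below follows the pattern used in other Mistral model cards. Treat the imports and the `generate` signature as assumptions that may vary across `mistral_inference` versions; the paths reuse the download directory from the snapshot step above.

```python
from pathlib import Path

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Same directory as in the "Download" step above
mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')

tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)

request = ChatCompletionRequest(
    messages=[UserMessage(content="Write me a function that computes fibonacci in Python.")]
)
tokens = tokenizer.encode_chat_completion(request).tokens

# generate is assumed to return (generated_tokens, logprobs)
out_tokens, _ = generate(
    [tokens], model, max_tokens=300, temperature=0.15,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.decode(out_tokens[0]))
```
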
### Ollama

You can run Devstral using the [Ollama](https://ollama.ai/) CLI.

```bash
ollama run devstral
```

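Ollama also exposes an OpenAI-compatible endpoint, by default at `http://localhost:11434/v1`, so once the `devstral` model has been pulled you can point OpenHands or any OpenAI-style client at it. A minimal check, assuming the default port and model tag:

```bash
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "devstral", "messages": [{"role": "user", "content": "Say hello."}]}'
```
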
### Transformers

To make the best use of our model with transformers, make sure to have [`mistral-common >= 1.5.5`](https://github.com/mistralai/mistral-common) installed to use our tokenizer.

```bash
pip install mistral-common --upgrade
```

Then load our tokenizer along with the model and generate:

```python
import torch

from mistral_common.protocol.instruct.messages import (
    SystemMessage, UserMessage
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

model_id = "mistralai/Devstral-Small-2505"
tekken_file = hf_hub_download(repo_id=model_id, filename="tekken.json")
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

tokenizer = MistralTokenizer.from_file(tekken_file)

model = AutoModelForCausalLM.from_pretrained(model_id)

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            SystemMessage(content=SYSTEM_PROMPT),
            UserMessage(content="Write me a function that computes fibonacci in Python."),
        ],
    )
)

output = model.generate(
    input_ids=torch.tensor([tokenized.tokens]),
    max_new_tokens=1000,
)[0]

decoded_output = tokenizer.decode(output[len(tokenized.tokens):])
print(decoded_output)
```
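
Note that `from_pretrained` with the defaults above loads full-precision weights on CPU. For a 24B-parameter model you will usually want reduced precision and automatic device placement; the variant below uses standard `transformers` options and is a suggestion rather than part of the original card (it requires `accelerate` for `device_map="auto"`).

```python
import torch
from transformers import AutoModelForCausalLM

# bfloat16 weights plus automatic device placement keep memory usage manageable
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Devstral-Small-2505",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```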