Add LM Studio Setup Guide - Fix EOS token issue with proper chat formatting
LM_Studio_Setup_Guide.md · ADDED · +127 −0
# 🛠️ LM Studio Setup Guide for Sarvam-M 4-bit MLX

## 🔍 Problem Diagnosis

The "EOS token issue", where the model stops after only a few words, is caused by **incorrect prompt formatting**, not by the EOS token itself.

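A quick way to confirm the expected format is to render the model's own chat template and compare it with what LM Studio is sending. A minimal sketch using `mlx_lm` (this assumes the repo ships a tokenizer config with a chat template):

```python
from mlx_lm import load

# Load the model and tokenizer (downloads from the Hub on first use).
model, tokenizer = load("Jimmi42/sarvam-m-4bit-mlx")

# Render a one-turn conversation without tokenizing, so we can see the
# exact string the model expects.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# Expected shape: <s>[SYSTEM_PROMPT]...[/SYSTEM_PROMPT][INST]...[/INST]
```
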
## ✅ Solution: Proper Chat Format

### **Option 1: Use Chat Mode in LM Studio**
1. **Load the model** in LM Studio.
2. **Switch to Chat mode** (not Playground mode).
3. **Set the chat template** to "Custom" or "Mistral".
4. **Configure these settings:**
```
System Prompt: "You are a helpful assistant."

Chat Template Format:
<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]
```

### **Option 2: Manual Prompt Format**
If you are using Playground mode, format your prompts like this:

**Simple Format:**
```
[INST] Your question here [/INST]
```

**With System Prompt:**
```
<s>[SYSTEM_PROMPT]You are a helpful assistant.[/SYSTEM_PROMPT][INST]Your question here[/INST]
```

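To avoid typing this by hand in scripts, a throwaway helper can build the same strings (the `format_prompt` name is hypothetical, not part of any library):

```python
def format_prompt(user: str, system: str | None = None) -> str:
    """Wrap a user message in the manual Mistral-style format shown above."""
    if system is None:
        return f"[INST] {user} [/INST]"
    return f"<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]"

print(format_prompt("What is AI?"))
print(format_prompt("Why is the sky blue?", system="You are a helpful assistant."))
```
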
### **Option 3: Thinking Mode (Advanced)**
For reasoning tasks, use:
```
<s>[SYSTEM_PROMPT]You are a helpful assistant. Think deeply before answering the user's question. Do the thinking inside <think>...</think> tags.[/SYSTEM_PROMPT][INST]Your question here[/INST]<think>
```

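Because the prompt already opens the `<think>` block, the completion begins with the reasoning and the visible answer follows the closing tag. A small post-processing sketch, assuming the tags appear literally in the output:

```python
def split_thinking(completion: str) -> tuple[str, str]:
    """Split a thinking-mode completion into (reasoning, answer).

    The prompt ends with an opening <think>, so the completion should
    look like: "...reasoning...</think> final answer".
    """
    reasoning, sep, answer = completion.partition("</think>")
    if not sep:  # the model never closed the tag; treat it all as the answer
        return "", completion.strip()
    return reasoning.strip(), answer.strip()

reasoning, answer = split_thinking("Rayleigh scattering...</think> The sky appears blue because...")
print(answer)  # -> "The sky appears blue because..."
```
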
## 🎛️ Recommended LM Studio Settings

### **Generation Parameters:**
- **Max Tokens:** 512-1024
- **Temperature:** 0.7-0.8
- **Top P:** 0.9
- **Repetition Penalty:** 1.1
- **Context Length:** 4096

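The same settings can be reproduced when scripting with `mlx_lm`. A sketch using its sampler helpers (these exist in recent `mlx_lm` releases, but exact signatures vary by version, so treat this as an assumption to check against your install):

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("Jimmi42/sarvam-m-4bit-mlx")

# Mirror the recommended LM Studio settings above.
sampler = make_sampler(temp=0.7, top_p=0.9)
logits_processors = make_logits_processors(repetition_penalty=1.1)

response = generate(
    model,
    tokenizer,
    prompt="[INST] Hello, how are you? [/INST]",
    max_tokens=512,
    sampler=sampler,
    logits_processors=logits_processors,
)
print(response)
```
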
### **Stop Sequences:**
Add these stop sequences:
- `[/INST]`
- `\n\nUser:`
- `\n\nHuman:`

Do not add `</s>`: it is the model's EOS token and is handled automatically, and listing it as a text stop sequence can cause the premature stops described in the troubleshooting table below.

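LM Studio applies stop sequences for you; in scripts you have to apply them yourself. A minimal post-processing sketch (the `truncate_at_stop` helper is hypothetical):

```python
STOP_SEQUENCES = ["[/INST]", "\n\nUser:", "\n\nHuman:"]

def truncate_at_stop(text: str, stops: list[str] = STOP_SEQUENCES) -> str:
    """Cut generated text at the first occurrence of any stop sequence."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

print(truncate_at_stop("New Delhi is the capital.\n\nUser: next question"))
# -> "New Delhi is the capital."
```
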
### **MLX Settings:**
- ✅ Enable MLX acceleration
- ✅ Use GPU memory
- Set batch size to 1-4

## 🧪 Test Examples

### **Test 1: Basic Math**
```
Prompt: [INST] What is 2+2? Please explain your answer. [/INST]
Expected: The sum of 2 and 2 is **4**. [explanation follows]
```

### **Test 2: Reasoning**
```
Prompt: <s>[SYSTEM_PROMPT]Think before answering.[/SYSTEM_PROMPT][INST]Why is the sky blue?[/INST]<think>
Expected: <think>[reasoning]</think> The sky appears blue because...
```

### **Test 3: Hindi Language**
```
Prompt: [INST] भारत की राजधानी क्या है? [/INST]
Expected: भारत की राजधानी **नई दिल्ली** है...
```
(The prompt asks "What is the capital of India?"; the expected answer is "The capital of India is **New Delhi**...".)

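The three tests can be run as a quick smoke test from Python (a sketch with `mlx_lm`, reusing the prompts above):

```python
from mlx_lm import load, generate

model, tokenizer = load("Jimmi42/sarvam-m-4bit-mlx")

tests = [
    "[INST] What is 2+2? Please explain your answer. [/INST]",
    "<s>[SYSTEM_PROMPT]Think before answering.[/SYSTEM_PROMPT][INST]Why is the sky blue?[/INST]<think>",
    "[INST] भारत की राजधानी क्या है? [/INST]",
]
for prompt in tests:
    print("---", prompt[:50])
    print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```
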
## 🚨 Common Issues & Fixes

| Issue | Cause | Solution |
|-------|-------|----------|
| Empty responses | No chat template | Use the `[INST]...[/INST]` format |
| Stops after a few words | Incorrect prompt format, or `</s>` in the stop list | Fix the prompt format; remove `</s>` from stop sequences |
| Repeating text | Low repetition penalty | Increase to 1.1-1.2 |
| Slow responses | CPU inference | Enable MLX acceleration |

## 📝 Model Information
- **Format:** MLX 4-bit quantized
- **Languages:** English + 10 Indic languages
- **Context:** 4096 tokens
- **Based on:** Mistral Small architecture
- **Special Features:** Thinking mode, multi-language support

## 🔗 Working Example Commands

### **MLX-LM (Command Line):**
```bash
# Basic chat
python -m mlx_lm.generate --model Jimmi42/sarvam-m-4bit-mlx --prompt "[INST] Hello, how are you? [/INST]" --max-tokens 100

# With thinking
python -m mlx_lm.generate --model Jimmi42/sarvam-m-4bit-mlx --prompt "<s>[SYSTEM_PROMPT]Think deeply.[/SYSTEM_PROMPT][INST]Explain quantum physics[/INST]<think>" --max-tokens 200
```

### **Python Code:**
```python
from mlx_lm import load, generate

model, tokenizer = load("Jimmi42/sarvam-m-4bit-mlx")

# Let the tokenizer apply the model's own chat template instead of
# hand-building the [INST] string.
messages = [{"role": "user", "content": "What is AI?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt, max_tokens=100)
print(response)
```

## ✅ Success Checklist
- [ ] Model loads in LM Studio with MLX enabled
- [ ] Chat template is set to the Custom/Mistral format
- [ ] Test prompt `[INST] Hello [/INST]` generates a full response
- [ ] Stop sequences configured correctly
- [ ] Generation parameters optimized
- [ ] Multi-language capability tested