Jimmi42 committed · Commit f6c7c50 · verified · 1 Parent(s): 0fd7176

Add LM Studio Setup Guide - Fix EOS token issue with proper chat formatting

Files changed (1):
  1. LM_Studio_Setup_Guide.md +127 -0
LM_Studio_Setup_Guide.md ADDED
# 🛠️ LM Studio Setup Guide for Sarvam-M 4-bit MLX

## 🔍 Problem Diagnosis
The "EOS token issue", where the model stops after only a few words, is caused by **incorrect prompt formatting**, not by the EOS token itself.

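A quick way to see the format the model expects is to let the bundled chat template build the prompt for you (the same `apply_chat_template` call used in the Python example near the end of this guide). A minimal sketch:

```python
# Sketch: print the prompt string the chat template produces for one user turn.
from mlx_lm import load

model, tokenizer = load("Jimmi42/sarvam-m-4bit-mlx")
messages = [{"role": "user", "content": "Hello"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# If the prompt you actually send lacks the [INST] ... [/INST] markers shown here,
# the model tends to emit EOS almost immediately.
```
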
## ✅ Solution: Proper Chat Format

### **Option 1: Use Chat Mode in LM Studio**
1. **Load the model** in LM Studio
2. **Switch to Chat mode** (not Playground mode)
3. **Set Chat Template** to "Custom" or "Mistral"
4. **Configure these settings:**
```
System Prompt: "You are a helpful assistant."

Chat Template Format:
<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]
```

### **Option 2: Manual Prompt Format**
If you are using Playground mode, format your prompts like this:

**Simple Format:**
```
[INST] Your question here [/INST]
```

**With System Prompt:**
```
<s>[SYSTEM_PROMPT]You are a helpful assistant.[/SYSTEM_PROMPT][INST]Your question here[/INST]
```

### **Option 3: Thinking Mode (Advanced)**
For reasoning tasks, use:
```
<s>[SYSTEM_PROMPT]You are a helpful assistant. Think deeply before answering the user's question. Do the thinking inside <think>...</think> tags.[/SYSTEM_PROMPT][INST]Your question here[/INST]<think>
```

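In thinking mode the visible answer is preceded by the model's `<think>...</think>` reasoning, so you may want to hide that block before displaying the reply. A minimal sketch (the `strip_thinking` helper is hypothetical, not part of the model repo):

```python
import re

def strip_thinking(text: str) -> str:
    """Drop a <think>...</think> block (or a dangling closing tag) from model output."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # When the prompt itself ends with <think>, the output may start mid-thought
    # and contain only the closing tag; keep just what follows it.
    return text.split("</think>")[-1].strip()

print(strip_thinking("light scattering favours shorter wavelengths</think> The sky appears blue because..."))
# -> "The sky appears blue because..."
```
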
## 🎛️ Recommended LM Studio Settings

### **Generation Parameters:**
- **Max Tokens:** 512-1024
- **Temperature:** 0.7-0.8
- **Top P:** 0.9
- **Repetition Penalty:** 1.1
- **Context Length:** 4096

### **Stop Sequences:**
Add these stop sequences:
- `</s>`
- `[/INST]`
- `\n\nUser:`
- `\n\nHuman:`

### **MLX Settings:**
- ✅ Enable MLX acceleration
- ✅ Use GPU memory
- Set batch size to 1-4

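If you run LM Studio's OpenAI-compatible local server instead of the chat UI, the same parameters and stop sequences can be passed per request. A minimal sketch, assuming the server is enabled on the default `http://localhost:1234` and that the model identifier below matches the name shown in LM Studio's server tab (repetition penalty is left to the UI setting here):

```python
# Sketch: one chat request to LM Studio's local server using the recommended
# sampling settings and stop sequences.
import requests

payload = {
    "model": "sarvam-m-4bit-mlx",  # assumed identifier; copy the exact name from LM Studio
    "messages": [{"role": "user", "content": "What is 2+2? Please explain your answer."}],
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.9,
    "stop": ["</s>", "[/INST]", "\n\nUser:", "\n\nHuman:"],
}
resp = requests.post("http://localhost:1234/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```
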
## 🧪 Test Examples

### **Test 1: Basic Math**
```
Prompt: [INST] What is 2+2? Please explain your answer. [/INST]
Expected: The sum of 2 and 2 is **4**. [explanation follows]
```

### **Test 2: Reasoning**
```
Prompt: <s>[SYSTEM_PROMPT]Think before answering.[/SYSTEM_PROMPT][INST]Why is the sky blue?[/INST]<think>
Expected: <think>[reasoning]</think> The sky appears blue because...
```

### **Test 3: Hindi Language** ("What is the capital of India?")
```
Prompt: [INST] भारत की राजधानी क्या है? [/INST]
Expected: भारत की राजधानी **नई दिल्ली** है...
```

## 🚨 Common Issues & Fixes

| Issue | Cause | Solution |
|-------|-------|----------|
| Empty responses | No chat template | Use the `[INST]...[/INST]` format |
| Stops after a few words | Missing chat markers or an over-eager stop sequence | Use the `[INST]...[/INST]` format; if output is still truncated, remove `</s>` from the stop sequences |
| Repeating text | Repetition penalty too low | Increase it to 1.1-1.2 |
| Slow responses | CPU inference | Enable MLX acceleration |

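To check the formatting-related rows quickly, compare a bare prompt with a correctly formatted one; a minimal sketch using the same `mlx_lm` calls as the examples below:

```python
# Sketch: compare a bare prompt with a correctly formatted one.
from mlx_lm import load, generate

model, tokenizer = load("Jimmi42/sarvam-m-4bit-mlx")

bare = generate(model, tokenizer, "Hello, how are you?", max_tokens=50)
formatted = generate(model, tokenizer, "[INST] Hello, how are you? [/INST]", max_tokens=50)

print("bare:", repr(bare))            # typically empty or cut off after a few tokens
print("formatted:", repr(formatted))  # full reply
```
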
## 📝 Model Information
- **Format:** MLX 4-bit quantized
- **Languages:** English + 10 Indic languages
- **Context:** 4096 tokens
- **Based on:** Mistral Small architecture
- **Special Features:** Thinking mode, multi-language support

## 🔗 Working Example Commands

### **MLX-LM (Command Line):**
```bash
# Basic chat
python -m mlx_lm.generate --model Jimmi42/sarvam-m-4bit-mlx --prompt "[INST] Hello, how are you? [/INST]" --max-tokens 100

# With thinking
python -m mlx_lm.generate --model Jimmi42/sarvam-m-4bit-mlx --prompt "<s>[SYSTEM_PROMPT]Think deeply.[/SYSTEM_PROMPT][INST]Explain quantum physics[/INST]<think>" --max-tokens 200
```

### **Python Code:**
```python
from mlx_lm import load, generate

model, tokenizer = load('Jimmi42/sarvam-m-4bit-mlx')

# Format prompt correctly
messages = [{"role": "user", "content": "What is AI?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt, max_tokens=100)
print(response)
```
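
For thinking mode from Python, the same flow works with a system message plus a trailing `<think>`. A minimal sketch, assuming the chat template accepts a system role (as the `[SYSTEM_PROMPT]` format above suggests):

```python
# Sketch: thinking-mode variant. Appending "<think>" mirrors the manual format
# shown in Option 3; system-role support is an assumption.
from mlx_lm import load, generate

model, tokenizer = load('Jimmi42/sarvam-m-4bit-mlx')

messages = [
    {"role": "system", "content": "You are a helpful assistant. Think deeply before answering the user's question. Do the thinking inside <think>...</think> tags."},
    {"role": "user", "content": "Why is the sky blue?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) + "<think>"

response = generate(model, tokenizer, prompt, max_tokens=200)
print(response)
```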

## ✅ Success Checklist
- [ ] Model loads in LM Studio with MLX enabled
- [ ] Chat template is set to Custom/Mistral format
- [ ] Test prompt: `[INST] Hello [/INST]` generates a response
- [ ] Stop sequences configured correctly
- [ ] Generation parameters optimized
- [ ] Multi-language capability tested