# 🛠️ LM Studio Setup Guide for Sarvam-M 4-bit MLX

## 🔍 Problem Diagnosis

The "EOS token issue", where the model stops after a few words, is caused by **incorrect prompt formatting**, not by the EOS token itself.

## ✅ Solution: Proper Chat Format

### **Option 1: Use Chat Mode in LM Studio**

1. **Load the model** in LM Studio
2. **Switch to Chat mode** (not Playground mode)
3. **Set Chat Template** to "Custom" or "Mistral"
4. **Configure these settings:**

```
System Prompt: "You are a helpful assistant."
Chat Template Format: [SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]
```

### **Option 2: Manual Prompt Format**

If you are using Playground mode, format your prompts like this:

**Simple Format:**
```
[INST] Your question here [/INST]
```

**With System Prompt:**
```
[SYSTEM_PROMPT]You are a helpful assistant.[/SYSTEM_PROMPT][INST]Your question here[/INST]
```

### **Option 3: Thinking Mode (Advanced)**

For reasoning tasks, use:
```
[SYSTEM_PROMPT]You are a helpful assistant. Think deeply before answering the user's question. Do the thinking inside <think>...</think> tags.[/SYSTEM_PROMPT][INST]Your question here[/INST]
```

## 🎛️ Recommended LM Studio Settings

### **Generation Parameters:**
- **Max Tokens:** 512-1024
- **Temperature:** 0.7-0.8
- **Top P:** 0.9
- **Repetition Penalty:** 1.1
- **Context Length:** 4096

### **Stop Sequences:**
Add these stop sequences:
- `</s>`
- `[/INST]`
- `\n\nUser:`
- `\n\nHuman:`

### **MLX Settings:**
- ✅ Enable MLX acceleration
- ✅ Use GPU memory
- Set batch size to 1-4
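These settings can also be exercised from a script. LM Studio includes an OpenAI-compatible local server (default address `http://localhost:1234/v1`), so the generation parameters above can be passed in a normal chat-completion request. The sketch below is a minimal example, not the official workflow: it assumes the server is running and that the model is listed under the identifier `sarvam-m-4bit-mlx` (copy the exact identifier from LM Studio's model list). Repetition penalty is not part of the OpenAI-style request, so set that one in the LM Studio UI.

```python
# Minimal sketch: query LM Studio's OpenAI-compatible local server
# with the recommended settings above.
# Assumptions: the server is running on the default port, and the model
# identifier "sarvam-m-4bit-mlx" matches what LM Studio shows for this model.
from openai import OpenAI

# LM Studio ignores the API key; any placeholder string works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="sarvam-m-4bit-mlx",  # assumed identifier; copy yours from LM Studio
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2? Please explain your answer."},
    ],
    max_tokens=512,
    temperature=0.7,
    top_p=0.9,
    # EOS is handled by the server; only add conversational stop strings here.
    stop=["[/INST]", "\n\nUser:", "\n\nHuman:"],
)

print(response.choices[0].message.content)
```

If this request returns a full multi-sentence answer but the LM Studio chat window still truncates output, the problem is almost certainly the chat template or stop-sequence configuration in the UI rather than the model itself.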
## 🧪 Test Examples

### **Test 1: Basic Math**
```
Prompt: [INST] What is 2+2? Please explain your answer. [/INST]
Expected: The sum of 2 and 2 is **4**. [explanation follows]
```

### **Test 2: Reasoning**
```
Prompt: [SYSTEM_PROMPT]Think before answering.[/SYSTEM_PROMPT][INST]Why is the sky blue?[/INST]
Expected: [reasoning] The sky appears blue because...
```

### **Test 3: Hindi Language**
```
Prompt: [INST] भारत की राजधानी क्या है? [/INST]
Expected: भारत की राजधानी **नई दिल्ली** है...
```

## 🚨 Common Issues & Fixes

| Issue | Cause | Solution |
|-------|-------|----------|
| Empty responses | No chat template | Use the `[INST]...[/INST]` format |
| Stops after a few words | Wrong stop tokens | Remove any stop sequence that matches normal output |
| Repeating text | Low repetition penalty | Increase it to 1.1-1.2 |
| Slow responses | CPU inference | Enable MLX acceleration |

## 📝 Model Information

- **Format:** MLX 4-bit quantized
- **Languages:** English + 10 Indic languages
- **Context:** 4096 tokens
- **Based on:** Mistral Small architecture
- **Special Features:** Thinking mode, multi-language support

## 🔗 Working Example Commands

### **MLX-LM (Command Line):**
```bash
# Basic chat
python -m mlx_lm.generate --model Jimmi42/sarvam-m-4bit-mlx --prompt "[INST] Hello, how are you? [/INST]" --max-tokens 100

# With thinking
python -m mlx_lm.generate --model Jimmi42/sarvam-m-4bit-mlx --prompt "[SYSTEM_PROMPT]Think deeply.[/SYSTEM_PROMPT][INST]Explain quantum physics[/INST]" --max-tokens 200
```

### **Python Code:**
```python
from mlx_lm import load, generate

model, tokenizer = load('Jimmi42/sarvam-m-4bit-mlx')

# Format the prompt with the model's own chat template
messages = [{"role": "user", "content": "What is AI?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt, max_tokens=100)
print(response)
```

## ✅ Success Checklist

- [ ] Model loads in LM Studio with MLX enabled
- [ ] Chat template is set to the Custom/Mistral format
- [ ] The test prompt `[INST] Hello [/INST]` generates a full response
- [ ] Stop sequences are configured correctly
- [ ] Generation parameters are optimized
- [ ] Multi-language capability is tested
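## 🧠 Bonus: Scripting Thinking Mode (Sketch)

The snippet below is a minimal sketch of the thinking-mode flow from Option 3, driven from Python with `mlx_lm` instead of LM Studio. It builds the `[SYSTEM_PROMPT]...[INST]...` string by hand and assumes the model wraps its reasoning in `<think>...</think>` tags before the final answer; if your build emits a different tag, adjust the split accordingly.

```python
# Minimal sketch: thinking-mode generation with mlx_lm.
# Assumption: the reasoning is wrapped in <think>...</think> tags before the
# final answer; verify against your model's actual output.
from mlx_lm import load, generate

model, tokenizer = load('Jimmi42/sarvam-m-4bit-mlx')

system = (
    "You are a helpful assistant. Think deeply before answering the user's "
    "question. Do the thinking inside <think>...</think> tags."
)
question = "Why is the sky blue?"

# Manual prompt in the format shown in Option 3.
prompt = f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{question}[/INST]"

output = generate(model, tokenizer, prompt, max_tokens=300)

# Separate the reasoning from the final answer (assumes a closing tag appears).
if "</think>" in output:
    thinking, answer = output.split("</think>", 1)
    print("--- reasoning ---")
    print(thinking.replace("<think>", "").strip())
    print("--- answer ---")
    print(answer.strip())
else:
    print(output)
```

If no closing tag appears, the script falls back to printing the raw generation, which is also a quick way to check whether thinking mode is actually active for a given prompt.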