Discussion Regarding the model (Important)

#10
by UJJAWAL-TYAGI - opened

You guys claim that your base model is Mistral 3.1 24B, but Sarvam-M has 23.6B parameters. Where does the difference come from? Did you prune parameters, or did the reduction come from fine-tuning? Did you fine-tune with LoRA? Did you remove adapters or attention heads? Or are you just using system prompts + RAG for language adaptation?
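For anyone who wants to check where the gap sits, here is a minimal sketch that sums parameter counts per top-level module from a mapping of tensor names to shapes. The helper name `count_params_by_prefix` and the toy tensor names and shapes are made up for illustration; they are not the real Sarvam-M or Mistral tensors.

```python
from collections import defaultdict
from math import prod


def count_params_by_prefix(shapes, depth=1):
    """Sum parameter counts grouped by the first `depth` dotted name components."""
    totals = defaultdict(int)
    for name, shape in shapes.items():
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += prod(shape)  # number of elements in this tensor
    return dict(totals)


# Toy shapes, purely illustrative (NOT taken from any real checkpoint).
toy_shapes = {
    "language_model.embed.weight": (100, 8),
    "language_model.layers.0.attn.weight": (8, 8),
    "vision_tower.patch_embed.weight": (16, 8),
}

by_module = count_params_by_prefix(toy_shapes)
total = sum(by_module.values())
print(by_module, total)
```

With a loaded PyTorch model, the shapes could be collected as `{n: tuple(p.shape) for n, p in model.named_parameters()}`. Running the same grouping on both checkpoints and comparing the per-module totals should show which component, if any, accounts for the difference.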
