---
tags:
- text-generation
- transformer
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# Saanvi-C0-12B 🤖⚡

![License](https://img.shields.io/badge/License-Apache%202.0-blue)
![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-green)
![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Hub-yellow)

**A next-generation 12B LLM optimized for speed, efficiency, and contextual accuracy.**

_Powered by RAG-based enhancements • 4-bit quantization • Flash Attention 2 • bfloat16 • 128k context window_

---

## 🚀 Why Upgrade to Saanvi-C0-12B?

Saanvi-C0-12B delivers a **major leap in capability** over smaller models while staying efficient, with significantly improved reasoning, fluency, task completion, and mathematics.

| Feature               | Benefit                               |
| --------------------- | ------------------------------------- |
| ⚡ Flash Attention 2   | Up to **2.7× faster** inference       |
| 🧠 4-bit Quantization  | **Runs on 8GB VRAM** GPUs             |
| 🎯 Instruction-Tuned   | **Better task performance**           |
| 🔄 RAG-Enhanced        | **More precise contextual retrieval** |
| ➗ Math-Expert          | **Precise mathematical reasoning**    |

### 🖥️ Optimized for Mid-Tier GPUs

- **Runs on mid-range GPUs with 8GB+ VRAM** (RTX 3050, RTX 2060, etc.).
- **More robust than our 3B model**, with better contextual retention and instruction-following.
- **4-bit quantization** minimizes VRAM usage without sacrificing quality (see the loading sketch at the end of this card).

---

## ⚡ Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)

# Simple interactive chat loop; type "exit" to quit.
while True:
    user_input = input("\n👤 You: ").strip()
    if user_input.lower() == "exit":
        break
    inputs = tokenizer(user_input, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_length=2048, do_sample=True)
    # Note: the decoded output includes the prompt text.
    print("🤖 AI:", tokenizer.decode(output[0], skip_special_tokens=True))
```

---

## 📦 Installation

```bash
pip install torch transformers accelerate
```

---

## 📊 Benchmarks

**A100-40GB Performance**

| Batch Size | Throughput  | Latency | VRAM Usage |
| ---------- | ----------- | ------- | ---------- |
| 1          | 42 tok/sec  | 85ms    | 8.2GB      |
| 8          | 218 tok/sec | 430ms   | 12.5GB     |

**🚀 On Mid-Tier GPUs (RTX 3050, RTX 2060, RTX 3060 12GB)**

- **VRAM usage**: ~8.2GB (single batch)
- **Speed**: ~10-15 tok/sec
- **Best practice**: stick to **smaller batch sizes** for best performance.

---

## 📜 License

Licensed under the [Apache 2.0 License](LICENSE). See the [LICENSE](LICENSE) file for details.

💡 **Pro Tip**: For **maximum efficiency**, use `torch.compile()` and CUDA graphs on high-end GPUs!

---
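As a rough illustration of that tip, the snippet below wraps the model's forward pass with `torch.compile()`. This is a minimal sketch, not an official recipe: it assumes a PyTorch 2.x build with CUDA, the bfloat16 load from the Quick Start, and enough VRAM for the uncompressed weights; `mode="reduce-overhead"` enables CUDA graphs where supported.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "riple-saanvi-lab/Saanvi-C0-12B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Compile the forward pass; "reduce-overhead" uses CUDA graphs where supported.
# The first few generations are slower while kernels are compiled and cached.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

# Example prompt (arbitrary, for illustration only).
inputs = tokenizer("Summarize retrieval-augmented generation.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```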
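For the 8GB-VRAM setups highlighted in the feature table, the Quick Start's bfloat16 load can be swapped for 4-bit quantization, optionally combined with Flash Attention 2. The sketch below uses `BitsAndBytesConfig` from `transformers` and assumes the `bitsandbytes` and `flash-attn` packages are installed; the specific settings (NF4, bfloat16 compute dtype) are common defaults and an assumption, not a published recipe for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

# Assumed 4-bit settings (NF4 weights, bfloat16 compute); tune for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires flash-attn; omit this argument if unavailable
    device_map="auto",
)

# Quick sanity check (arbitrary prompt, for illustration only).
inputs = tokenizer("What is 12 * 37?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```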