---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
  - bilingual
  - lora
  - rl
  - cost-efficient
  - tiny-models
language:
  - en
  - es
---

# 🪐 Circe-1.5B

*Circe-1.5B schematic*

**Circe-1.5B** is a single-checkpoint, 1.5B-parameter language model that asks a simple question:

> _“How far can you push tiny models on a tiny budget?”_

| ⚙️ Spec | Value |
|---------|-------|
| Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
| Trainable params | 4 M (LoRA) |
| Post-training cost | **≈ US $12** on 1×L40S |
| Training recipe | 8 h SFT → 4 h GRPO |
| Context length | up to **4 k tokens** (tested) |
| VRAM @ bf16 | ~9 GB (≤ 3 GB with 4-bit GPTQ) |
| Throughput | ~55 tok/s on 1×A6000 (fp16, no compile) |

It keeps DeepSeek-R1’s strong reasoning depth but adds **fluent bilingual chat** (English & Spanish) in a checkpoint that fits on a laptop GPU. We intend to use it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.

---

# 🔭 Intended Use

* **Base for new LoRAs** — domain adaptation, longer-context studies (a minimal adapter sketch is included near the end of this card).
* **Research** into cost-efficient RL for reasoning.
* **Not** for high-stakes or production tasks. See the [⚙️ Limitations](#️-limitations--bias) section before use.

---

# ⚡ Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("PaletLabs/Circe-1.5B", torch_dtype="bfloat16")
tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")

prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

# 🛠️ Installation

```bash
git clone https://github.com/palet-global/circe
cd circe
python -m venv venv && source venv/bin/activate
pip install .
```

## 🏗️ Re-Training Pipeline

### Data

```bash
python data/fetch_datasets.py --out data/processed
```

### Supervised LoRA

```bash
accelerate config default   # one-time
accelerate launch train/sft.py \
  --data_dir data/processed \
  --output_dir checkpoints/sft
```

### RL (GRPO)

```bash
accelerate launch train/rl_grpo.py \
  --data_dir data/processed \
  --output_dir checkpoints/grpo \
  --init_ckpt checkpoints/sft/checkpoint-13000 \
  --num_steps 3000 --save_steps 500 --group 4
```

### Merge and Tokenizer

```bash
python train/merge_lora.py \
  --ckpt_dir checkpoints/grpo \
  --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
```

### SQuAD Sanity Checks

```bash
python eval/quick_squad_eval.py --model ./merged --dataset squad
python eval/quick_squad_eval.py --model ./merged --dataset squad_es
```

### Upload

```bash
python train/upload_to_hub.py \
  --model_dir merged \
  --repo PaletLabs/Circe-1.5B \
  --token $HF_TOKEN
```

---

# 💻 Hardware & Inference Tips

- **bf16 / fp16**: needs ~9 GB VRAM.
- **4-bit GPTQ**: < 3 GB; `bitsandbytes` 4-bit quantization also works out of the box (see the loading sketch near the end of this card).
- Compile once (`torch.compile`) for **+10–15 %** throughput.

---

# ✍️ Current Evaluation Status

Formal **lighteval / MMLU / GSM8K** runs are queued. Preliminary spot-checks show Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.

---

# ⚙️ Limitations & Bias

- No reward-model alignment.
- Long-context (> 4 k tokens) stability is untested.
- Training data bias from public QA pairs; Spanish coverage favors Latin American variants.
- Minimal safety filtering: wrap the model with your own guardrails before any production use.

---

# 🔮 Roadmap

- Publish the full reasoning benchmark suite & eval scripts.
- Release code-reasoning and doc-QA adapters.
- Attach a **24 kHz neural codec** → real-time, full-duplex voice chat without ASR → TTS hops.
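---

# 🧩 Example: Training a New LoRA on Top of Circe

A minimal sketch of the “base for new LoRAs” use case above, assuming the `peft` library. The rank, alpha, dropout, and target-module names below are illustrative assumptions for domain adaptation, not the settings used to train Circe itself.

```python
# Minimal LoRA-adapter sketch (hypothetical hyper-parameters, not Circe's own recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "PaletLabs/Circe-1.5B"
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Illustrative LoRA settings; tune rank/alpha/targets for your domain.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights will train

# From here, hand `model` to your usual SFT / Trainer loop.
```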
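---

# 💾 Example: 4-bit Loading

A minimal low-VRAM inference sketch for the Hardware & Inference Tips above, assuming on-the-fly `bitsandbytes` NF4 quantization rather than a pre-built GPTQ export; the prompt and generation settings are placeholders.

```python
# Minimal 4-bit inference sketch (bitsandbytes NF4 shown; a GPTQ checkpoint would load similarly).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")
model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B",
    quantization_config=bnb_cfg,
    device_map="auto",
)

# Optional (bf16/fp16 path): compile once for the ~10-15 % throughput gain noted above.
# model.forward = torch.compile(model.forward)

prompt = "<|user|>Summarize LoRA in one sentence.<|assistant|>"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```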
---

# 🪪 License

This project is licensed under the [MIT](https://opensource.org/licenses/MIT) License. Attribution appreciated but not required.