---
license: apache-2.0
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
library_name: transformers
language:
- en
pipeline_tag: text-generation
tags:
- trl
- text-generation-inference
- r1
- re-think
---

![Add a heading.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/HWLZRJqFt1tOH8IjOyDHf.png)

# **SmolLM2-Rethink-360M**

> **SmolLM2-Rethink-360M** is an experimental lightweight reasoning model trained on the **Celestia3-DeepSeek-R1-0528** dataset. Built on the **SmolLM2-360M-Instruct** base model (360M parameters), it is designed to improve lightweight reasoning, logical deduction, and structured response generation, all while remaining efficient enough for resource-constrained environments.

---

## **Key Highlights**

1. **Compact Yet Powerful**
   With 360M parameters, the model balances performance and efficiency, offering solid reasoning capability with fast inference.

2. **Reasoning-Oriented Training**
   Fine-tuned on reasoning-focused datasets such as **Celestia3-DeepSeek-R1-0528**, which emphasize logical, step-by-step thinking.

3. **Optimized for Edge & Research**
   Usable on mid-range GPUs or CPU environments, making it ideal for experimentation, teaching, and lightweight deployment.

4. **Structured Generation Support**
   Capable of outputting well-organized content such as JSON, lists, workflows, and tabular formats.
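Structured output from small models is not always perfectly clean, so it is common to extract and validate the JSON portion of a response before using it downstream. The helper below is a minimal, illustrative sketch (the function name and sample response are hypothetical, not part of this model's API):

```python
import json
import re

def extract_json(text):
    """Pull the first {...} block out of a model response and parse it.

    Returns the parsed object, or None if no valid JSON is found.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# Hypothetical model response mixing prose with a JSON payload
sample = 'Sure! Here is the result:\n{"planet": "Earth", "gravity_m_s2": 9.81}'
print(extract_json(sample))  # {'planet': 'Earth', 'gravity_m_s2': 9.81}
```

Validating the parse this way lets an application retry or fall back gracefully when the model's formatting drifts.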

---

## **Quickstart with 🤗 Transformers**

```bash
pip install transformers
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-Rethink-360M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is gravity?"}]
# add_generation_prompt=True appends the assistant-turn header so the
# model starts a fresh response instead of continuing the user message.
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    temperature=0.2,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0]))
```

---

## **Intended Use**

* **Lightweight Reasoning Tasks**
  Suitable for compact agents needing reasoning abilities without high compute requirements.

* **Educational & Research Assistants**
  Ideal for logic tutors, student aides, or research prototypes.

* **Instruction Following & Structured QA**
  Excels in scenarios requiring concise, step-by-step or well-formatted responses.

* **Microservices & Embedded AI**
  Can be embedded in systems with modest hardware, enabling distributed or modular AI.

---

## **Limitations**

1. **Knowledge Scope**
   Smaller models naturally have less factual coverage compared to large-scale LLMs.

2. **Context Length**
   Best used with shorter prompts and outputs due to token and memory constraints.

3. **Variability in Creative Tasks**
   Less suited for imaginative writing or nuanced creative expression.

4. **Limited Real-World Awareness**
   The model has no awareness of events after its training cutoff and no access to real-time data.

5. **Prompt Sensitivity**
   Outputs can vary based on phrasing; best results come from clear, guided prompts.
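Since outputs vary with phrasing, wrapping questions in a consistent, explicit template tends to produce steadier results. A minimal sketch of such a guided prompt (the wording of the template is illustrative, not prescribed by the model):

```python
def build_prompt(question, steps=True):
    """Wrap a user question in an explicit, structured instruction.

    Keeping the instruction wording fixed across queries reduces the
    output variability that comes from ad-hoc phrasing.
    """
    instruction = "Answer the question below."
    if steps:
        instruction += (
            " Think step by step, then state the final answer on its own line."
        )
    return f"{instruction}\n\nQuestion: {question}"

print(build_prompt("What is 17 * 24?"))
```

The resulting string can be passed as the `content` of the user message in the quickstart above.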