---
license: apache-2.0
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
library_name: transformers
language:
- en
pipeline_tag: text-generation
tags:
- trl
- text-generation-inference
- r1
- re-think
---
# SmolLM2-Rethink-360M

**SmolLM2-Rethink-360M** is an experimental lightweight reasoning model fine-tuned on the Celestia3-DeepSeek-R1-0528 dataset. Built on top of SmolLM2-360M-Instruct, it is designed to strengthen lightweight reasoning, logical deduction, and structured response generation while remaining efficient enough for resource-constrained environments.
## Key Highlights

1. **Compact Yet Powerful**: With 360M parameters, the model balances performance and efficiency, offering solid reasoning capabilities with fast inference speeds.
2. **Reasoning-Oriented Training**: Fine-tuned on instruction-tuned datasets like Celestia3-DeepSeek-R1-0528, optimized for logical, step-by-step thinking.
3. **Optimized for Edge & Research**: Usable on mid-range GPUs or CPU-only environments, making it suitable for experimentation, teaching, and lightweight deployment.
4. **Structured Generation Support**: Capable of outputting well-organized content such as JSON, lists, workflows, and tabular formats.
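When relying on structured generation, it helps to prompt for a specific format and then validate what comes back. The sketch below shows one way to do this; the prompt wording, the `parse_model_json` helper, and the sample reply are all illustrative assumptions, not part of the model or the `transformers` API:

```python
import json

# Prompt phrased to elicit JSON-only output from the model.
messages = [{
    "role": "user",
    "content": (
        "List three renewable energy sources as a JSON array of objects "
        "with 'name' and 'source' keys. Reply with JSON only."
    ),
}]

def parse_model_json(reply: str):
    """Extract and parse the first JSON array found in a model reply.

    Hypothetical helper: small models sometimes wrap JSON in extra text,
    so we locate the outermost brackets before parsing.
    """
    start = reply.find("[")
    end = reply.rfind("]") + 1
    if start == -1 or end == 0:
        raise ValueError("no JSON array found in reply")
    return json.loads(reply[start:end])

# Hypothetical model reply, used here only to demonstrate the parsing step.
reply = '[{"name": "Solar", "source": "sunlight"}, {"name": "Wind", "source": "air flow"}]'
records = parse_model_json(reply)
print(records[0]["name"])  # Solar
```

In practice you would pass `messages` through the Quickstart generation code below and feed the decoded completion into the validator, retrying on a `ValueError`.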
## Quickstart with 🤗 Transformers

```python
%%capture
!pip install transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-Rethink-360M"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is gravity?"}]

# Render the chat template; add_generation_prompt=True appends the
# assistant turn header so the model begins its reply directly.
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    temperature=0.2,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
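Note that `generate` returns the prompt ids followed by the new tokens, so decoding `outputs[0]` prints the question as well as the answer. A minimal sketch of isolating the completion, using plain lists as stand-ins for the real tensors:

```python
# Toy token ids standing in for tokenizer/generate output:
# generate() echoes the prompt ids, then appends the new tokens.
prompt_ids = [101, 2054, 2003, 9322, 102]
output_ids = prompt_ids + [4567, 4568, 4569]  # what generate() would return

# Slicing by the prompt length keeps only the generated tokens.
new_token_ids = output_ids[len(prompt_ids):]
print(new_token_ids)  # [4567, 4568, 4569]
```

With the real tensors above, the equivalent is `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`.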
## Intended Use

1. **Lightweight Reasoning Tasks**: Suitable for compact agents needing reasoning abilities without high compute requirements.
2. **Educational & Research Assistants**: Ideal for logic tutors, student aides, or research prototypes.
3. **Instruction Following & Structured QA**: Excels in scenarios requiring concise, step-by-step, or well-formatted responses.
4. **Microservices & Embedded AI**: Can be embedded in systems with modest hardware, enabling distributed or modular AI.
## Limitations

1. **Knowledge Scope**: Smaller models naturally have less factual coverage than large-scale LLMs.
2. **Context Length**: Best used with shorter prompts and outputs due to token and memory constraints.
3. **Variability in Creative Tasks**: Less suited for imaginative writing or nuanced creative expression.
4. **Limited Real-World Awareness**: The model has no real-time or post-training knowledge.
5. **Prompt Sensitivity**: Outputs can vary based on phrasing; best results come from clear, guided prompts.
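Because outputs are prompt-sensitive, a small amount of scaffolding in the user message often stabilizes results. A hypothetical helper (not part of the model or `transformers`) that wraps a bare question in explicit step-by-step instructions:

```python
def guided_prompt(question: str, steps: int = 3) -> dict:
    """Wrap a bare question with explicit formatting instructions.

    Hypothetical helper: the instruction wording and step count are
    illustrative choices, not requirements of the model.
    """
    content = (
        f"{question}\n\n"
        f"Answer in at most {steps} numbered steps, "
        "then give a one-sentence conclusion."
    )
    return {"role": "user", "content": content}

# The returned dict drops straight into the Quickstart messages list.
messages = [guided_prompt("Why does ice float on water?")]
print(messages[0]["content"])
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the Quickstart above.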