---
license: apache-2.0
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
library_name: transformers
language:
- en
pipeline_tag: text-generation
tags:
- trl
- text-generation-inference
- r1
- re-think
---
|

# **SmolLM2-Rethink-360M**

> **SmolLM2-Rethink-360M** is an experimental lightweight reasoning model trained on the **Celestia3-DeepSeek-R1-0528** dataset. Built on **SmolLM2-360M-Instruct**, it is designed to enhance lightweight reasoning, logical deduction, and structured response generation, all while maintaining efficiency for resource-constrained environments.
|
|
|
---

## **Key Highlights**

1. **Compact Yet Powerful**
   With 360M parameters, the model balances performance and efficiency, offering solid reasoning capabilities with fast inference.

2. **Reasoning-Oriented Training**
   Fine-tuned on the reasoning-focused **Celestia3-DeepSeek-R1-0528** dataset to encourage logical, step-by-step thinking.

3. **Optimized for Edge & Research**
   Usable on mid-range GPUs or CPU-only environments, making it well suited to experimentation, teaching, and lightweight deployment.

4. **Structured Generation Support**
   Capable of outputting well-organized content such as JSON, lists, workflows, and tabular formats.
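Structured output from a small model is not guaranteed to be valid, so it is common to validate it downstream before use. A minimal sketch of that pattern (the `extract_json` helper and the sample response below are illustrative, not part of the model's API):

```python
import json
import re

def extract_json(text):
    """Pull the first {...} block out of a model response and parse it.

    Returns None when no parseable JSON object is found. The greedy
    regex is a simplification that assumes one object per response.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# Hypothetical model response that wraps a JSON object in prose.
response = 'Sure! Here is the result: {"answer": 42, "steps": ["read", "reason"]}'
print(extract_json(response))  # → {'answer': 42, 'steps': ['read', 'reason']}
```

Retrying the generation when `extract_json` returns `None` is a simple way to make structured pipelines robust.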
|
|
|
---

## **Quickstart with 🤗 Transformers**

```python
%%capture
!pip install transformers
```

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-Rethink-360M"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is gravity?"}]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(input_text)

inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.2,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
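The `temperature` and `top_p` arguments above control sampling: temperature sharpens or flattens the next-token distribution, and top-p (nucleus) sampling restricts choices to the smallest set of tokens whose cumulative probability reaches `top_p`. A self-contained sketch of the idea on a toy distribution (not the actual implementation inside `generate`):

```python
import math
import random

def top_p_sample(logits, temperature=0.2, top_p=0.9, rng=random.Random(0)):
    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens and sample one of them.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

print(top_p_sample([2.0, 1.0, 0.1]))  # → 0
```

At a low temperature like 0.2 the distribution becomes so peaked that the nucleus usually contains only the top token, which is why the settings above yield focused, near-deterministic answers.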
|
|
|
---

## **Intended Use**

* **Lightweight Reasoning Tasks**
  Suitable for compact agents needing reasoning abilities without high compute requirements.

* **Educational & Research Assistants**
  Ideal for logic tutors, student aides, or research prototypes.

* **Instruction Following & Structured QA**
  Excels in scenarios requiring concise, step-by-step, or well-formatted responses.

* **Microservices & Embedded AI**
  Can be embedded in systems with modest hardware, enabling distributed or modular AI.
|
|
|
## **Limitations** |
|
|
|
1. **Knowledge Scope** |
|
Smaller models naturally have less factual coverage compared to large-scale LLMs. |
|
|
|
2. **Context Length** |
|
Best used with shorter prompts and outputs due to token and memory constraints. |
|
|
|
3. **Variability in Creative Tasks** |
|
Less suited for imaginative writing or nuanced creative expression. |
|
|
|
4. **Limited Real-World Awareness** |
|
Model does not have real-time or post-training data awareness. |
|
|
|
5. **Prompt Sensitivity** |
|
Outputs can vary based on phrasing; best results come from clear, guided prompts. |