---
base_model:
- google/gemma-3-1b-it
tags:
- text-generation-inference
- transformers
- unsloth
- GRPO
- conversational
- gemma3_text
- reasoning
license: apache-2.0
language:
- en
datasets:
- NuclearAi/HyperThink-Mini-50K
---
# About the Model
- **Developed by:** NuclearAi
- **License:** apache-2.0
- **Finetuned from model:** google/gemma-3-1b-it
**Gemma** is a family of lightweight, state-of-the-art open models from Google, built using the same research and technology as the **Gemini** models. However, the base Gemma models are comparatively weak at step-by-step **reasoning**, which makes them less capable than some other models on reasoning-heavy tasks.
At **Nuclear AI**, we enhance Gemma's abilities by applying **GRPO** (Group Relative Policy Optimization) and fine-tuning on a specialized dataset to improve its reasoning skills. Our previous, experimental version of this Gemma3-1B thinking model was trained on only 150 rows of high-quality data; **this time we fine-tuned on 5,000 rows of high-quality data, which took around 70 minutes of training.**
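For context, the sketch below shows roughly what such a GRPO run can look like using TRL's `GRPOTrainer`. It is illustrative only: the reward function, hyperparameters, and dataset preprocessing are assumptions, not the exact recipe used to train this model.
```python
# Illustrative GRPO fine-tuning sketch (NOT the exact training recipe).
# Assumes the dataset exposes a "prompt" column in TRL's plain-text format;
# a real run needs dataset-specific preprocessing and tuned reward functions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("NuclearAi/HyperThink-Mini-50K", split="train")

def format_reward(completions, **kwargs):
    # Toy reward: favor completions that include a solution section.
    return [1.0 if "SOLUTION" in completion else 0.0 for completion in completions]

config = GRPOConfig(
    output_dir="gemma3-1b-grpo",
    learning_rate=5e-6,
    num_generations=4,        # completions sampled per prompt for the group baseline
    max_prompt_length=256,
    max_completion_length=512,
    max_steps=250,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=format_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```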
We would love to hear your feedback so we can work on fine-tuning a larger version with more steps and greater computational power.
---
## Installing Libraries
```bash
# 1. Install the specific Gemma 3 compatible transformers
pip install --no-deps git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
# 2. Install Unsloth (adjust based on your environment - e.g., remove [colab-new] if not on Colab)
pip install "unsloth[colab-new]@git+https://github.com/unslothai/unsloth.git"
# 3. Install PyTorch (select command based on your CUDA version from https://pytorch.org/)
# Example for CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Example for CPU only:
# pip install torch torchvision torchaudio
# 4. Install accelerate and bitsandbytes
pip install accelerate bitsandbytes
```
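Before loading the model, a quick sanity check (a minimal sketch, assuming a CUDA-capable setup) can confirm that the key libraries import cleanly and report their versions:
```python
# Verify that the core libraries import cleanly and report versions.
import unsloth  # noqa: F401  # importing Unsloth first lets it apply its patches
import torch
import transformers

print(f"transformers: {transformers.__version__}")  # expect the 4.49.x Gemma-3 build
print(f"torch: {torch.__version__} | CUDA available: {torch.cuda.is_available()}")
```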
## Code To Run
```python
import torch
from unsloth import FastModel
from transformers import TextStreamer
# 1. Model and Tokenizer Loading
max_seq_length = 1024
model_name = "NuclearAi/Nuke_X_Gemma3_1B_Reasoner_v1.0"
print(f"Loading model: {model_name}...")
model, tokenizer = FastModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = None,          # let Unsloth choose the best dtype (float16, bf16, float32)
    load_in_4bit = False,  # set to True if you want 4-bit quantization
    device_map = "auto",   # automatically use the GPU if available
)
print("Model loaded.")
# 2. Define Prompt Structure
# These delimiter tags follow Unsloth's GRPO reasoning recipe; adjust them
# if your fine-tune used different markers.
reasoning_start = "<start_working_out>"
reasoning_end = "<end_working_out>"
solution_start = "<SOLUTION>"
solution_end = "</SOLUTION>"
system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""
# 3. User Input
user_question = "Write a short story about a cat who learns to fly."  # try your own question here
# 4. Format Input for Chat Model
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_question},
]
text_input = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant turn header so the model starts generating
)
# 5. Tokenize and Prepare for Generation
device = model.device if hasattr(model, 'device') else ('cuda' if torch.cuda.is_available() else 'cpu')
inputs = tokenizer([text_input], return_tensors="pt").to(device)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
# 6. Generate Response
print("\n--- Model Response ---")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
print("\n--- End of Response ---")
```
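Because the `TextStreamer` prints tokens as they are generated, `outputs` still contains the full sequence afterwards. A small helper can then pull out just the final answer; this sketch assumes the delimiter tags defined above match the ones the model emits:
```python
# Decode only the newly generated tokens (skip the prompt portion),
# then extract the text between the solution tags if present.
generated = outputs[0][inputs["input_ids"].shape[1]:]
full_text = tokenizer.decode(generated, skip_special_tokens=True)

if solution_start in full_text and solution_end in full_text:
    solution = full_text.split(solution_start, 1)[1].split(solution_end, 1)[0].strip()
    print(f"Extracted solution:\n{solution}")
else:
    print("No solution tags found; raw output:\n" + full_text)
```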
---
Thank you for your support!
**Jay Shree Ram 🚩🚩**