---
base_model:
- google/gemma-3-1b-it
tags:
- text-generation-inference
- transformers
- unsloth
- GRPO
- conversational
- gemma3_text
- reasoning
license: apache-2.0
language:
- en
datasets:
- NuclearAi/HyperThink-Mini-50K
---

# About Model

- **Developed by:** NuclearAi
- **License:** apache-2.0
- **Finetuned from model:** google/gemma-3-1b-it

**Gemma** is a family of lightweight, state-of-the-art open models from Google, built using the same research and technology as the **Gemini** models. However, Gemma falls short in **reasoning**, making it less capable than some other models on reasoning-heavy tasks. At **Nuclear AI**, we enhance Gemma's abilities by leveraging **GRPO** and a specialized dataset to improve its reasoning skills.

Our previous, experimental thinking version of Gemma3-1B was trained on only 150 rows of high-quality data. **This time we fine-tuned on much more data: 5,000 rows of a high-quality dataset, which took around 70 minutes of training.** An illustrative GRPO training sketch is included at the end of this card.

We would love to hear your feedback so we can work on fine-tuning a larger version with more steps and greater computational power.

---

## Installing Libraries

```bash
# 1. Install the specific Gemma 3 compatible transformers build
pip install --no-deps git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3

# 2. Install Unsloth (adjust for your environment - e.g., remove [colab-new] if not on Colab)
pip install "unsloth[colab-new]@git+https://github.com/unslothai/unsloth.git"

# 3. Install PyTorch (select the command for your CUDA version from https://pytorch.org/)
# Example for CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Example for CPU only:
# pip install torch torchvision torchaudio

# 4. Install accelerate and bitsandbytes
pip install accelerate bitsandbytes
```

## Code To Run

```python
import torch
from unsloth import FastModel
from transformers import TextStreamer

# 1. Model and tokenizer loading
max_seq_length = 1024
model_name = "NuclearAi/Nuke_X_Gemma3_1B_Reasoner_v1.0"

print(f"Loading model: {model_name}...")
model, tokenizer = FastModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = None,         # Let Unsloth choose the best dtype (float16, bf16, float32)
    load_in_4bit = False, # Set to True if you want 4-bit quantization
    device_map = "auto",  # Automatically use GPU if available
)
print("Model loaded.")

# 2. Define prompt structure
# Note: the delimiter strings below are empty in this card; if your training
# format used explicit tags, set them accordingly.
reasoning_start = ""
reasoning_end   = ""
solution_start = ""
solution_end = ""

system_prompt = f"""You are given a problem.
Think about the problem and provide your working out.
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""

# 3. User input
user_question = "Write a short story about a cat who learns to fly."  # Try another question

# 4. Format input for the chat model
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user",   "content": user_question},
]
text_input = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Important for generation
)

# 5. Tokenize and prepare for generation
device = model.device if hasattr(model, 'device') else ('cuda' if torch.cuda.is_available() else 'cpu')
inputs = tokenizer([text_input], return_tensors="pt").to(device)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
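
# Optional sanity check (not part of the original card, just an illustrative
# addition): confirm the prompt plus the requested new tokens fits within the
# max_seq_length budget set above.
prompt_tokens = inputs["input_ids"].shape[-1]
print(f"Prompt length: {prompt_tokens} tokens (context budget: {max_seq_length})")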

# 6. Generate response
print("\n--- Model Response ---")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
print("\n--- End of Response ---")
```

---

Thank you for your support!

**Jay Shree Ram 🚩🚩**
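
---

## GRPO Fine-Tuning Sketch (Illustrative)

For readers curious about the GRPO step mentioned above, here is a minimal sketch built around `trl`'s `GRPOTrainer`. It is **not** the exact script used to train this model: the dataset column name, the toy reward function, and the hyperparameters are assumptions you should adapt to your own setup.

```python
# Illustrative GRPO fine-tuning sketch - NOT the exact script used for this model.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumption: the dataset exposes a plain-text "prompt" column and a "train" split.
# Rename the real column if needed, e.g.:
# dataset = dataset.rename_column("question", "prompt")
dataset = load_dataset("NuclearAi/HyperThink-Mini-50K", split="train")

def format_reward(completions, **kwargs):
    """Toy reward: favour non-trivial completions.
    Replace with a real reasoning/format reward for serious training."""
    return [1.0 if len(c.strip()) > 50 else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="gemma3-1b-grpo",
    per_device_train_batch_size=8,
    num_generations=8,          # completions sampled per prompt
    max_prompt_length=256,
    max_completion_length=512,
    learning_rate=5e-6,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=format_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples several completions per prompt, scores them with the reward function(s), and pushes the policy toward completions that score above the group average, which is why no separate reward model is needed in this sketch.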