SEOcrate, powered by WordLift

SEOcrate-4B_grpo_new_01

Gemma 3 4B Fine-tuned for SEO Reasoning

SEOcrate is pronounced like "Sòcrate" (Socrates, the Greek philosopher), blending SEO expertise with deep reasoning.


This model is a fine-tuned version of unsloth/gemma-3-4b-it-bnb-4bit (a 4-bit quantized version of google/gemma-3-4b-it) specifically adapted for Search Engine Optimization (SEO) reasoning tasks using Group Relative Policy Optimization (GRPO).

Project Goal: To create a model capable of understanding SEO-related prompts, applying concepts from schema.org and the SEOntology (seovoc), and generating structured explanations and answers.

Model Capabilities:

  • Structured Output: Designed to generate responses in a specific XML-like format (a filled-in example follows this list):
    <reasoning>
    [Step-by-step explanation using SEO concepts]
    </reasoning>
    <answer>
    [Concise answer to the prompt]
    </answer>
    
  • SEO Task Focus: Trained on prompts covering various SEO tasks, including:
    • Meta Description Optimization
    • Internal Link Suggestion
    • Query Performance Trend Analysis
    • Schema.org Type Suggestion
    • Named Entity Recognition (SEO context)
    • Title Tag Optimization
    • Keyword Intent Classification
    • Robots.txt Rule Suggestion
    • Canonical Tag Decisions
    • E-E-A-T Assessment
    • Potentially other tasks, depending on the final dataset used
  • Ontology Awareness: The fine-tuning process aimed to incorporate understanding and usage of terms from schema.org and seovoc.
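
For illustration, a well-formed response to the schema suggestion prompt used later in this card could look like the following (a hypothetical example of the target format, not a captured model output):

<reasoning>
The page is about a single business and lists its opening hours, street address, and phone number. These map to schema.org properties such as openingHours, address, and telephone, which are commonly used with the LocalBusiness type that seovoc builds on.
</reasoning>
<answer>
LocalBusiness
</answer>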

Fine-tuning Details

  • Base Model: unsloth/gemma-3-4b-it-bnb-4bit
  • Fine-tuning Method: Group Relative Policy Optimization (GRPO) via the trl library, accelerated with Unsloth.
  • Dataset: A custom synthetic dataset (cyberandy/seo-grpo-reasoning-dataset-1000 or a later version) containing SEO task prompts. Reward signals were generated using Gemini 1.5 Pro as an LLM-as-a-Judge, evaluating generated reasoning/answers against SEO best practices and ontology concepts.
  • Training Steps: 500 steps.
  • Key Hyperparameters:
    • Learning Rate: 5e-6 (with cosine decay)
    • Batch Size (Effective): 8 (per_device_train_batch_size=8, gradient_accumulation_steps=1)
    • Optimizer: adamw_8bit
    • Gradient Checkpointing: Disabled (False)
    • Sequence Length: 2048
    • Reward Function: Custom Python function evaluating format correctness, keyword usage (including seovoc terms), length penalties, etc., with tanh scaling (see the sketch after this list).
  • Hardware: NVIDIA A100 40GB
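
To make the setup more concrete, the sketch below shows how a reward function of this kind can be wired into TRL's GRPOTrainer. The keyword list, weights, length thresholds, and completion budget are illustrative assumptions, not the exact values used to train this model.

# Sketch of a GRPO reward function and trainer wiring (illustrative values, not the released training script).
import math
import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def seo_reward(completions, **kwargs):
    """Score each completion on format correctness, ontology keyword usage, and length."""
    rewards = []
    for completion in completions:
        # TRL passes conversational completions as lists of message dicts.
        text = completion[0]["content"] if isinstance(completion, list) else completion
        score = 0.0

        # Format correctness: both tag pairs must be present.
        if (re.search(r"<reasoning>.*?</reasoning>", text, re.DOTALL)
                and re.search(r"<answer>.*?</answer>", text, re.DOTALL)):
            score += 1.0

        # Keyword usage: reward mentions of schema.org / seovoc terms (hypothetical term list).
        for term in ("schema.org", "seovoc", "LocalBusiness", "WebPage"):
            if term.lower() in text.lower():
                score += 0.25

        # Length penalty: discourage very short or runaway completions (assumed thresholds).
        n_words = len(text.split())
        if n_words < 30 or n_words > 400:
            score -= 0.5

        # tanh scaling keeps the reward bounded, as noted above.
        rewards.append(math.tanh(score))
    return rewards

# Assumes the dataset exposes a "prompt" column, as GRPOTrainer expects.
dataset = load_dataset("cyberandy/seo-grpo-reasoning-dataset-1000", split="train")

# Hyperparameters mirror the values listed above; everything else is left at TRL defaults.
training_args = GRPOConfig(
    output_dir="seocrate-grpo",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    max_steps=500,
    max_completion_length=1024,  # assumption: completion budget within the 2048-token sequence length
)

trainer = GRPOTrainer(
    model="unsloth/gemma-3-4b-it-bnb-4bit",  # base model; the actual run used Unsloth acceleration
    reward_funcs=[seo_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()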

How to Use

This model expects prompts formatted for chat, ideally including a system prompt defining the expected output structure.

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer
import torch

# Use the Hub ID of this repository
model_id = "cyberandy/SEOcrate-4B_grpo_new_01"
device = "cuda" # Or "cpu"

# Load tokenizer (Use original base model for stability if needed)
# tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Configure tokenizer padding
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16, # Or torch.bfloat16 if supported
    device_map="auto"
    # attn_implementation="flash_attention_2" # Optional if FA2 is available
)
model.eval()

# System Prompt (Crucial for format)
system_prompt = """
Act as an expert SEO analyst familiar with the seovoc ontology (https://w3id.org/seovoc/) which extends schema.org.
Based on the provided input, perform the specified SEO task.
Output your analysis and suggestion in the specified XML format:
<reasoning>
Explain your reasoning step-by-step. Use seovoc/schema.org concepts where relevant.
</reasoning>
<answer>
Provide only the final answer in the requested format.
</answer>
"""

# Example User Prompt
user_prompt = "Suggest an appropriate schema.org type for a webpage that lists local business hours, address, and phone number."

# Format messages
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user",   "content": user_prompt},
]

# Apply chat template
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt"
).to(device)

# Generation Config (temperature/top_p only take effect when do_sample=True)
gen_config = GenerationConfig(
    max_new_tokens=512,
    temperature=0.1,
    top_p=0.9,
    do_sample=False,  # greedy decoding; set to True to use the sampling parameters above
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 1  # tokenizer EOS or Gemma's default <eos> id
)

# Generate
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
print(f"Prompt: {user_prompt}\n---")
with torch.no_grad():
    _ = model.generate(input_ids=inputs, generation_config=gen_config, streamer=text_streamer)
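
The streamer above prints the generated text as it is produced. If you capture the output instead, a small helper (an assumed downstream utility, not part of this release) can pull the <reasoning> and <answer> blocks out of the response:

import re

def parse_seocrate_output(text: str) -> dict:
    """Extract the <reasoning> and <answer> blocks from a generated response."""
    def grab(tag: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return match.group(1).strip() if match else ""
    return {"reasoning": grab("reasoning"), "answer": grab("answer")}

# Example: generate without streaming, decode only the new tokens, then parse.
output_ids = model.generate(input_ids=inputs, generation_config=gen_config)
response = tokenizer.decode(output_ids[0, inputs.shape[-1]:], skip_special_tokens=True)
print(parse_seocrate_output(response))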

Evaluation

Initial qualitative evaluation after ~400 training steps showed promising results on tasks like schema suggestion and meta description optimization, demonstrating adherence to the required format and basic application of SEO concepts. Performance gaps compared to GPT-4o were observed on more complex tasks like entity identification within SEO contexts and nuanced schema selection. The model struggled with format adherence on multi-step numerical reasoning tasks (e.g., Query Trend Analysis) at 400 steps. LLM-as-a-Judge (Gemini 1.5 Pro) scores indicated strong performance on simpler tasks but lower scores on the identified weaker areas.

Limitations and Bias

The model was trained on a relatively small synthetic dataset (960 examples, possibly after cleaning). Performance may vary on real-world, complex SEO scenarios not represented in the training data. The quality of the reasoning is heavily influenced by the examples generated by the initial LLM (Gemini 1.5 Pro) and the subsequent reward scores. The model might generate plausible-sounding but incorrect SEO advice. Always verify outputs with expert knowledge or official documentation. Potential biases present in the base Gemma model or the data generation process may persist. The model's understanding of seovoc is based on its inclusion in prompts and reward signals, not necessarily a deep ontological grounding.

Disclaimer

This model is provided for research and experimentation purposes. Use its outputs as a starting point or assistant, but always apply critical thinking and expert review before implementing SEO strategies based on its suggestions.

Acknowledgements

Thanks to Google Cloud for the GPU credits and to the WordLift team for shaping the future of AI Search. Built upon Google's Gemma 3 model. Utilizes concepts from schema.org and the SEOntology (seovoc).
