Update README.md

Files changed (1)

README.md +144 -13

@@ -1,21 +1,152 @@
 ---
-base_model: unsloth/gemma-3-1b-it
 tags:
-- text-generation-inference
-- transformers
 - unsloth
-- gemma3_text
-license: apache-2.0
-language:
-- en
 ---
-# Uploaded finetuned  model
-- **Developed by:** cyberandy
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/gemma-3-1b-it
-This gemma3_text model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
+license: apache-2.0 # Or appropriate license
+language: en
+library_name: transformers
+pipeline_tag: text-generation
 tags:
+- gemma3
 - unsloth
+- grpo
+- rlft
+- seo
+- reasoning
+- instruction-tuning
+- text-generation
+- experimental
+- wordlift
+- seontology
 ---
+# Gemma 3 1B - SEO Reasoning (GRPO Iteration 1) - `cyberandy/gemma3-1b-feliSEO-2run`
+## Model Description
+This model is an **experimental fine-tune** of the `unsloth/gemma-3-1b-it` model, developed as part of research within the **WordLift Lab**. It was trained using Group Relative Policy Optimization (GRPO) with the `trl` library and Unsloth optimizations.
+The primary goal of this fine-tuning iteration was to teach the model to perform basic SEO-related reasoning tasks, guided by concepts from the **SEOntology project (https://github.com/seontology/)**, and structure its output using specific XML-like tags: `<reasoning>` and `<answer>`.
+**This is an early iteration based on a very small dataset (100 examples) and limited training steps. It demonstrates partial success in format adherence but requires significantly more data and training for robust content generation.**
+*   **Developed by:** cyberandy (WordLift Lab)
+*   **Base Model:** `unsloth/gemma-3-1b-it`
+*   **Fine-tuning Method:** Group Relative Policy Optimization (GRPO)
+*   **Language(s):** English
+*   **License:** Declared as Apache 2.0 in the model card metadata. The base Gemma 3 weights are distributed under Google's Gemma Terms of Use, so confirm license compatibility before reuse.
+## Training Data
+The model was fine-tuned on a custom dataset consisting of **100 examples**. This dataset was created through a three-step process:
+1.  **Synthetic Generation:** SEO task prompts (covering meta descriptions, schema suggestions, keyword analysis, etc., inspired by real-world SEO challenges) were sent to the Gemini 1.5 Pro API to generate candidate responses containing `<reasoning>` and `<answer>` sections.
+2.  **LLM-as-a-Judge Evaluation:** The generated synthetic examples were then evaluated by Gemini 1.5 Pro (acting as a judge based on SEO best practices and concepts derived from the **SEOntology's `seovoc` vocabulary (https://w3id.org/seovoc/)**) to assign a reward score (0.0 to 1.0) to each example.
+3.  **Final Format:** The data was processed into `{'prompt': str, 'reward': float}` format, where `prompt` contained the system and user turns formatted using the Gemma 3 chat template.
+Due to the small dataset size, the diversity of SEO tasks covered is limited.
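The final record format described above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code: `render_prompt` and `build_record` are hypothetical names, and the turn markers only approximate the Gemma chat template. In practice the prompt string would come from `tokenizer.apply_chat_template`.

```python
from typing import Dict, Union

Record = Dict[str, Union[str, float]]


def render_prompt(system_prompt: str, user_query: str) -> str:
    """Approximate a Gemma-style prompt (assumption): the system text is
    folded into the user turn and the prompt ends with an open model turn."""
    return (
        "<start_of_turn>user\n"
        f"{system_prompt}\n\n{user_query}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


def build_record(system_prompt: str, user_query: str, judge_score: float) -> Record:
    """Pair a rendered prompt with the judge's score in the
    {'prompt': str, 'reward': float} layout described in the card."""
    if not 0.0 <= judge_score <= 1.0:
        raise ValueError(f"judge score must be in [0.0, 1.0], got {judge_score}")
    return {
        "prompt": render_prompt(system_prompt, user_query),
        "reward": float(judge_score),
    }


# Illustrative inputs only; these are not taken from the real dataset.
record = build_record(
    "Respond using <reasoning> and <answer> sections.",
    "Write a meta description for a bakery homepage.",
    0.8,
)
print(record["reward"])
```

Rejecting out-of-range scores at construction time keeps a malformed judge response from silently entering the training set.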
+## Training Procedure
+*   **Framework:** `unsloth`, `trl`, `peft`, `transformers`, `torch`
+*   **Algorithm:** GRPO (`trl.GRPOTrainer`)
+*   **Model:** `unsloth/gemma-3-1b-it` loaded without 4-bit quantization (float32/bfloat16) using `FastLanguageModel`.
+*   **PEFT:** LoRA adapters applied with `r=8`, `lora_alpha=8`.
+*   **Reward Functions:** Custom Python functions were used during training to provide rewards based on the model's generated completions:
+    *   `soft_format_reward_func_partial`: Rewarded adherence to the `<reasoning>`/`<answer>` structure, with partial credit for individual tags.
+    *   `ontology_keyword_reward_func_partial`: Gave small, diminishing rewards for finding relevant SEO/SEOntology keywords (e.g., 'seovoc:', 'schema.org', 'ctr', 'entity') within the `<reasoning>` section.
+    *   Combined scores were rescaled using `tanh` into approximately `[-1, 1]`.
+*   **Key Hyperparameters (`GRPOConfig`):**
+    *   `learning_rate`: 5e-6
+    *   `per_device_train_batch_size`: 2
+    *   `gradient_accumulation_steps`: 2 (Effective batch size: 4)
+    *   `gradient_checkpointing`: True (using Unsloth's implementation)
+    *   `num_generations`: 2
+    *   `max_prompt_length`: 512
+    *   `max_completion_length`: 512
+    *   `max_steps`: 500 (Intended, actual run might have varied)
+    *   `optim`: adamw_8bit
+    *   `lr_scheduler_type`: cosine
+    *   `warmup_ratio`: 0.1
+*   **Hardware:** NVIDIA A100 (40GB VRAM)
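The reward design listed above (partial format credit, diminishing keyword rewards, `tanh` rescaling) might look roughly like the sketch below. The function names, weights, and decay factor are illustrative assumptions, not the values used in training. Because this sketch includes no penalties, its combined score stays in `[0, 1)` rather than spanning the full `[-1, 1]` range.

```python
import math
import re
from typing import List

# Example keywords named in the card; the real list is not published here.
SEO_KEYWORDS = ("seovoc:", "schema.org", "ctr", "entity")


def soft_format_reward(completion: str) -> float:
    """Partial credit: 0.125 per tag present, plus 0.5 when the full
    <reasoning>...</reasoning><answer>...</answer> structure is found."""
    tags = ("<reasoning>", "</reasoning>", "<answer>", "</answer>")
    score = sum(0.125 for tag in tags if tag in completion)
    full = re.search(
        r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>",
        completion,
        flags=re.DOTALL,
    )
    if full:
        score += 0.5
    return score


def keyword_reward(completion: str, base: float = 0.2, decay: float = 0.5) -> float:
    """Diminishing rewards: the k-th keyword found (k = 0, 1, ...) adds
    base * decay**k. Only the <reasoning> section is searched."""
    match = re.search(r"<reasoning>(.*?)</reasoning>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0
    reasoning = match.group(1).lower()
    found = sum(1 for keyword in SEO_KEYWORDS if keyword in reasoning)
    return sum(base * decay**k for k in range(found))


def combined_reward(completion: str) -> float:
    """Squash the summed score with tanh so it stays bounded."""
    return math.tanh(soft_format_reward(completion) + keyword_reward(completion))


def score_batch(completions: List[str]) -> List[float]:
    """Score a batch of completions, returning one float per completion."""
    return [combined_reward(c) for c in completions]


good = "<reasoning>Use schema.org Dentist to improve CTR.</reasoning>\n<answer>Dentist</answer>"
partial = "<answer>Dentist"
print(score_batch([good, partial]))
```

A well-formed completion earns the full format score plus keyword credit, while a completion with a single stray tag still receives a small positive signal, which is the point of partial credit on a small dataset.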
+## Evaluation Results
+*   **Quantitative:** No formal evaluation metrics were calculated for this iteration.
+*   **Training Metrics:** Positive average rewards (roughly 0.4 to 0.6 or higher) were observed, indicating the reward functions provided a learning signal. Training completed the intended steps.
+*   **Qualitative:** Inference shows the model **adheres well to the requested `<reasoning>/<answer>` format**. Reasoning content often shows logical steps. Answer content is sometimes relevant but can suffer from **hallucinations or incoherence**, especially near the end of generation. Requires further training on more diverse data and potentially reward function refinement.
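Since no formal metrics were computed, one simple way to quantify the format adherence noted above is a strict parser over sampled outputs. The sketch below is an assumption about how such a check could be written. `parse_output` and `format_adherence_rate` are hypothetical helpers, and the sample strings are invented for illustration.

```python
import re
from typing import Dict, List, Optional

# Strict pattern: the output must consist of exactly one <reasoning> block
# followed by one <answer> block, with nothing but whitespace around them.
PATTERN = re.compile(
    r"^\s*<reasoning>(.+?)</reasoning>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)


def parse_output(text: str) -> Optional[Dict[str, str]]:
    """Return the two sections, or None if the format is violated."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return {"reasoning": match.group(1).strip(), "answer": match.group(2).strip()}


def format_adherence_rate(outputs: List[str]) -> float:
    """Fraction of outputs that parse cleanly (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(parse_output(o) is not None for o in outputs) / len(outputs)


samples = [
    "<reasoning>Local business markup fits.</reasoning>\n<answer>Dentist</answer>",
    "  <reasoning>a</reasoning><answer>b</answer>\n",
    "<reasoning>Only reasoning.</reasoning>",
    "<reasoning>x</reasoning><answer>y</answer> trailing tokens",
]
print(format_adherence_rate(samples))
```

The strict end anchor means trailing text after `</answer>` counts as a failure, which would surface the end-of-generation incoherence described above instead of hiding it.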
+## Intended Use
+*   **Primary Use:** Research and experimentation by WordLift Lab on applying GRPO to teach structured SEO reasoning guided by SEOntology concepts, and demonstration of the fine-tuning process.
+*   **Out-of-Scope:** **NOT suitable for production use.** Reliability and factual accuracy are not guaranteed.
+## Limitations and Bias
+*   **Small, Synthetic Dataset:** Trained on only 100 synthetic examples. Generalization is limited.
+*   **Content Quality:** Prone to hallucinations and incoherence. Factual accuracy not verified.
+*   **Reward Function Simplicity:** Rewards focused on format and basic keywords, not deep semantic correctness.
+*   **Training Instability/Speed:** Significant performance challenges were encountered during development, particularly with the 4B model. The successful 1B run required specific configurations impacting speed.
+*   **Inherited Bias:** Inherits biases from the base `gemma-3-1b-it` model and the Gemini 1.5 Pro model used for data generation and evaluation.
+## How to Use
+Load the merged model (this repository contains the full 16-bit merged weights) using `transformers`:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+import torch
+model_id = "cyberandy/gemma3-1b-feliSEO-2run"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+# Load in bfloat16 or float16
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16, # Or torch.float16
+    device_map="auto", # Use GPU if available
+    attn_implementation="eager" # Recommended for Gemma 3 stability
+)
+model.eval()
+# --- Define System Prompt Used During Training/Expected by Model ---
+system_prompt = """Respond in the following format:
+<reasoning>
+Explain your thinking step-by-step. Use relevant SEO concepts.
+</reasoning>
+<answer>
+Provide the final answer to the question or the requested SEO element.
+</answer>"""
+# --- Prepare Input ---
+user_query = "What schema.org type should be used for a local dental clinic's homepage?"
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": user_query}
+]
+inputs = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    tokenize=True,
+    return_tensors="pt"
+).to(model.device)
+# --- Configure Generation ---
+gen_config = GenerationConfig(
+    max_new_tokens=512,
+    temperature=0.7,
+    top_p=0.95,
+    do_sample=True,
+    pad_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 1, # Use EOS ID for pad
+    eos_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 1,
+)
+# --- Generate ---
+with torch.no_grad():
+    outputs = model.generate(input_ids=inputs, generation_config=gen_config)
+# --- Decode ---
+generated_ids = outputs[0][inputs.shape[-1]:]
+generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
+print(f"Prompt: {user_query}")
+print("-" * 20)
+print(f"Generated Output:\n{generated_text}")