Update README.md
README.md CHANGED
```diff
@@ -1,5 +1,5 @@
 ---
-license: apache-2.0
+license: apache-2.0
 language: en
 library_name: transformers
 pipeline_tag: text-generation
```
```diff
@@ -13,19 +13,21 @@ tags:
 - seovoc
 - schema.org
 - wordlift
-- 4bit
-
-
+- 4bit
+base_model: unsloth/gemma-3-4b-it-bnb-4bit
+datasets:
+- cyberandy/seo-grpo-reasoning-dataset-1000
 ---
-<
-<img src="SEOcrate-llm-logo-wordlift.png" alt="SEOcrate
-
+<p align="center">
+<img src="SEOcrate-llm-logo-wordlift.png" alt="SEOcrate Logo" width="120" style="vertical-align: middle;"/>
+
+<img src="https://upload.wikimedia.org/wikipedia/commons/4/48/WordLift-logo-horizontal-2024.png" alt="WordLift Logo" width="100" style="vertical-align: middle;"/>
+</p>

-<
-
-</a>
+<h1 align="center">SEOcrate-4B_grpo_new_01</h1>
+<h3 align="center">Gemma 3 4B Fine-tuned for SEO Reasoning</h3>

-
+---

 This model is a fine-tuned version of `unsloth/gemma-3-4b-it-bnb-4bit` (a 4-bit quantized version of `google/gemma-3-4b-it`) specifically adapted for Search Engine Optimization (SEO) reasoning tasks using Group Policy Optimization (GRPO).

```
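The hunk above adds the card's metadata, logos, title, and one-paragraph description. For orientation, here is a minimal usage sketch consistent with the card's `pipeline_tag: text-generation` and `library_name: transformers` metadata; the repo id, the example prompt, and the generation settings are illustrative assumptions, not taken from the diff.

```python
# Minimal inference sketch (not part of the commit). Assumptions: the fine-tuned
# model is published under a repo id like "cyberandy/SEOcrate-4B_grpo_new_01"
# (hypothetical here), and transformers + accelerate + bitsandbytes are installed
# so the 4-bit base weights can be loaded.
from transformers import pipeline

model_id = "cyberandy/SEOcrate-4B_grpo_new_01"  # placeholder repo id

# The card declares pipeline_tag: text-generation, so the generic pipeline applies.
generator = pipeline("text-generation", model=model_id, device_map="auto")

messages = [
    {
        "role": "user",
        "content": (
            "Explain how schema.org Product markup can improve rich results for an "
            "e-commerce category page, and list the properties that matter most."
        ),
    }
]

output = generator(messages, max_new_tokens=512)
# With chat-style input, generated_text holds the full conversation; the last
# message is the assistant's reply.
print(output[0]["generated_text"][-1]["content"])
```

Because the base weights are the bnb-4bit quantization, `bitsandbytes` and `accelerate` need to be installed alongside `transformers`. The second hunk below updates the training-details section of the card.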
```diff
@@ -60,7 +62,7 @@ This model is a fine-tuned version of `unsloth/gemma-3-4b-it-bnb-4bit` (a 4-bit

 * **Base Model:** `unsloth/gemma-3-4b-it-bnb-4bit`
 * **Fine-tuning Method:** Group Policy Optimization (GRPO) via the `trl` library, accelerated with Unsloth.
-* **Dataset:** A custom synthetic dataset (`cyberandy/seo-grpo-reasoning-dataset-
+* **Dataset:** A custom synthetic dataset (`cyberandy/seo-grpo-reasoning-dataset-1000` or a later version) containing SEO task prompts. Reward signals were generated using Gemini 1.5 Pro as an LLM-as-a-Judge, evaluating generated reasoning/answers against SEO best practices and ontology concepts.
 * **Training Steps:** `500` steps.
 * **Key Hyperparameters:**
   * Learning Rate: `5e-6` (with cosine decay)
```
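This hunk pins down the training recipe: GRPO (Group Relative Policy Optimization, as implemented by `trl`'s `GRPOTrainer`) on an Unsloth-loaded 4-bit Gemma 3 base, the `cyberandy/seo-grpo-reasoning-dataset-1000` dataset, 500 steps, and a 5e-6 learning rate with cosine decay. The sketch below shows how such a run is typically wired up; it assumes an Unsloth release that exposes `FastModel` for Gemma 3, a `prompt` column in the dataset, and placeholder values for everything the diff does not state (LoRA rank, batch sizes, number of sampled generations, and the reward function).

```python
# GRPO wiring sketch (illustrative, not the authors' training script).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastModel  # assumption: Unsloth version with Gemma 3 support

# Load the 4-bit quantized base and attach LoRA adapters (placeholder LoRA config).
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastModel.get_peft_model(model, r=16, lora_alpha=16)

# Assumes the dataset exposes a "prompt" column, as GRPOTrainer expects.
dataset = load_dataset("cyberandy/seo-grpo-reasoning-dataset-1000", split="train")

def placeholder_reward(prompts, completions, **kwargs):
    # Stand-in reward; the card says scores came from Gemini 1.5 Pro as a judge
    # (see the judging sketch below). Assumes completions are plain strings.
    return [min(len(c) / 1000.0, 1.0) for c in completions]

config = GRPOConfig(
    output_dir="seocrate-4b-grpo",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    max_steps=500,
    per_device_train_batch_size=4,   # effective batch must divide by num_generations
    gradient_accumulation_steps=4,   # placeholder
    num_generations=4,               # completions sampled per prompt (placeholder)
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=placeholder_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

In the actual run the rollouts were scored by an external judge rather than a toy heuristic; a sketch of that reward follows.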
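The dataset bullet also notes that reward signals came from Gemini 1.5 Pro acting as an LLM-as-a-Judge. The diff does not say whether judging ran online during GRPO or offline while building the dataset, and the rubric below is invented for illustration; the sketch only shows the general shape of such a reward function, compatible with the `reward_funcs` argument above.

```python
# Illustrative LLM-as-a-Judge reward, not the authors' actual judging setup.
# Assumptions: the google-generativeai client is installed, GOOGLE_API_KEY is set,
# and completions arrive as plain strings.
import os
import re

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
judge = genai.GenerativeModel("gemini-1.5-pro")

RUBRIC = (
    "You are grading an SEO assistant. Score the ANSWER to the TASK between 0.0 "
    "and 1.0 for factual SEO accuracy, sound step-by-step reasoning, and correct "
    "use of schema.org / seovoc concepts. Reply with the number only.\n\n"
    "TASK:\n{task}\n\nANSWER:\n{answer}"
)

def gemini_judge_reward(prompts, completions, **kwargs):
    """Return one scalar reward per completion by asking the judge to grade it."""
    rewards = []
    for task, answer in zip(prompts, completions):
        response = judge.generate_content(RUBRIC.format(task=task, answer=answer))
        match = re.search(r"\d+(?:\.\d+)?", response.text)
        rewards.append(float(match.group()) if match else 0.0)
    return rewards
```

In practice, clamping the parsed score to [0, 1] and retrying on malformed judge output would be sensible hardening.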