cyberandy committed
Commit 9556c16 · verified · 1 Parent(s): 299fa43

Update README.md

Files changed (1):
  README.md  +14 -12
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license: apache-2.0 # Or choose another appropriate license
+license: apache-2.0
 language: en
 library_name: transformers
 pipeline_tag: text-generation
@@ -13,19 +13,21 @@ tags:
 - seovoc
 - schema.org
 - wordlift
-- 4bit # If merged from 4bit
-# - merged_16bit # Based on save format
-base_model: unsloth/gemma-3-4b-it-bnb-4bit # Or google/gemma-3-4b-it if base wasn't unsloth's
+- 4bit
+base_model: unsloth/gemma-3-4b-it-bnb-4bit
+datasets:
+- cyberandy/seo-grpo-reasoning-dataset-1000
 ---
-<a href="https://wordlift.io/" target="_blank">
-  <img src="SEOcrate-llm-logo-wordlift.png" alt="SEOcrate - A Reasoning LLM by WordLift" width="100">
-</a>
+<p align="center">
+  <img src="SEOcrate-llm-logo-wordlift.png" alt="SEOcrate Logo" width="120" style="vertical-align: middle;"/>
+  &nbsp;&nbsp;&nbsp;
+  <img src="https://upload.wikimedia.org/wikipedia/commons/4/48/WordLift-logo-horizontal-2024.png" alt="WordLift Logo" width="100" style="vertical-align: middle;"/>
+</p>
 
-<a href="https://wordlift.io/" target="_blank">
-  <img src="https://upload.wikimedia.org/wikipedia/commons/4/48/WordLift-logo-horizontal-2024.png" alt="SEOcrate Logo" width="60">
-</a>
+<h1 align="center">SEOcrate-4B_grpo_new_01</h1>
+<h3 align="center">Gemma 3 4B Fine-tuned for SEO Reasoning</h3>
 
-# SEOcrate-4B_grpo_new_01 - Gemma 3 4B Fine-tuned for SEO Reasoning
+---
 
 This model is a fine-tuned version of `unsloth/gemma-3-4b-it-bnb-4bit` (a 4-bit quantized version of `google/gemma-3-4b-it`) specifically adapted for Search Engine Optimization (SEO) reasoning tasks using Group Policy Optimization (GRPO).
 
@@ -60,7 +62,7 @@ This model is a fine-tuned version of `unsloth/gemma-3-4b-it-bnb-4bit` (a 4-bit
 
 * **Base Model:** `unsloth/gemma-3-4b-it-bnb-4bit`
 * **Fine-tuning Method:** Group Policy Optimization (GRPO) via the `trl` library, accelerated with Unsloth.
-* **Dataset:** A custom synthetic dataset (`cyberandy/seo-grpo-reasoning-dataset-100` or a later version) containing SEO task prompts. Reward signals were generated using Gemini 1.5 Pro as an LLM-as-a-Judge, evaluating generated reasoning/answers against SEO best practices and ontology concepts.
+* **Dataset:** A custom synthetic dataset (`cyberandy/seo-grpo-reasoning-dataset-1000` or a later version) containing SEO task prompts. Reward signals were generated using Gemini 1.5 Pro as an LLM-as-a-Judge, evaluating generated reasoning/answers against SEO best practices and ontology concepts.
 * **Training Steps:** `500` steps.
 * **Key Hyperparameters:**
   * Learning Rate: `5e-6` (with cosine decay)
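
For readers who want to see how the training details in the card translate into code: below is a minimal sketch of a GRPO run with `trl`'s `GRPOTrainer`, using only the facts the card states (base model, dataset id, 500 steps, learning rate 5e-6 with cosine decay). The batch size, `num_generations`, and the `judge_reward` stub standing in for the Gemini 1.5 Pro LLM-as-a-Judge are illustrative assumptions, not the actual training script.

```python
# Hedged sketch of the GRPO setup described in the card -- NOT the actual
# training script. lr=5e-6, cosine decay, and 500 steps come from the card;
# batch size, num_generations, and the reward stub are assumptions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumes the dataset exposes a "prompt" column, as GRPOTrainer expects.
dataset = load_dataset("cyberandy/seo-grpo-reasoning-dataset-1000", split="train")

def judge_reward(completions, **kwargs):
    # Placeholder for the LLM-as-a-Judge reward (Gemini 1.5 Pro in the card):
    # score each completion against SEO best practices, one float per sample.
    return [0.0 for _ in completions]

config = GRPOConfig(
    output_dir="seocrate-grpo",
    learning_rate=5e-6,             # from the card
    lr_scheduler_type="cosine",     # "with cosine decay"
    max_steps=500,                  # "500 steps"
    per_device_train_batch_size=2,  # assumption
    num_generations=4,              # completions sampled per prompt (assumption)
)

trainer = GRPOTrainer(
    model="unsloth/gemma-3-4b-it-bnb-4bit",  # base model named in the card
    reward_funcs=judge_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

The actual run was accelerated with Unsloth (per the card), which would wrap the model loading step; that wrapper is omitted here to keep the sketch to plain `trl`.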
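Likewise, since the card metadata declares `library_name: transformers` and `pipeline_tag: text-generation`, loading the published model should look roughly like the sketch below. The repo id is an assumption inferred from the model name in the card title; it is not stated in this diff.

```python
# Minimal inference sketch following the card's own metadata
# (library_name: transformers, pipeline_tag: text-generation).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="cyberandy/SEOcrate-4B_grpo_new_01",  # assumed repo id
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "How should I structure schema.org markup for a product page?"}
]
print(generator(messages, max_new_tokens=256)[0]["generated_text"])
```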