Update README.md
README.md CHANGED
```diff
@@ -1,5 +1,5 @@
 ---
-license: apache-2.0
+license: apache-2.0
 language: en
 library_name: transformers
 pipeline_tag: text-generation
```
```diff
@@ -13,19 +13,21 @@ tags:
 - seovoc
 - schema.org
 - wordlift
-- 4bit
-
-
+- 4bit
+base_model: unsloth/gemma-3-4b-it-bnb-4bit
+datasets:
+- cyberandy/seo-grpo-reasoning-dataset-1000
 ---
-<
-<img src="SEOcrate-llm-logo-wordlift.png" alt="SEOcrate
-
+<p align="center">
+<img src="SEOcrate-llm-logo-wordlift.png" alt="SEOcrate Logo" width="120" style="vertical-align: middle;"/>
+
+<img src="https://upload.wikimedia.org/wikipedia/commons/4/48/WordLift-logo-horizontal-2024.png" alt="WordLift Logo" width="100" style="vertical-align: middle;"/>
+</p>

-<
-
-</a>
+<h1 align="center">SEOcrate-4B_grpo_new_01</h1>
+<h3 align="center">Gemma 3 4B Fine-tuned for SEO Reasoning</h3>

-
+---

 This model is a fine-tuned version of `unsloth/gemma-3-4b-it-bnb-4bit` (a 4-bit quantized version of `google/gemma-3-4b-it`) specifically adapted for Search Engine Optimization (SEO) reasoning tasks using Group Policy Optimization (GRPO).

```
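The hunk above adds the card's metadata, logos, title, and one-paragraph description. For orientation, here is a minimal usage sketch consistent with the card's `pipeline_tag: text-generation` and `library_name: transformers` metadata; the repo id, the example prompt, and the generation settings are illustrative assumptions, not taken from the diff.

```python
# Minimal inference sketch (not part of the commit). Assumptions: the fine-tuned
# model is published under a repo id like "cyberandy/SEOcrate-4B_grpo_new_01"
# (hypothetical here), and transformers + accelerate + bitsandbytes are installed
# so the 4-bit base weights can be loaded.
from transformers import pipeline

model_id = "cyberandy/SEOcrate-4B_grpo_new_01"  # placeholder repo id

# The card declares pipeline_tag: text-generation, so the generic pipeline applies.
generator = pipeline("text-generation", model=model_id, device_map="auto")

messages = [
    {
        "role": "user",
        "content": (
            "Explain how schema.org Product markup can improve rich results for an "
            "e-commerce category page, and list the properties that matter most."
        ),
    }
]

output = generator(messages, max_new_tokens=512)
# With chat-style input, generated_text holds the full conversation; the last
# message is the assistant's reply.
print(output[0]["generated_text"][-1]["content"])
```

Because the base weights are the bnb-4bit quantization, `bitsandbytes` and `accelerate` need to be installed alongside `transformers`. The second hunk below updates the training-details section of the card.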
```diff
@@ -60,7 +62,7 @@ This model is a fine-tuned version of `unsloth/gemma-3-4b-it-bnb-4bit` (a 4-bit

 * **Base Model:** `unsloth/gemma-3-4b-it-bnb-4bit`
 * **Fine-tuning Method:** Group Policy Optimization (GRPO) via the `trl` library, accelerated with Unsloth.
-* **Dataset:** A custom synthetic dataset (`cyberandy/seo-grpo-reasoning-dataset-
+* **Dataset:** A custom synthetic dataset (`cyberandy/seo-grpo-reasoning-dataset-1000` or a later version) containing SEO task prompts. Reward signals were generated using Gemini 1.5 Pro as an LLM-as-a-Judge, evaluating generated reasoning/answers against SEO best practices and ontology concepts.
 * **Training Steps:** `500` steps.
 * **Key Hyperparameters:**
   * Learning Rate: `5e-6` (with cosine decay)
```
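This hunk pins down the training recipe: GRPO (Group Relative Policy Optimization, as implemented by `trl`'s `GRPOTrainer`) on an Unsloth-loaded 4-bit Gemma 3 base, the `cyberandy/seo-grpo-reasoning-dataset-1000` dataset, 500 steps, and a 5e-6 learning rate with cosine decay. The sketch below shows how such a run is typically wired up; it assumes an Unsloth release that exposes `FastModel` for Gemma 3, a `prompt` column in the dataset, and placeholder values for everything the diff does not state (LoRA rank, batch sizes, number of sampled generations, and the reward function).

```python
# GRPO wiring sketch (illustrative, not the authors' training script).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastModel  # assumption: Unsloth version with Gemma 3 support

# Load the 4-bit quantized base and attach LoRA adapters (placeholder LoRA config).
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastModel.get_peft_model(model, r=16, lora_alpha=16)

# Assumes the dataset exposes a "prompt" column, as GRPOTrainer expects.
dataset = load_dataset("cyberandy/seo-grpo-reasoning-dataset-1000", split="train")

def placeholder_reward(prompts, completions, **kwargs):
    # Stand-in reward; the card says scores came from Gemini 1.5 Pro as a judge
    # (see the judging sketch below). Assumes completions are plain strings.
    return [min(len(c) / 1000.0, 1.0) for c in completions]

config = GRPOConfig(
    output_dir="seocrate-4b-grpo",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    max_steps=500,
    per_device_train_batch_size=4,   # effective batch must divide by num_generations
    gradient_accumulation_steps=4,   # placeholder
    num_generations=4,               # completions sampled per prompt (placeholder)
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=placeholder_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

In the actual run the rollouts were scored by an external judge rather than a toy heuristic; a sketch of that reward follows.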
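The dataset bullet also notes that reward signals came from Gemini 1.5 Pro acting as an LLM-as-a-Judge. The diff does not say whether judging ran online during GRPO or offline while building the dataset, and the rubric below is invented for illustration; the sketch only shows the general shape of such a reward function, compatible with the `reward_funcs` argument above.

```python
# Illustrative LLM-as-a-Judge reward, not the authors' actual judging setup.
# Assumptions: the google-generativeai client is installed, GOOGLE_API_KEY is set,
# and completions arrive as plain strings.
import os
import re

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
judge = genai.GenerativeModel("gemini-1.5-pro")

RUBRIC = (
    "You are grading an SEO assistant. Score the ANSWER to the TASK between 0.0 "
    "and 1.0 for factual SEO accuracy, sound step-by-step reasoning, and correct "
    "use of schema.org / seovoc concepts. Reply with the number only.\n\n"
    "TASK:\n{task}\n\nANSWER:\n{answer}"
)

def gemini_judge_reward(prompts, completions, **kwargs):
    """Return one scalar reward per completion by asking the judge to grade it."""
    rewards = []
    for task, answer in zip(prompts, completions):
        response = judge.generate_content(RUBRIC.format(task=task, answer=answer))
        match = re.search(r"\d+(?:\.\d+)?", response.text)
        rewards.append(float(match.group()) if match else 0.0)
    return rewards
```

In practice, clamping the parsed score to [0, 1] and retrying on malformed judge output would be sensible hardening.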