Commit 1ffc437 (verified) · cyberandy committed · 1 Parent(s): a1bc69c

Update README.md

Files changed (1):
  1. README.md +144 -13

README.md CHANGED
@@ -1,21 +1,152 @@
  ---
- base_model: unsloth/gemma-3-1b-it
  tags:
- - text-generation-inference
- - transformers
  - unsloth
- - gemma3_text
- license: apache-2.0
- language:
- - en
  ---

- # Uploaded finetuned model

- - **Developed by:** cyberandy
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/gemma-3-1b-it

- This gemma3_text model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
+ license: apache-2.0 # Or appropriate license
+ language: en
+ library_name: transformers
+ pipeline_tag: text-generation
  tags:
+ - gemma3
  - unsloth
+ - grpo
+ - rlft
+ - seo
+ - reasoning
+ - instruction-tuning
+ - text-generation
+ - experimental
+ - wordlift
+ - seontology
  ---

+ # Gemma 3 1B - SEO Reasoning (GRPO Iteration 1) - `cyberandy/gemma3-1b-feliSEO-2run`
+
+ ## Model Description
+
+ This model is an **experimental fine-tune** of the `unsloth/gemma-3-1b-it` model, developed as part of research within the **WordLift Lab**. It was trained using Group Relative Policy Optimization (GRPO) with the `trl` library and Unsloth optimizations.
+
+ The primary goal of this fine-tuning iteration was to teach the model to perform basic SEO-related reasoning tasks, guided by concepts from the **SEOntology project (https://github.com/seontology/)**, and to structure its output using specific XML-like tags: `<reasoning>` and `<answer>`.
+
+ **This is an early iteration based on a very small dataset (100 examples) and limited training steps. It demonstrates partial success in format adherence but requires significantly more data and training for robust content generation.**
+
+ * **Developed by:** cyberandy (WordLift Lab)
+ * **Base Model:** `unsloth/gemma-3-1b-it`
+ * **Fine-tuning Method:** Group Relative Policy Optimization (GRPO)
+ * **Language(s):** English
+ * **License:** Likely Apache 2.0 (inherited from base model; confirm license compatibility)
+
+ ## Training Data
+
+ The model was fine-tuned on a custom dataset consisting of **100 examples**. This dataset was created through a three-step process:
+
+ 1. **Synthetic Generation:** Initial SEO task prompts (covering meta descriptions, schema suggestions, keyword analysis, etc., inspired by real-world SEO challenges) were used with the Gemini 1.5 Pro API to generate initial responses containing `<reasoning>` and `<answer>` sections.
+ 2. **LLM-as-a-Judge Evaluation:** The generated synthetic examples were then evaluated by Gemini 1.5 Pro (acting as a judge based on SEO best practices and concepts derived from the **SEOntology's `seovoc` vocabulary (https://w3id.org/seovoc/)**) to assign a reward score (0.0 to 1.0) to each example.
+ 3. **Final Format:** The data was processed into `{'prompt': str, 'reward': float}` records, where `prompt` contains the system and user turns formatted with the Gemma 3 chat template (see the sketch below).
+
+ Due to the small dataset size, the diversity of SEO tasks covered is limited.
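+
+ To make the record layout from step 3 concrete, here is a small, hypothetical sketch of how a single training example could be assembled. The task prompt and reward value are invented for illustration, and it assumes the Gemma 3 chat template accepts a system turn (as in the usage example further below); the real dataset is not included in this repository.
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("unsloth/gemma-3-1b-it")
+
+ system_prompt = (
+     "Respond in the following format:\n"
+     "<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>"
+ )
+
+ # Hypothetical SEO task prompt and judge-assigned reward (0.0-1.0).
+ example = {
+     "prompt": tokenizer.apply_chat_template(
+         [
+             {"role": "system", "content": system_prompt},
+             {"role": "user", "content": "Write a meta description for a page about espresso machines."},
+         ],
+         tokenize=False,
+         add_generation_prompt=True,
+     ),
+     "reward": 0.85,  # score assigned by the LLM judge
+ }
+ ```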
+
+ ## Training Procedure
+
+ * **Framework:** `unsloth`, `trl`, `peft`, `transformers`, `torch`
+ * **Algorithm:** GRPO (`trl.GRPOTrainer`)
+ * **Model:** `unsloth/gemma-3-1b-it` loaded without 4-bit quantization (float32/bfloat16) using `FastLanguageModel`.
+ * **PEFT:** LoRA adapters applied with `r=8`, `lora_alpha=8`.
+ * **Reward Functions:** Custom Python functions were used during training to score the model's generated completions (see the sketch after this list):
+     * `soft_format_reward_func_partial`: Rewarded adherence to the `<reasoning>`/`<answer>` structure, with partial credit for individual tags.
+     * `ontology_keyword_reward_func_partial`: Gave small, diminishing rewards for relevant SEO/SEOntology keywords (e.g., 'seovoc:', 'schema.org', 'ctr', 'entity') within the `<reasoning>` section.
+     * Combined scores were rescaled using `tanh` into approximately `[-1, 1]`.
+ * **Key Hyperparameters (`GRPOConfig`):** (see the configuration sketch after this list)
+     * `learning_rate`: 5e-6
+     * `per_device_train_batch_size`: 2
+     * `gradient_accumulation_steps`: 2 (effective batch size: 4)
+     * `gradient_checkpointing`: True (using Unsloth's implementation)
+     * `num_generations`: 2
+     * `max_prompt_length`: 512
+     * `max_completion_length`: 512
+     * `max_steps`: 500 (intended; the actual run may have varied)
+     * `optim`: adamw_8bit
+     * `lr_scheduler_type`: cosine
+     * `warmup_ratio`: 0.1
+ * **Hardware:** NVIDIA A100 (40GB VRAM)
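+
+ The exact reward code is not part of this repository. The following is a minimal, illustrative sketch of what the two reward functions and the `tanh` rescaling could look like, assuming the `trl` GRPO convention of reward functions that receive `completions` as strings and return one float per completion; the function names match the bullets above, but the bodies are assumptions.
+
+ ```python
+ import math
+ import re
+
+ def soft_format_reward_func_partial(completions, **kwargs):
+     """Partial credit for each <reasoning>/<answer> tag pair that is present."""
+     rewards = []
+     for text in completions:
+         score = 0.0
+         if re.search(r"<reasoning>.*?</reasoning>", text, re.DOTALL):
+             score += 0.5
+         if re.search(r"<answer>.*?</answer>", text, re.DOTALL):
+             score += 0.5
+         rewards.append(score)
+     return rewards
+
+ def ontology_keyword_reward_func_partial(completions, **kwargs):
+     """Small, diminishing bonus for SEO/SEOntology keywords inside <reasoning>."""
+     keywords = ["seovoc:", "schema.org", "ctr", "entity"]
+     rewards = []
+     for text in completions:
+         match = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
+         reasoning = match.group(1).lower() if match else ""
+         hits = sum(1 for kw in keywords if kw in reasoning)
+         # Each additional keyword is worth half the previous one.
+         rewards.append(sum(0.1 * (0.5 ** i) for i in range(hits)))
+     return rewards
+
+ def combined_reward(completions, **kwargs):
+     """Sum the individual rewards and squash with tanh."""
+     fmt = soft_format_reward_func_partial(completions, **kwargs)
+     kw = ontology_keyword_reward_func_partial(completions, **kwargs)
+     return [math.tanh(a + b) for a, b in zip(fmt, kw)]
+ ```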
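+
+ Similarly, here is a condensed sketch of how the Unsloth + TRL pieces could be wired together with the hyperparameters listed above. This is not the exact training script: the placeholder dataset, `max_seq_length`, `target_modules`, and `output_dir` are assumptions, and `combined_reward` comes from the previous sketch.
+
+ ```python
+ from datasets import Dataset
+ from trl import GRPOConfig, GRPOTrainer
+ from unsloth import FastLanguageModel
+
+ # Placeholder: the real run used the 100-example {'prompt', 'reward'} dataset described above.
+ train_dataset = Dataset.from_list([
+     {"prompt": "<formatted Gemma 3 chat prompt>", "reward": 0.85},
+ ])
+
+ # Load the base model without 4-bit quantization.
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="unsloth/gemma-3-1b-it",
+     max_seq_length=1024,  # assumption: >= max_prompt_length + max_completion_length
+     load_in_4bit=False,
+ )
+
+ # Attach LoRA adapters (r=8, lora_alpha=8).
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=8,
+     lora_alpha=8,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     use_gradient_checkpointing="unsloth",
+ )
+
+ training_args = GRPOConfig(
+     learning_rate=5e-6,
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=2,
+     num_generations=2,
+     max_prompt_length=512,
+     max_completion_length=512,
+     max_steps=500,
+     optim="adamw_8bit",
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     output_dir="outputs",
+ )
+
+ trainer = GRPOTrainer(
+     model=model,
+     processing_class=tokenizer,
+     args=training_args,
+     train_dataset=train_dataset,
+     reward_funcs=[combined_reward],  # from the previous sketch
+ )
+ trainer.train()
+ ```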
+
+ ## Evaluation Results
+
+ * **Quantitative:** No formal evaluation metrics were calculated for this iteration.
+ * **Training Metrics:** Positive average rewards (~0.4-0.6+) were observed, indicating that the reward functions provided a learning signal. Training completed the intended steps. W&B Run: [Link to your W&B run if public]
+ * **Qualitative:** Inference shows the model **adheres well to the requested `<reasoning>`/`<answer>` format**. The reasoning content often shows logical steps. The answer content is sometimes relevant but can suffer from **hallucinations or incoherence**, especially near the end of generation. Further training on more diverse data, and potentially reward function refinement, is required.
+
+ ## Intended Use
+
+ * **Primary Use:** Research and experimentation by WordLift Lab on applying GRPO to teach structured SEO reasoning guided by SEOntology concepts, and demonstrating the fine-tuning process.
+ * **Out-of-Scope:** **NOT suitable for production use.** Reliability and factual accuracy are not guaranteed.
+
+ ## Limitations and Bias
+
+ * **Small, Synthetic Dataset:** Trained on only 100 synthetic examples; generalization is limited.
+ * **Content Quality:** Prone to hallucinations and incoherence. Factual accuracy is not verified.
+ * **Reward Function Simplicity:** Rewards focused on format and basic keywords, not deep semantic correctness.
+ * **Training Instability/Speed:** Significant performance challenges were encountered during development, particularly with the 4B model. The successful 1B run required specific configurations that impacted speed.
+ * **Inherited Bias:** Inherits biases from the base `gemma-3-1b-it` model and from the Gemini 1.5 Pro model used for data generation and evaluation.
+
+ ## How to Use
+
+ Load the merged model (this repository contains the full 16-bit merged weights) using `transformers`:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+ import torch
+
+ model_id = "cyberandy/gemma3-1b-feliSEO-2run"
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Load in bfloat16 or float16
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,    # Or torch.float16
+     device_map="auto",             # Use GPU if available
+     attn_implementation="eager",   # Recommended for Gemma 3 stability
+ )
+ model.eval()
+
+ # --- Define the system prompt used during training / expected by the model ---
+ system_prompt = """Respond in the following format:
+ <reasoning>
+ Explain your thinking step-by-step. Use relevant SEO concepts.
+ </reasoning>
+ <answer>
+ Provide the final answer to the question or the requested SEO element.
+ </answer>"""
+
+ # --- Prepare Input ---
+ user_query = "What schema.org type should be used for a local dental clinic's homepage?"
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": user_query},
+ ]
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     tokenize=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ # --- Configure Generation ---
+ gen_config = GenerationConfig(
+     max_new_tokens=512,
+     temperature=0.7,
+     top_p=0.95,
+     do_sample=True,
+     pad_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 1,  # Use EOS ID for pad
+     eos_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else 1,
+ )
+
+ # --- Generate ---
+ with torch.no_grad():
+     outputs = model.generate(input_ids=inputs, generation_config=gen_config)
+
+ # --- Decode (only the newly generated tokens) ---
+ generated_ids = outputs[0][inputs.shape[-1]:]
+ generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
+
+ print(f"Prompt: {user_query}")
+ print("-" * 20)
+ print(f"Generated Output:\n{generated_text}")
+ ```
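+
+ Since the model is trained to wrap its output in `<reasoning>` and `<answer>` tags, you may want to pull the two sections apart. A minimal sketch, reusing `generated_text` from the block above and assuming the tags are present (they are not guaranteed in every generation, as noted in the limitations):
+
+ ```python
+ import re
+
+ def extract_sections(text: str) -> dict:
+     """Extract the <reasoning> and <answer> blocks from a generation, if present."""
+     sections = {}
+     for tag in ("reasoning", "answer"):
+         match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
+         sections[tag] = match.group(1).strip() if match else None
+     return sections
+
+ parsed = extract_sections(generated_text)
+ print("Reasoning:", parsed["reasoning"])
+ print("Answer:", parsed["answer"])
+ ```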