Eric Hackathorn committed on
Commit
3947ba8
·
1 Parent(s): efc9917

updated model card

  1. README.md +261 -0
README.md CHANGED
@@ -1,3 +1,264 @@
1
  ---
2
  license: mit
3
+ language:
4
+ - en
5
+ library_name: transformers
6
+ tags:
7
+ - phi2
8
+ - lora
9
+ - science-on-a-sphere
10
+ - sos
11
+ - earth-science
12
+ - question-answering
13
+ base_model: microsoft/phi-2
14
+ datasets:
15
+ - HacksHaven/science-on-a-sphere-prompt-completions
16
  ---
17
+
18
+ # Model Card for phi-2-science-on-a-sphere
19
+
20
+ This is a LoRA fine-tuned version of Phi-2 (2.7B parameters), adapted for educational and scientific question answering.
21
+ The model has been fine-tuned on the Science On a Sphere (SOS) QA Dataset, which includes thousands of prompt/completion pairs derived
22
+ from NOAA’s Science On a Sphere support content and dataset catalog.
23
+ The model is designed to support Earth science education and enable AI-powered SOS content experiences.
24
+
25
+ ## Model Details
26
+
27
+ - Base model: microsoft/phi-2
+ - Fine-tuned by: Eric Hackathorn (NOAA)
+ - Architecture: Decoder-only transformer (Phi-2)
+ - Fine-tuning type: Parameter-efficient fine-tuning using LoRA
+ - Language(s): English
+ - License: MIT
33
+
34
+ ### Model Description
35
+
36
+ **Model Status: Work in Progress**
37
+
38
+ This model is currently under active development. Please note:
39
+
40
+ - The “More Information” URLs are provisional — they currently overemphasize support pages rather than high-level "What is..." resources.
41
+
42
+ - The links will be refined in upcoming updates to better align with the model's purpose and intended audience.
43
+
44
+ - Feedback is welcome to help improve this aspect and others.
45
+
46
+ This model is a LoRA fine-tuned version of microsoft/phi-2, optimized for question answering over content related to NOAA’s Science On a Sphere (SOS) initiative,
47
+ including Earth science metadata, dataset descriptions, support documentation, and educational guidance.
48
+ It is designed to be integrated into museum kiosks, classroom assistants, educational chatbots, and SOS Explorer environments to make complex environmental
49
+ data more accessible and engaging.
50
+
51
+ - Developed by: Eric Hackathorn (NOAA Global Systems Laboratory)
52
+ - Shared by: https://huggingface.co/HacksHaven/phi-2-science-on-a-sphere
53
+ - Model type: Decoder-only transformer (LLM) with LoRA fine-tuning
54
+ - Language(s): English
55
+ - License: MIT
56
+ - Finetuned from model: microsoft/phi-2
57
+
58
+ ## Uses
59
+
60
+ 1. Educational Chatbots
61
+
62
+ **Use**: Plug into an LLM-powered assistant (like ChatGPT or a custom app) in a science museum, classroom, or mobile app.
63
+
64
+ **Example**:
65
+ Student: “What causes a tsunami?”
66
+ Model: Tsunamis are typically caused by underwater earthquakes, often at subduction zones. More information: https://sos.noaa.gov/catalog/datasets/tsunami-locations-2000-bce-2014/
67
+
68
+ 2. Interactive Museum Kiosks
69
+
70
+ **Use**: Replace static displays with conversational kiosks powered by this model.
71
+
72
+ **Example**: A touchscreen exhibit next to an SOS globe where users ask, “What does this animation show?” and the model responds with a summary of that dataset.
73
+
74
+ 3. SOS Explorer Integration
75
+
76
+ **Use**: Embed QA inside SOS Explorer or a future AI-powered version to describe datasets, provide learning guidance, or guide exploratory interactions.
77
+
78
+ **Example**: When a user clicks on a dataset, a bot could summarize it, suggest classroom activities, or quiz the user.
79
+
80
+ 4. Curriculum and Lesson Plan Support
81
+
82
+ **Use**: Teachers ask the model for summaries, concepts, or classroom activities based on a specific dataset.
83
+
84
+ **Example**: “Describe a classroom activity using the dataset about ocean acidification.”
85
+
86
+ 5. Research Assistant for Outreach Teams
87
+
88
+ **Use**: Internal NOAA outreach and comms teams use the model to quickly surface descriptions, summaries, related content, or activity suggestions.
89
+
90
+ 6. Voice-activated Assistants
91
+
92
+ **Use**: Deploy in AR/VR environments or installations with voice input, e.g., “Tell me about sea surface temperature datasets.”
93
+
94
+ ### Direct Use
95
+
96
+ This model is optimized for:
97
+
98
+ - Question-answering on Earth science content
99
+ - SOS educational kiosk applications
100
+ - Embedding into chatbots or classroom tools for informal STEM education
101
+
102
+ ### Downstream Use
103
+
104
+ It can be further fine-tuned for:
105
+
106
+ - Domain-specific science outreach bots
107
+ - Custom SOS Explorer content recommendation engines
108
+ - Multimodal extensions (e.g., image+QA)
109
+
110
+ ### Out-of-Scope Use
111
+
112
+ - Real-time decision-making or scientific analysis requiring exact precision
113
+ - High-stakes classroom assessment without human verification
114
+ - Non-English QA without additional fine-tuning
115
+
116
+ ## Bias, Risks, and Limitations
117
+
118
+ - Some responses may oversimplify complex topics
119
+ - Answers are based on generated content, not human-authored explanations
120
+ - May reflect biases from the underlying LLM or training set structure
121
+
122
+ ### Recommendations
123
+
124
+ - Use model outputs with educator supervision in formal settings
125
+ - Cross-check completions against authoritative SOS materials
126
+ - Avoid deployment in mission-critical scenarios without further vetting
127
+
128
+ ## How to Get Started with the Model
129
+
130
+ This is a merged and quantization-ready version of Phi-2 (microsoft/phi-2) fine-tuned on the Science On a Sphere (SOS) instruction dataset using LoRA + PEFT. You can load it using:
131
+
132
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # 4-bit NF4 quantization with double quantization; compute runs in bfloat16.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_quant_type="nf4",
+ )
+
+ # device_map="auto" places the layers on whatever devices are available.
+ model = AutoModelForCausalLM.from_pretrained(
+     "HacksHaven/phi-2-science-on-a-sphere",
+     quantization_config=bnb_config,
+     device_map="auto",
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("HacksHaven/phi-2-science-on-a-sphere", trust_remote_code=True)
+ ```
152
+
153
+ Use the code below to chat with the model.
154
+
155
+ ```python
+ from transformers import pipeline
+
+ # Reuses the `model` and `tokenizer` objects loaded above.
+ qa = pipeline("text-generation", model=model, tokenizer=tokenizer)
+ qa("What is NOAA's Science On a Sphere?")
+ ```
159
+
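Because the model was fine-tuned on `User:` / `Assistant:` turns, questions are likely to work best when wrapped in the same format. The sketch below is one possible way to build such a prompt and trim the reply. The helper names `build_prompt` and `extract_answer` are hypothetical, and the sample output is made up for illustration. It only assumes that a `text-generation` pipeline returns a list of dicts with a `generated_text` key that echoes the prompt.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the User:/Assistant: format used during fine-tuning."""
    return f"User: {question.strip()}\nAssistant:"


def extract_answer(generated_text: str, prompt: str) -> str:
    """Return only the model's reply from a text-generation output."""
    # The pipeline echoes the prompt by default, so drop it if present.
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Cut at the next "User:" turn in case the model continues the dialogue.
    return generated_text.split("\nUser:", 1)[0].strip()


# Hypothetical pipeline output, in the shape returned by `qa(prompt)`.
prompt = build_prompt("What causes a tsunami?")
fake_output = [{"generated_text": prompt + " Underwater earthquakes.\nUser: Thanks"}]
answer = extract_answer(fake_output[0]["generated_text"], prompt)
print(answer)
```

With a real pipeline, `fake_output` would be replaced by `qa(prompt)`.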
160
+ ## Training Details
161
+
162
+ ### Training Data
163
+
164
+ - Source Website: https://sos.noaa.gov/
165
+ - Repository: https://huggingface.co/datasets/HacksHaven/science-on-a-sphere-prompt-completions/
166
+
167
+ #### Preprocessing
168
+
169
+ Prompts and completions were embedded in a Phi-2-friendly conversational format using simple User: / Assistant: prefixes, with no special tokens.
170
+
171
+ ```
172
+ User: [Prompt text]
173
+ Assistant: [Completion text]
174
+ ```
175
+
176
+ - Tokenization used padding="longest" and max_length=2048.
177
+ - Labels were copied directly from input IDs for causal language modeling.
178
+
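The preprocessing steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the training script: `toy_tokenize` is a whitespace stand-in for the real Phi-2 tokenizer, `encode_batch` is a hypothetical helper, and ID `0` is reserved for padding here only for the demo.

```python
def format_example(prompt: str, completion: str) -> str:
    """Embed one pair in the User:/Assistant: format, with no special tokens."""
    return f"User: {prompt}\nAssistant: {completion}"


def toy_tokenize(text, vocab):
    """Stand-in tokenizer (assumption): one integer ID per whitespace token."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def encode_batch(texts, max_length=2048, pad_id=0):
    """Truncate to max_length, pad to the longest row, and copy IDs to labels."""
    vocab = {"<pad>": pad_id}
    rows = [toy_tokenize(t, vocab)[:max_length] for t in texts]
    longest = max(len(r) for r in rows)  # mirrors padding="longest"
    input_ids = [r + [pad_id] * (longest - len(r)) for r in rows]
    labels = [list(r) for r in input_ids]  # labels copied directly from input IDs
    return {"input_ids": input_ids, "labels": labels}


batch = encode_batch([
    format_example("What is SOS?", "A globe display."),
    format_example("Hi", "Hello"),
])
print(batch["input_ids"])
```

Copying labels directly means padded positions also count toward the loss. A common variation, not described in this card, replaces label values at padded positions with `-100` so they are ignored.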
179
+ #### Training Hyperparameters
180
+
181
+ | Parameter | Value |
182
+ | ----------------------- | ------------------------------------------------------------- |
183
+ | Base model | `microsoft/phi-2` |
184
+ | Finetuning method | LoRA (Low-Rank Adaptation) |
185
+ | LoRA Rank (`r`) | 8 |
186
+ | LoRA Alpha | 32 |
187
+ | LoRA Dropout | 0.05 |
188
+ | Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `dense`, `fc1`, `fc2` |
189
+ | Gradient Checkpointing | Enabled |
190
+ | Max sequence length | 2048 |
191
+ | Precision | float32 (for CPU deployment compatibility) |
192
+ | Quantization | 4-bit NF4 via BitsAndBytes |
193
+ | Optimizer | `paged_adamw_8bit` |
194
+ | Learning Rate | 2e-4 |
195
+ | Epochs | 3 |
196
+ | Batch Size | 1 (with gradient accumulation = 4) |
197
+ | Logging & Eval Strategy | Every 10 steps |
198
+ | Evaluation Metric | `bertscore_f1` (maximize) |
199
+ | Load Best Model at End  | ✅ Yes                                                         |
201
+
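A few quantities follow directly from the table and can be checked with a short calculation. In the sketch below the LoRA key names mirror the argument names of `peft.LoraConfig`, but the dictionaries themselves, the helper functions, and the example dataset size of 1000 are assumptions for illustration. The step count is approximate, since training loops differ in how they handle a final partial batch.

```python
import math

# Values copied from the hyperparameter table.
LORA = {
    "r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "dense", "fc1", "fc2"],
}
TRAIN = {
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "epochs": 3,
}


def lora_scaling(cfg):
    """Standard LoRA scales the low-rank update by alpha / r."""
    return cfg["lora_alpha"] / cfg["r"]


def effective_batch_size(cfg, num_devices=1):
    """Examples contributing to each optimizer step."""
    return cfg["per_device_batch_size"] * cfg["gradient_accumulation_steps"] * num_devices


def optimizer_steps(num_examples, cfg, num_devices=1):
    """Approximate number of optimizer updates over all epochs."""
    per_epoch = math.ceil(num_examples / effective_batch_size(cfg, num_devices))
    return per_epoch * cfg["epochs"]


print(lora_scaling(LORA), effective_batch_size(TRAIN), optimizer_steps(1000, TRAIN))
```

So a rank of 8 with alpha 32 scales the adapter update by 4, and a batch size of 1 with 4 accumulation steps gives an effective batch of 4 examples per update on a single device.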
202
+ ## Evaluation
203
+
204
+ ### Testing Data, Factors & Metrics
205
+
206
+ #### Testing Data
207
+
208
+ Evaluated on a 10% held-out split of the training dataset (stratified).
209
+
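A stratified 10% hold-out could be produced along the following lines. The card does not say which field was used for stratification, so the `topic` key, the seed, and the `stratified_split` helper are all hypothetical, and the example records are synthetic.

```python
import random
from collections import defaultdict


def stratified_split(examples, key, test_fraction=0.1, seed=42):
    """Hold out roughly test_fraction of each group defined by examples[i][key]."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex)

    train, test = [], []
    for label in sorted(groups):  # sorted for a reproducible order
        items = list(groups[label])
        rng.shuffle(items)
        # At least one test example per group (a single-example group goes entirely to test).
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Synthetic records: 50 "ocean" and 50 "climate" prompts.
examples = [
    {"prompt": f"question {i}", "topic": "ocean" if i % 2 else "climate"}
    for i in range(100)
]
train, test = stratified_split(examples, key="topic")
print(len(train), len(test))
```

Splitting per group keeps the topic mix of the test set close to that of the full dataset, which a plain random split does not guarantee for small groups.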
210
+ #### Factors
211
+
212
+ This model was fine-tuned to support instructional content for NOAA's Science On a Sphere (SOS) exhibits, which span a diverse set of topics and audiences. Relevant factors that may affect model performance include:
213
+
214
+ - **Scientific Domain**: The model has seen examples across atmospheric science, oceanography, climate change, space weather, and Earth system interactions. Responses may vary depending on the domain depth in the fine-tuning set.
215
+
216
+ - **Instruction Type**: Prompts vary in style, including explanations of scientific processes, definitions, causal reasoning, and narrative-style descriptions for public displays.
217
+
218
+ - **Intended Audience**: While many prompts are written at a general public or middle school level, the model may perform differently for early learners, specialists, or multilingual audiences.
219
+
220
+ - **Data Origin**: The training set draws from curated NOAA science narratives, educational materials, and exhibit scripts. Domains or tones not represented in these sources may yield less accurate responses.
221
+
222
+ Future evaluations could assess performance across these axes to better understand model reliability in SOS-like deployment environments.
223
+
224
+ #### Metrics
225
+
226
+ - ROUGE-1, ROUGE-2, ROUGE-L: N-gram overlap
227
+ - BLEU: Token-based overlap precision
228
+ - BERTScore F1: Semantic similarity of completions
229
+ - Perplexity: If eval loss is available
230
+
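To make two of these metrics concrete, here is a minimal sketch. Perplexity is the exponential of the mean cross-entropy evaluation loss (in nats), and ROUGE-1 F1 is computed from unigram overlap. The function names are hypothetical, and the whitespace tokenization is a simplification: established ROUGE implementations apply their own tokenization and optional stemming, so their scores may differ.

```python
import math
from collections import Counter


def perplexity_from_loss(eval_loss: float) -> float:
    """Perplexity is exp of the mean cross-entropy loss (natural log base)."""
    return math.exp(eval_loss)


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate completion."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(perplexity_from_loss(0.0))
print(rouge1_f1("the ocean is warm", "the ocean is cold"))
```

BLEU and BERTScore are more involved and are better taken from an established metrics library than reimplemented.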
231
+ ### Results
232
+
233
+ Evaluation was performed using ROUGE, BLEU, BERTScore, and perplexity on a held-out 10% test set.
234
+ BERTScore F1 was used to select the best checkpoint during training. Unfortunately it made my GPU
235
+ burst into flames.
236
+
237
+ Quantitative results TBD in future update.
238
+
239
+ #### Summary
240
+
241
+ Summary will be added when quantitative evaluation is complete.
242
+
243
+ ## Citation
244
+
245
+ **BibTeX:**
246
+
247
+ ```
248
+ @misc{hackathorn_2025_sosphi2,
249
+ title = {Science On a Sphere QA Model (Phi-2, LoRA)},
250
+ author = {Hackathorn, Eric},
251
+ year = {2025},
252
+ url = {https://huggingface.co/HacksHaven/phi-2-science-on-a-sphere}
253
+ }
254
+ ```
255
+
256
+ **APA:**
257
+
258
+ Hackathorn, E. (2025). Science On a Sphere QA Model (Phi-2, LoRA). Hugging Face. https://huggingface.co/HacksHaven/phi-2-science-on-a-sphere
259
+
260
+ ## Model Card Contact
261
+
262
+ Author: Eric Hackathorn
263
264
+ Affiliation: NOAA Global Systems Laboratory