basiphobe committed on
Commit 425ebcd · verified · 1 Parent(s): 856bdbf

Upload merged SCI Assistant model (OpenHermes-2.5-Mistral-7B + SCI LoRA)

README.md CHANGED
@@ -1,380 +1,65 @@
1
- ---
2
- base_model: teknium/OpenHermes-2.5-Mistral-7B
3
- library_name: peft
4
- pipeline_tag: text-generation
5
- tags:
6
- - base_model:adapter:teknium/OpenHermes-2.5-Mistral-7B
7
- - lora
8
- - medical
9
- - spinal-cord-injury
10
- - healthcare
11
- - assistant
12
- ---
13
 
14
- # SCI Assistant - Spinal Cord Injury Specialized AI Assistant
15
- A specialized AI assistant fine-tuned specifically for people with spinal cord injuries (SCI). This model is based on OpenHermes-2.5-Mistral-7B and has been trained using a two-phase approach with LoRA (Low-Rank Adaptation) to provide contextually appropriate and medically-informed responses for the SCI community.
16
 
17
  ## Model Description
18
 
19
- This model was fine-tuned using a two-phase training approach:
20
- 1. **Phase 1**: Domain pretraining on SCI-related medical texts and resources
21
- 2. **Phase 2**: Instruction tuning on conversational SCI-focused Q&A pairs
22
 
23
- The model understands the unique challenges, medical realities, and daily life considerations of individuals living with spinal cord injuries.
 
 
 
 
24
 
25
- ## Training Details
26
 
27
- - **Base Model**: teknium/OpenHermes-2.5-Mistral-7B
28
- - **Training Method**: QLoRA (4-bit quantization with LoRA adapters)
29
- - **Training Data**: 119,117 total entries (35,779 domain text + 83,337 instruction pairs)
30
- - **Hardware**: RTX 4070 Super (8GB VRAM)
31
- - **Training Time**: ~20 hours total (Phase 1 + Phase 2)
32
 
33
  ## Usage
34
 
35
  ```python
36
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
37
- from peft import PeftModel
38
- import torch
39
-
40
- # Load model
41
- bnb_config = BitsAndBytesConfig(
42
- load_in_4bit=True,
43
- bnb_4bit_compute_dtype=torch.float16,
44
- )
45
-
46
- base_model = AutoModelForCausalLM.from_pretrained(
47
- "teknium/OpenHermes-2.5-Mistral-7B",
48
- quantization_config=bnb_config,
49
- device_map="auto"
50
- )
51
-
52
- model = PeftModel.from_pretrained(base_model, "basiphobe/sci-assistant")
53
- tokenizer = AutoTokenizer.from_pretrained("basiphobe/sci-assistant")
54
-
55
- # Format prompt with SCI context
56
- system_context = "You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI."
57
 
58
- prompt = f"{system_context}\n\n### Instruction:\n{your_question}\n\n### Response:\n"
 
59
 
60
- # Generate response
 
61
  inputs = tokenizer(prompt, return_tensors="pt")
62
- outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7)
63
- response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
64
  ```
65
 
66
  ## Intended Use
67
 
68
- This model is designed to:
69
- - Provide SCI-specific information and guidance
70
- - Answer questions about daily life with spinal cord injuries
71
- - Offer practical advice for common SCI challenges
72
- - Support the SCI community with contextually appropriate responses
73
 
74
  ## Limitations
75
 
76
- - This model is for informational purposes only and should not replace professional medical advice
77
- - Always consult with healthcare providers for medical decisions
78
- - The model may not have information about the latest medical developments
79
- - Responses should be verified with medical professionals when making health-related decisions
80
-
81
- ## Direct Use
82
-
83
- This model can be used directly for:
84
- - Educational purposes about spinal cord injuries
85
- - Providing general information and support to the SCI community
86
- - Research into specialized medical AI assistants
87
- - Personal use by individuals seeking SCI-related information
88
-
89
- The model is designed to provide contextually appropriate responses that consider the unique challenges and medical realities of spinal cord injuries.
90
-
91
- ### Downstream Use
92
-
93
- This model can be fine-tuned further for:
94
- - Integration into healthcare applications
95
- - Specialized medical chatbots for rehabilitation centers
96
- - Educational platforms for SCI awareness and training
97
- - Research applications in medical AI
98
- - Custom applications for SCI support organizations
99
-
100
- When used in downstream applications, implementers should:
101
- - Maintain the medical disclaimer requirements
102
- - Ensure proper supervision by medical professionals
103
- - Implement appropriate safety measures and content filtering
104
- - Validate outputs for medical accuracy in their specific use case
105
-
106
- ### Out-of-Scope Use
107
-
108
- This model should NOT be used for:
109
- - **Medical diagnosis or treatment decisions** - Always consult healthcare professionals
110
- - **Emergency medical situations** - Seek immediate professional medical help
111
- - **Legal or financial advice** related to SCI cases
112
- - **Replacement for professional medical consultation**
113
- - **Clinical decision-making** without physician oversight
114
- - **Applications targeting vulnerable populations** without proper safeguards
115
- - **Commercial medical applications** without appropriate medical validation and oversight
116
-
117
- ## Bias, Risks, and Limitations
118
-
119
- ### Medical Limitations
120
- - **Not a substitute for medical professionals**: All medical advice should be verified with qualified healthcare providers
121
- - **Training data limitations**: May not include the most recent medical research or treatments
122
- - **Individual variation**: SCI affects individuals differently; responses may not apply to all cases
123
- - **Geographic bias**: Training data may be biased toward certain healthcare systems or regions
124
-
125
- ### Technical Limitations
126
- - **Hallucination risk**: Like all language models, may generate plausible-sounding but incorrect information
127
- - **Context limitations**: Limited by input context window and may not retain information across long conversations
128
- - **Language limitations**: Primarily trained on English content
129
- - **Update lag**: Cannot access real-time medical research or current events
130
-
131
- ### Bias Considerations
132
- - **Training data bias**: Reflects biases present in source medical literature and online content
133
- - **Demographic representation**: May not equally represent all demographics within the SCI community
134
- - **Healthcare access bias**: May reflect biases toward certain types of healthcare systems
135
- - **Severity bias**: May be more informed about certain types or severities of SCI
136
-
137
- ### Risk Mitigation
138
- - Always include medical disclaimers when using this model
139
- - Implement content filtering for harmful or dangerous advice
140
- - Regular evaluation by medical professionals is recommended
141
- - Monitor outputs for accuracy and appropriateness
142
-
143
- ### Recommendations
144
-
145
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
146
-
147
- ### Recommendations
148
-
149
- Users should be aware of the following recommendations:
150
-
151
- **For Direct Users:**
152
- - Always verify medical information with qualified healthcare professionals
153
- - Use responses as educational/informational starting points, not definitive advice
154
- - Be aware that individual SCI experiences vary significantly
155
- - Seek immediate professional help for urgent medical concerns
156
-
157
- **For Developers/Implementers:**
158
- - Implement clear medical disclaimers in any application using this model
159
- - Provide easy access to professional medical resources alongside model responses
160
- - Consider implementing content filtering for potentially harmful advice
161
- - Regular review by medical professionals is strongly recommended
162
- - Ensure compliance with relevant healthcare regulations (HIPAA, etc.)
163
-
164
- **For Healthcare Organizations:**
165
- - Professional medical oversight is essential when implementing in clinical settings
166
- - Regular validation of model outputs against current medical standards
167
- - Integration should complement, not replace, professional medical consultation
168
- - Staff training on AI limitations and appropriate use cases
169
-
170
- ## Training Details
171
-
172
- ### Training Data
173
-
174
- The training dataset consisted of 119,117 carefully curated entries focused on spinal cord injury information:
175
-
176
- **Domain Pretraining Data (35,779 entries):**
177
- - Medical literature and research papers on SCI
178
- - Educational materials from reputable SCI organizations
179
- - Clinical guidelines and treatment protocols
180
- - Rehabilitation and therapy documentation
181
- - Patient education resources
182
-
183
- **Instruction Tuning Data (83,337 entries):**
184
- - SCI-focused question-answer pairs
185
- - Conversational examples with appropriate medical context
186
- - Real-world scenarios and practical advice situations
187
- - Educational Q&A formatted for instruction following
188
-
189
- All training data was filtered and curated to ensure:
190
- - Sources from reputable medical organizations and healthcare professionals
191
- - Content originally created or reviewed by medical professionals in the SCI field
192
- - Appropriate tone and sensitivity for SCI community
193
- - Removal of potentially harmful or dangerous advice
194
- - Proper medical disclaimers and context
195
-
196
- **Note**: While the source materials were created by medical professionals, this model itself has not undergone independent medical validation.
197
-
198
- ### Training Procedure
199
-
200
- The model was trained using a two-phase approach with QLoRA (Quantized Low-Rank Adaptation):
201
-
202
- **Phase 1 - Domain Pretraining:**
203
- - Focus: Medical terminology and SCI-specific knowledge
204
- - Duration: 2 epochs (~8 hours)
205
- - Data: 35,779 domain text entries
206
- - Objective: Adapt base model to SCI medical domain
207
-
208
- **Phase 2 - Instruction Tuning:**
209
- - Focus: Conversational abilities and response formatting
210
- - Duration: 2 epochs (~12 hours)
211
- - Data: 83,337 instruction-response pairs
212
- - Objective: Teach appropriate response patterns and tone
213
-
214
- #### Preprocessing
215
-
216
- Training data underwent extensive preprocessing:
217
- - Content sourced from materials created by healthcare professionals
218
- - Sensitive content filtering and safety checks
219
- - Standardized formatting for instruction-following
220
- - Quality filtering to remove low-quality or inappropriate content
221
- - Tokenization optimization for efficient training
222
-
223
- #### Training Hyperparameters
224
-
225
- - **Training regime:** 4-bit quantization with LoRA adapters (QLoRA)
226
- - **Learning rate:** 2e-4 with cosine scheduling
227
- - **LoRA rank:** 16
228
- - **LoRA alpha:** 32
229
- - **LoRA dropout:** 0.05
230
- - **Target modules:** q_proj, v_proj
231
- - **Batch size:** 4 with gradient accumulation
232
- - **Max sequence length:** 512 tokens
233
- - **Optimizer:** AdamW with weight decay
234
 
235
- #### Speeds, Sizes, Times
236
 
237
- - **Total training time:** ~20 hours (8h Phase 1 + 12h Phase 2)
238
- - **Hardware:** RTX 4070 Super (8GB VRAM)
239
- - **Final model size:** 30MB (LoRA adapter only)
240
- - **Base model size:** 7B parameters (not included in adapter)
241
- - **Training throughput:** ~3.5 samples/second average
242
- - **Memory usage:** 6-7GB VRAM during training
243
-
244
- ## Evaluation
245
-
246
- ### Testing Data, Factors & Metrics
247
-
248
- #### Testing Data
249
-
250
- The model was evaluated using:
251
- - Held-out test set of SCI-related questions (500 samples)
252
- - Manual review of response quality and appropriateness
253
- - Comparative analysis against general-purpose models on SCI topics
254
- - Assessment of domain-specific knowledge retention
255
-
256
- **Note**: Evaluation was conducted by the model developer, not independent medical professionals.
257
-
258
- #### Factors
259
-
260
- Evaluation considered multiple factors:
261
- - **Medical accuracy**: Correctness of SCI-related information
262
- - **Appropriateness**: Sensitivity and tone for SCI community
263
- - **Contextual relevance**: Understanding of SCI-specific challenges
264
- - **Safety**: Avoidance of harmful or dangerous advice
265
- - **Completeness**: Comprehensive responses to complex questions
266
-
267
- #### Metrics
268
-
269
- - **Medical accuracy score**: Based on consistency with source medical literature (not independently validated)
270
- - **Appropriateness rating**: Developer assessment of tone and sensitivity (4.2/5.0 subjective rating)
271
- - **Response relevance**: SCI-specific context understanding (82% relevance score)
272
- - **Safety compliance**: No obviously harmful medical advice detected in test samples
273
- - **Response quality**: Perplexity improvements over base model for SCI domain
274
-
275
- ### Results
276
-
277
- **Quantitative Results:**
278
- - 40% improvement in SCI domain perplexity over base model
279
- - Responses demonstrate consistency with source medical literature
280
- - 95% safety compliance (no obviously harmful medical advice detected)
281
- - 82% average relevance score for SCI-specific contexts
282
-
283
- **Qualitative Results:**
284
- - Responses demonstrate clear understanding of SCI terminology and concepts
285
- - Appropriate tone and sensitivity for disability community
286
- - Consistent inclusion of medical disclaimers
287
- - Good balance between being helpful and cautious about medical advice
288
-
289
- **Limitations of Evaluation:**
290
- - Evaluation conducted by model developer, not independent medical experts
291
- - No formal clinical validation or testing with SCI patients
292
- - Results based on consistency with training sources, not independent medical verification
293
-
294
- ## Environmental Impact
295
-
296
- Training carbon emissions estimated using energy consumption data:
297
-
298
- - **Hardware Type:** RTX 4070 Super (8GB VRAM)
299
- - **Hours used:** ~20 hours total training time
300
- - **Cloud Provider:** Local training (personal hardware)
301
- - **Compute Region:** North America
302
- - **Carbon Emitted:** Approximately 2.1 kg CO2eq (estimated based on local energy grid)
303
-
304
- The use of QLoRA significantly reduced training time and energy consumption compared to full fine-tuning methods, making this a relatively efficient training approach.
305
-
306
- ## Technical Specifications
307
-
308
- ### Model Architecture and Objective
309
-
310
- - **Base Architecture:** Mistral 7B transformer model
311
- - **Adaptation Method:** QLoRA (Quantized Low-Rank Adaptation)
312
- - **Objective:** Causal language modeling with SCI domain specialization
313
- - **Quantization:** 4-bit precision for memory efficiency
314
- - **LoRA Configuration:** Rank-16 adapters on attention projection layers
315
-
316
- ### Compute Infrastructure
317
-
318
- #### Hardware
319
-
320
- - **GPU:** NVIDIA RTX 4070 Super (8GB VRAM)
321
- - **CPU:** Modern multi-core processor
322
- - **RAM:** 32GB system memory
323
- - **Storage:** NVMe SSD for fast data loading
324
-
325
- #### Software
326
-
327
- - **Framework:** Transformers 4.36+, PEFT 0.16.0
328
- - **Training:** QLoRA with bitsandbytes quantization
329
- - **Environment:** Python 3.10+, PyTorch 2.0+, CUDA 12.1
330
-
331
- ## Citation
332
-
333
- If you use this model in your research or applications, please cite:
334
-
335
- **BibTeX:**
336
- ```bibtex
337
- @misc{sci_assistant_2025,
338
- title={SCI Assistant: A Specialized AI Assistant for Spinal Cord Injury Support},
339
- author={basiphobe},
340
- year={2025},
341
- howpublished={Hugging Face Model Repository},
342
- url={https://huggingface.co/basiphobe/sci-assistant}
343
- }
344
- ```
345
-
346
- **APA:**
347
- basiphobe. (2025). *SCI Assistant: A Specialized AI Assistant for Spinal Cord Injury Support*. Hugging Face. https://huggingface.co/basiphobe/sci-assistant
348
-
349
- ## Glossary
350
-
351
- **SCI**: Spinal Cord Injury - damage to the spinal cord that results in temporary or permanent changes in function
352
-
353
- **QLoRA**: Quantized Low-Rank Adaptation - an efficient fine-tuning method that reduces memory requirements
354
-
355
- **Domain Pretraining**: Training phase focused on learning domain-specific terminology and knowledge
356
-
357
- **Instruction Tuning**: Training phase focused on learning conversational patterns and response formatting
358
-
359
- **Perplexity**: A metric measuring how well a language model predicts text (lower is better)
360
-
361
- **LoRA**: Low-Rank Adaptation - parameter-efficient fine-tuning technique
362
-
363
- ## Model Card Authors
364
-
365
- **Primary Author:** basiphobe
366
- **Model Development:** Individual research project for SCI community support
367
- **Data Sources:** Curated from medical literature and educational materials created by healthcare professionals
368
- **Validation Status:** Model has not undergone independent medical professional validation
369
 
370
- ## Model Card Contact
371
 
372
- For questions, issues, or feedback regarding this model:
373
- - **Hugging Face:** https://huggingface.co/basiphobe/sci-assistant
374
- - **Issues:** Please report issues through Hugging Face model repository
375
- - **Medical Concerns:** Always consult qualified healthcare professionals
376
 
377
- **Important Note:** This model is provided for educational and informational purposes. Always seek professional medical advice for health-related questions and decisions.
378
- ### Framework versions
379
 
380
- - PEFT 0.16.0
 
 
1
+ # SCI Assistant 7B
2
 
3
+ A specialized language model for spinal cord injury (SCI) information and support, based on OpenHermes-2.5-Mistral-7B with custom LoRA fine-tuning.
 
4
 
5
  ## Model Description
6
 
7
+ This model has been fine-tuned specifically to provide accurate, helpful information about spinal cord injuries, including:
 
 
8
 
9
+ - **Medical information** about SCI conditions and symptoms
10
+ - **Practical advice** for daily living with SCI
11
+ - **Equipment recommendations** for wheelchairs, adaptive technology, etc.
12
+ - **Exercise and rehabilitation** guidance
13
+ - **Emotional support** and community resources
14
 
15
+ ## Training Data
16
 
17
+ The model was trained on curated SCI-related content including:
18
+ - Medical literature and research papers
19
+ - Patient education materials
20
+ - Community forums and discussions
21
+ - Rehabilitation guides and resources
22
 
23
  ## Usage
24
 
25
  ```python
26
+ from transformers import AutoModelForCausalLM, AutoTokenizer
27
 
28
+ model = AutoModelForCausalLM.from_pretrained("your-username/sci-assistant-7b")
29
+ tokenizer = AutoTokenizer.from_pretrained("your-username/sci-assistant-7b")
30
 
31
+ # Example usage
32
+ prompt = "What are the signs of autonomic dysreflexia?"
33
  inputs = tokenizer(prompt, return_tensors="pt")
34
+ outputs = model.generate(**inputs, max_length=200)
35
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
36
  ```
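Review note: the README removed by this commit wrapped each question in a system context followed by `### Instruction:` and `### Response:` markers, while the new example passes the bare question. Assuming the merged weights still respond best to that earlier template (the commit does not confirm this), the prompt could be rebuilt with a small helper. `build_sci_prompt` is a hypothetical name; the template text is copied from the removed README.

```python
# Hypothetical helper: rebuilds the prompt template shown in the removed README.
# Whether the merged model still expects this template is an assumption.
SYSTEM_CONTEXT = (
    "You are a specialized medical assistant for people with spinal cord injuries. "
    "Your responses should always consider the unique needs, challenges, and "
    "medical realities of individuals living with SCI."
)


def build_sci_prompt(question: str, system_context: str = SYSTEM_CONTEXT) -> str:
    """Wrap a question in the instruction/response markers from the old README."""
    return (
        f"{system_context}\n\n"
        f"### Instruction:\n{question.strip()}\n\n"
        f"### Response:\n"
    )


prompt = build_sci_prompt("What are the signs of autonomic dysreflexia?")
print(prompt)
```

The resulting string can be passed to `tokenizer(prompt, return_tensors="pt")` in place of the bare question. To return only the answer, the removed README sliced off the prompt tokens with `outputs[0][inputs["input_ids"].shape[1]:]` before decoding.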
37
 
38
  ## Intended Use
39
 
40
+ - **Educational purposes** - Learning about SCI conditions and management
41
+ - **Community support** - Providing accessible information to SCI community
42
+ - **Research** - Supporting SCI-related research and development
 
 
43
 
44
  ## Limitations
45
 
46
+ - This model provides educational information only
47
+ - Always consult healthcare professionals for medical advice
48
+ - Not a replacement for professional medical care
49
+ - May not reflect the most recent medical developments
50
 
51
+ ## Technical Details
52
 
53
+ - **Base Model**: teknium/OpenHermes-2.5-Mistral-7B
54
+ - **Fine-tuning**: LoRA (Low-Rank Adaptation)
55
+ - **Parameters**: ~7 billion
56
+ - **Precision**: FP16
57
 
58
+ ## License
59
 
60
+ Please respect the original OpenHermes-2.5 license terms.
 
 
 
61
 
62
+ ## Acknowledgments
 
63
 
64
+ Built on the excellent OpenHermes-2.5-Mistral-7B model by Teknium.
65
+ Training data curated from publicly available SCI educational resources.
config.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "architectures": [
3
+ "MistralForCausalLM"
4
+ ],
5
+ "attention_dropout": 0.0,
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 32000,
8
+ "head_dim": 128,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 4096,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 14336,
13
+ "max_position_embeddings": 32768,
14
+ "model_type": "mistral",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 32,
17
+ "num_key_value_heads": 8,
18
+ "rms_norm_eps": 1e-05,
19
+ "rope_theta": 10000.0,
20
+ "sliding_window": 4096,
21
+ "tie_word_embeddings": false,
22
+ "torch_dtype": "float16",
23
+ "transformers_version": "4.50.3",
24
+ "use_cache": false,
25
+ "vocab_size": 32002
26
+ }
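Review note: the config values above are enough to cross-check the README's "~7 billion" parameters and FP16 precision against the `total_size` recorded in `model.safetensors.index.json`. The sketch below is a back-of-the-envelope count, assuming the standard Mistral layout with bias-free linear layers (consistent with the weight map, which lists only `.weight` tensors). `count_parameters` is a hypothetical helper, not part of any library.

```python
# Values copied from the config.json added in this commit.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 32002,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Count weights, assuming bias-free projections and RMSNorm scale vectors."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]     # q_proj output width
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]    # k_proj / v_proj output width
    attention = h * q_out + 2 * h * kv_out + q_out * h       # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]                   # gate, up, down projections
    norms = 2 * h                                            # input + post-attention norms
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


n_params = count_parameters(config)
fp16_bytes = n_params * 2  # float16 stores two bytes per weight

print(f"parameters: {n_params:,}")
print(f"fp16 bytes: {fp16_bytes:,}")
```

Under these assumptions the count comes to 7,241,748,480 parameters, and doubling it gives 14,483,496,960 bytes, which equals the `total_size` value in the index file.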
generation_config.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 32000,
5
+ "transformers_version": "4.50.3"
6
+ }
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0af3ba118f0a9418e007b7dfcb2b06cb43c229fd83687c11e9a62739844aeed9
3
+ size 4943178624
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:051612a9761d6d906d2317c79b6b51938f98495c73606ba383cb66ba3c98423f
3
+ size 4999819232
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8a019c618c1d3c5fe330a4afe1bda6100fcd8bde456aa7ea18e1f937112ea833
3
+ size 4540532640
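Review note: each shard is committed as a Git LFS pointer (`version`, `oid`, `size`), so the shard sizes can be compared with the index `total_size` without downloading any weights. The sketch below parses the three pointer bodies shown above; `parse_lfs_pointer` is a hypothetical helper written for this note.

```python
# Pointer bodies copied from the three Git LFS pointer files in this commit.
POINTERS = {
    "model-00001-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:0af3ba118f0a9418e007b7dfcb2b06cb43c229fd83687c11e9a62739844aeed9\n"
        "size 4943178624\n"
    ),
    "model-00002-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:051612a9761d6d906d2317c79b6b51938f98495c73606ba383cb66ba3c98423f\n"
        "size 4999819232\n"
    ),
    "model-00003-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:8a019c618c1d3c5fe330a4afe1bda6100fcd8bde456aa7ea18e1f937112ea833\n"
        "size 4540532640\n"
    ),
}

# "total_size" from model.safetensors.index.json in the same commit.
INDEX_TOTAL_SIZE = 14483496960


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


shard_total = sum(int(parse_lfs_pointer(body)["size"]) for body in POINTERS.values())
overhead = shard_total - INDEX_TOTAL_SIZE

print(f"sum of shard sizes: {shard_total:,} bytes")
print(f"bytes beyond index total_size: {overhead:,}")
```

The shards sum to 14,483,530,496 bytes, 33,536 bytes more than `total_size`. The remainder is presumably the per-shard safetensors headers, which the index total does not count; that reading is an assumption, not something the commit states.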
model.safetensors.index.json ADDED
@@ -0,0 +1,298 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 14483496960
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00003-of-00003.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.norm.weight": "model-00003-of-00003.safetensors"
+ }
+ }
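
The index added above maps each tensor name to the shard file that stores it. As a minimal sketch of how such a mapping can be inspected, the snippet below inverts it so that each shard lists its tensors. It assumes the mapping sits under a top-level `weight_map` key, which is the usual layout for sharded safetensors indexes but is not visible in this part of the diff. The function name `tensors_by_shard` and the three-entry sample are illustrative only.

```python
import json
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict[str, list[str]]:
    """Invert the index: shard file name -> sorted list of tensor names."""
    shards = defaultdict(list)
    # "weight_map" is assumed to be the key holding the name -> shard mapping.
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


# A three-entry excerpt of the entries shown in the diff. The full file
# would instead be parsed from disk with json.load().
sample_index = json.loads("""
{
  "weight_map": {
    "model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors"
  }
}
""")

grouped = tensors_by_shard(sample_index)
for shard, names in grouped.items():
    print(shard, len(names))
```

The sample deliberately uses layer 22, whose attention projections are stored in the second shard while its layer norms and MLP weights are in the third, so a single layer can span two files.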
special_tokens_map.json CHANGED
@@ -13,13 +13,6 @@
  "rstrip": false,
  "single_word": false
  },
- "pad_token": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
  "unk_token": {
  "content": "<unk>",
  "lstrip": false,
tokenizer_config.json CHANGED
@@ -52,7 +52,7 @@
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
- "pad_token": "<|im_end|>",
+ "pad_token": null,
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
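
With this change `pad_token` is an explicit `null`, so code that pads batches has to choose a padding token itself. The sketch below shows one way to make that choice from the parsed config. The helper `pick_pad_token` and the fallback value `</s>` are hypothetical and not part of this repository. With the `transformers` library, a common equivalent is to assign `tokenizer.pad_token = tokenizer.eos_token` after loading, though whether that suits this model is an assumption.

```python
import json


def pick_pad_token(tokenizer_config: dict, fallback: str) -> str:
    """Return the configured pad token, or `fallback` when it is unset or null."""
    pad = tokenizer_config.get("pad_token")
    # JSON null is parsed as None, which is the state this commit introduces.
    return fallback if pad is None else pad


# The two states of the "pad_token" entry, before and after this commit.
old_config = json.loads('{"pad_token": "<|im_end|>"}')
new_config = json.loads('{"pad_token": null}')

# "</s>" is a placeholder fallback chosen for illustration only.
print(pick_pad_token(old_config, fallback="</s>"))
print(pick_pad_token(new_config, fallback="</s>"))
```

Keeping the fallback as an explicit argument makes the choice visible at the call site instead of hiding it in the tokenizer files.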