base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre_train
- custom_code
---

# SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.

## Model Overview

SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with **~407M parameters** (407,334,912), designed for text generation tasks.

### Key Features
- **2048 token context window** for extended conversations
- **Mixed precision training** (BF16/FP16) for efficiency
- **Custom attention implementation** with fused operations
- **Early stopping mechanisms** (N/A)
- **Gradient accumulation** for effective large-batch training

### Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~407M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
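
As a sanity check, the table above is consistent with the exact parameter count if one assumes learned positional embeddings, untied token-embedding and LM-head weights, biases on every linear projection, and two LayerNorms per block plus a final one. These details are not stated in the repository, so treat the breakdown below as an illustration rather than the definitive architecture:

```python
# Hypothetical breakdown of the ~407M figure; the assumptions above are not
# confirmed by the repo, but the arithmetic lands exactly on 407,334,912.
d_model, n_layers, d_ffn, vocab, ctx = 1024, 24, 4096, 50_257, 2048

attn = 4 * (d_model * d_model + d_model)                        # Q, K, V, O projections
ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)   # two FFN projections
norms = 2 * 2 * d_model                                         # two LayerNorms per block
per_block = attn + ffn + norms

total = (
    n_layers * per_block
    + vocab * d_model   # token embeddings
    + ctx * d_model     # learned position embeddings
    + 2 * d_model       # final LayerNorm
    + vocab * d_model   # untied LM head (no bias)
)
print(f"{total:,}")  # 407,334,912
```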

## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```

### Basic Usage

```python
# N/A
```
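
An official usage example is still marked N/A above. Until one lands, the following is a minimal sketch of what usage could look like, assuming the checkpoint loads through `transformers` with `trust_remote_code=True` (the model ships custom code) and that the forward pass returns GPT-style logits. Generation uses a hand-rolled sampling loop because `.generate()` is not supported (see Limitations); the repo id and every call here are unverified assumptions, not the project's documented API.

```python
# Hedged sketch only: loading details and output format may differ in practice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

# Manual top-k sampling loop, mirroring the default generation parameters below
# (temperature 1.0, top_k 50, max_new_tokens 50, eos_token_id 50256).
with torch.no_grad():
    for _ in range(50):
        logits = model(input_ids).logits[:, -1, :]   # assumes a .logits field on the output
        probs = torch.softmax(logits / 1.0, dim=-1)
        top_probs, top_idx = torch.topk(probs, k=50)
        next_id = top_idx.gather(-1, torch.multinomial(top_probs, 1))
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```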

## Training Details

### Dataset
- **Source**:

### Training Configuration
- **Framework**: PyTorch with mixed precision (BF16/FP16)
- **Optimizer**: AdamW (learning rate: 2e-4)
- **Batch Size**: N/A
- **Context Window**: 2048 tokens or 512 tokens
- **Validation Split**: N/A
- **Early Stopping**: N/A
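
Putting the configuration above together (AdamW at 2e-4, BF16/FP16 autocast, gradient accumulation, plus the gradient clipping listed under Training Stability), one training step plausibly looks like the sketch below. The actual script, batch size, and accumulation factor are not published, so `model`, `dataloader`, and `ACCUM_STEPS` are placeholders.

```python
# Illustrative training step, not the actual SnowflakeCore training code.
import torch

ACCUM_STEPS = 8  # assumed value; the real accumulation factor is undocumented
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

model.train()
optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    # Mixed-precision forward/backward (BF16 shown; FP16 would add a GradScaler)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(batch["input_ids"], labels=batch["labels"]).loss
    (loss / ACCUM_STEPS).backward()                                # gradient accumulation

    if (step + 1) % ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # gradient clipping
        optimizer.step()
        optimizer.zero_grad()
```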

### Performance Monitoring
- Training loss tracked per epoch with perplexity calculation
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`
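
The perplexity here presumably follows the standard definition, the exponential of the mean per-token cross-entropy:

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    # exp of the average per-token negative log-likelihood
    return math.exp(mean_cross_entropy)
```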

## Technical Implementation

### Attention Mechanism
- **Causal Masking**: Supports autoregressive generation
- **Key Padding Mask**: Enables batched inference
- **Scaled Dot-Product**: Head dimension normalization included
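
The three bullets above describe standard masked scaled dot-product attention. A minimal reference version (not the repository's fused implementation) looks roughly like this:

```python
import math
import torch

def masked_attention(q, k, v, key_padding_mask=None):
    """q, k, v: (batch, heads, seq, head_dim); key_padding_mask: (batch, seq), True = padding."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # head-dimension normalization

    # Causal mask: each position attends only to itself and earlier positions.
    seq = q.size(-2)
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))

    # Key padding mask: ignore padded tokens when batching variable-length inputs.
    if key_padding_mask is not None:
        scores = scores.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))

    return torch.softmax(scores, dim=-1) @ v
```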

### Memory Optimization
- **Fused Operations**: Reduces memory fragmentation
- **Mixed Precision**: 30-40% memory reduction
- **Gradient Accumulation**: Simulates larger batch sizes
- **Optional Quantization**: Further model compression
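
The quantization method is not specified. One low-effort option, assuming the loaded model behaves like a regular `nn.Module`, is post-training dynamic quantization of the linear layers (CPU inference only):

```python
import torch

# int8 dynamic quantization of Linear layers: weights are stored in int8,
# activations are quantized on the fly at inference time.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```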

### Training Stability
- **Gradient Clipping**: Prevents exploding gradients
- **Automatic Loss Scaling**: Mixed precision stability
- **Early Stopping**: Prevents overfitting with patience mechanisms
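
On the FP16 path, automatic loss scaling presumably means `torch.cuda.amp.GradScaler` (or an equivalent custom scaler). Combined with clipping, and with `model`, `batch`, and `optimizer` as placeholders from the earlier sketch, a step would look roughly like:

```python
import torch

scaler = torch.cuda.amp.GradScaler()   # rescales the loss to avoid FP16 gradient underflow

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(batch["input_ids"], labels=batch["labels"]).loss

scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                   # so clipping sees real gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
scaler.step(optimizer)                                       # skipped automatically on inf/NaN
scaler.update()
optimizer.zero_grad()
```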

## System Requirements

### Memory Requirements
- **Training**: 16-24GB VRAM (precision dependent)
- **Inference**: 4-6GB VRAM for standard generation
- **Context**: Maximum 2048 tokens input length

### Generation Parameters

Default configuration:

```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```

## Limitations

- **No HuggingFace `.generate()` support**: Use the custom generation function instead
- **Output Quality**: May produce repetitive or nonsensical text for some prompts
- **Hardware Requirements**: GPU recommended for practical inference
- **Context Window**: Limited to 2048 tokens (or 512 tokens)

## Example Output

```
# WIP
```

## License & Acknowledgments

- **License**: [NCSAM](https://github.com/FlameF0X/NCSAM)
- **Framework**: Built using PyTorch
- **Dataset**:

## Support Me

You can support me via [Ko-fi](https://ko-fi.com/flamef0x), or try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!

### More meta-data
- Release date: July 10, 2025