FlameF0X committed 34de5c6 (verified) · 1 parent: bd53e23

Update README.md

Files changed (1): README.md (+129, −0)
---
base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre_train
- custom_code
---

# SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.

## Model Overview

SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with **~407M parameters** (407,334,912), designed for text generation tasks.

### Key Features
- **2048 token context window** for extended conversations
- **Mixed precision training** (BF16/FP16) for efficiency
- **Custom attention implementation** with fused operations
- **Early stopping mechanisms**: N/A
- **Gradient accumulation** for effective large-batch training

### Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~407M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |

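These figures are internally consistent: 16 heads × 64 head dimension = 1024 hidden size, and a back-of-envelope count reproduces the stated total. A sketch of that arithmetic, assuming a standard GPT-2-style block layout with an untied output head (the card does not state the layout explicitly):

```python
# Rough parameter count from the table above; the layout is an assumption.
vocab, d_model, n_layers, d_ffn, n_ctx = 50_257, 1024, 24, 4096, 2048

embeddings = vocab * d_model + n_ctx * d_model  # token + learned position embeddings
attention  = 4 * d_model * d_model              # Q, K, V, and output projections
ffn        = 2 * d_model * d_ffn                # up- and down-projections
lm_head    = vocab * d_model                    # untied output head (assumed)

total = embeddings + n_layers * (attention + ffn) + lm_head
print(f"{total:,}")  # 407,013,376; biases and LayerNorm weights account for
                     # the remaining ~322K of the stated 407,334,912
```
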
## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```

### Basic Usage

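The Limitations section notes that HuggingFace's `.generate()` is unsupported, so decoding has to be driven manually. A minimal sketch, assuming the weights live at `FlameF0X/SnowflakeCore-G1-Tiny-Instruct`, that the custom model class loads with `trust_remote_code=True` (suggested by the `custom_code` tag), and that its forward pass returns standard `logits`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

input_ids = tokenizer("Hello, I am", return_tensors="pt").input_ids

# Manual sampling loop, since HF .generate() is unsupported (see Limitations).
with torch.no_grad():
    for _ in range(50):                                # max_new_tokens
        logits = model(input_ids).logits[:, -1, :]     # next-token logits
        probs = torch.softmax(logits, dim=-1)          # temperature 1.0
        next_id = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == 50256:                    # eos_token_id
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```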

## Training Details

### Dataset
- **Source**:

### Training Configuration
- **Framework**: PyTorch with mixed precision (BF16/FP16)
- **Optimizer**: AdamW (learning rate: 2e-4)
- **Batch Size**: N/A
- **Context Window**: 2048 tokens or 512 tokens
- **Validation Split**: N/A
- **Early Stopping**: N/A

### Performance Monitoring
- Training loss tracked per epoch with perplexity calculation
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`

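The perplexity figure follows from the tracked cross-entropy loss via the standard relation, for example:

```python
import math

# Perplexity = exp(mean cross-entropy loss); the loss value here is an
# arbitrary illustration, not a published result for this model.
val_loss = 3.2
print(round(math.exp(val_loss), 1))  # 24.5
```
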
## Technical Implementation

### Attention Mechanism
- **Causal Masking**: Supports autoregressive generation
- **Key Padding Mask**: Enables batched inference
- **Scaled Dot-Product**: Head dimension normalization included

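A minimal sketch of how these three pieces typically compose, using the generic mechanism rather than the repo's fused implementation:

```python
import math
import torch

def attention(q, k, v, key_padding_mask=None):
    """Scaled dot-product attention with causal and key-padding masks.

    q, k, v: (batch, heads, seq, head_dim); key_padding_mask: (batch, seq), True = pad.
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # head-dim normalization
    seq = q.size(-2)
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))        # autoregressive masking
    if key_padding_mask is not None:                          # batched inference
        scores = scores.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```
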
### Memory Optimization
- **Fused Operations**: Reduces memory fragmentation
- **Mixed Precision**: 30-40% memory reduction
- **Gradient Accumulation**: Simulates larger batch sizes
- **Optional Quantization**: Further model compression

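The quantization option can be as lightweight as PyTorch's built-in dynamic quantization; a sketch, reusing `model` from the Basic Usage snippet (the card does not say which quantization scheme, if any, is intended):

```python
import torch

# Dynamic int8 quantization of the linear layers for CPU inference;
# purely illustrative, since the card leaves the scheme unspecified.
quantized_model = torch.quantization.quantize_dynamic(
    model.cpu(), {torch.nn.Linear}, dtype=torch.qint8
)
```
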
### Training Stability
- **Gradient Clipping**: Prevents exploding gradients
- **Automatic Loss Scaling**: Mixed precision stability
- **Early Stopping**: Prevents overfitting with patience mechanisms

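A condensed sketch of how these mechanisms, together with the gradient accumulation listed above, fit into a standard PyTorch FP16 step; it assumes an HF-style forward that returns `.loss`, and the accumulation count and clip norm are illustrative values, not settings published for this model:

```python
import torch

def train_steps(model, loader, optimizer, accum_steps=8, max_norm=1.0):
    """Illustrative loop: autocast + loss scaling + accumulation + clipping."""
    scaler = torch.cuda.amp.GradScaler()  # automatic loss scaling for FP16
    for step, (input_ids, labels) in enumerate(loader):
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = model(input_ids, labels=labels).loss / accum_steps
        scaler.scale(loss).backward()     # gradients accumulate across steps
        if (step + 1) % accum_steps == 0:
            scaler.unscale_(optimizer)    # so clipping sees true gradient norms
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
```
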
## System Requirements

### Memory Requirements
- **Training**: 16-24GB VRAM (precision dependent)
- **Inference**: 4-6GB VRAM for standard generation
- **Context**: Maximum 2048 tokens input length

### Generation Parameters

Default configuration:
```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```

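The `top_k` and `top_p` defaults compose truncation with nucleus sampling. A sketch of how such filtering is conventionally applied to a single row of next-token logits before sampling (the standard technique, not code from this repository):

```python
import torch

def filter_logits(logits: torch.Tensor, top_k: int = 50, top_p: float = 0.9) -> torch.Tensor:
    """Apply top-k then nucleus (top-p) filtering to a 1-D logits vector."""
    # Top-k: mask everything below the k-th largest logit.
    kth = torch.topk(logits, top_k).values[-1]
    logits = logits.masked_fill(logits < kth, float("-inf"))
    # Top-p: mask the low-probability tail once cumulative mass exceeds p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    remove = cumulative > top_p
    remove[1:] = remove[:-1].clone()  # always keep the highest-probability token
    remove[0] = False
    logits[sorted_idx[remove]] = float("-inf")
    return logits

# Sampling then draws from the filtered distribution:
# probs = torch.softmax(filter_logits(logits / temperature), dim=-1)
```
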
## Limitations

- **No HuggingFace `.generate()` support**: Use a custom generation loop (see Basic Usage above)
- **Output Quality**: May produce repetitive or nonsensical text for some prompts
- **Hardware Requirements**: GPU recommended for practical inference
- **Context Window**: Limited to 2048 tokens (or 512 tokens)

## Example Output

```
# WIP
```

## License & Acknowledgments

- **License**: [NCSAM](https://github.com/FlameF0X/NCSAM)
- **Framework**: Built using PyTorch
- **Dataset**:

## Support Me

You can support me via [Ko-fi](https://ko-fi.com/flamef0x), or try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!

### More meta-data
- Release date: July 10, 2025