Update README.md
README.md (CHANGED)
@@ -53,58 +53,7 @@ The **Lumo-DeepSeek-R1-8B** model is a fine-tuned version of DeepSeek-R1-Distill

  ### **Training Workflow**
  The model was fine-tuned using parameter-efficient methods with **LoRA** to adapt to the Solana-specific domain. Below is a visualization of the training process:

- ```mermaid
- graph TD
-     %% Base Model Section
-     A[Base Model: DeepSeek-R1-Distill-Llama-8B]
-     style A fill:#f9f,stroke:#333,stroke-width:4px
-
-     %% Architecture Details
-     A -->|Architecture Details| B[Model Architecture]
-     B --> B1[8B Parameters]
-     B --> B2[4-bit Quantization]
-     B --> B3[NF4 Quant Type]
-     B --> B4[FP16 Compute]
-
-     %% LoRA Configuration
-     A -->|LoRA Config| C[LoRA Parameters]
-     C --> C1[Rank: 8]
-     C --> C2[Alpha: 32]
-     C --> C3[Dropout: 0.01]
-     C --> C4[Adapter Size: ~10MB]
-
-     %% Training Configuration
-     A -->|Training Setup| D[Training Config]
-     D --> D1[Learning Rate: 3e-4]
-     D --> D2[Batch Size: 1]
-     D --> D3[Gradient Accum: 4]
-     D --> D4[Epochs: 2]
-
-     %% Optimization Flow
-     D -->|Optimization| E[Training Process]
-     E --> E1[AdamW Optimizer]
-     E --> E2[StepLR Scheduler]
-     E --> E3[FP16 Training]
-     E --> E4[Fast Kernels: SDPA]
-
-     %% Final Model
-     E -->|Results In| F[Lumo-DeepSeek-R1-8B]
-     style F fill:#9ef,stroke:#333,stroke-width:4px
-
-     %% Technical Implementation
-     F -->|Implementation| G[Technical Features]
-     G --> G1[BitsAndBytes 4-bit]
-     G --> G2[Auto Device Mapping]
-     G --> G3[Gradient Checkpointing]
-     G --> G4[Packing Strategy]
-
-     classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px;
-     classDef highlight fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
-     classDef config fill:#fff3e0,stroke:#e65100,stroke-width:2px;
-
-     class B,C,D,E config;
-     class F highlight;
- ```
+ [image: training workflow diagram, replacing the Mermaid source above]

  ### **Dataset Sources**
  The dataset comprises curated documentation, cookbooks, and API references from the following sources:
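
For reference, the settings enumerated in the removed diagram map naturally onto the Hugging Face `transformers`/`peft`/`bitsandbytes` stack that its labels point to (BitsAndBytes 4-bit, LoRA, SDPA). Below is a minimal sketch under that assumption; the output path, the StepLR `step_size`/`gamma`, and the dataset wiring are illustrative placeholders, not values taken from this repository.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with FP16 compute, per the diagram's
# "Model Architecture" branch.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",  # base model named in the diagram
    quantization_config=bnb_config,
    device_map="auto",           # "Auto Device Mapping"
    attn_implementation="sdpa",  # "Fast Kernels: SDPA"
)
# Prepares the quantized model for training; enables gradient checkpointing
# ("Gradient Checkpointing") by default.
model = prepare_model_for_kbit_training(model)

# LoRA adapter: rank 8, alpha 32, dropout 0.01; at rank 8 the saved adapter
# lands around the ~10 MB the diagram quotes.
lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.01, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Training setup from the diagram: LR 3e-4, batch size 1 with 4-step gradient
# accumulation (effective batch 4), 2 epochs, FP16 training.
training_args = TrainingArguments(
    output_dir="lumo-deepseek-r1-8b",  # illustrative output path
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    fp16=True,
)

# TrainingArguments has no built-in StepLR schedule, so AdamW + StepLR are
# constructed explicitly; step_size and gamma here are illustrative guesses.
optimizer = torch.optim.AdamW(model.parameters(), lr=training_args.learning_rate)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.9)

# A Trainer (or an SFT trainer with packing enabled, matching the diagram's
# "Packing Strategy" node) would then be built with the tokenized dataset and
# optimizers=(optimizer, scheduler).
```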