Add languages tag
README.md
CHANGED
@@ -1,59 +1,178 @@
---
tags:
- unsloth
base_model:
- Qwen/Qwen3-30B-A3B-Base
language:
- eng
- fra
- por
- deu
- ron
- swe
- dan
- bul
- rus
- ces
- ell
- ukr
- spa
- nld
- slk
- hrv
- pol
- lit
- nob
- nno
- fas
- slv
- guj
- lav
- ita
- oci
- nep
- mar
- bel
- srp
- ltz
- vec
- asm
- cym
- szl
- ast
- hne
- awa
- mai
- bho
- snd
- gle
- fao
- hin
- pan
- ben
- ori
- tgk
- ydd
- lmo
- lij
- scn
- fur
- srd
- glg
- cat
- isl
- als
- lim
- prs
- afr
- mkd
- sin
- urd
- mag
- bos
- hye
- zho
- yue
- mya
- ara
- ars
- apc
- arz
- ary
- acm
- acq
- aeb
- heb
- mlt
- ind
- zsm
- tgl
- ceb
- jav
- sun
- min
- ban
- bjn
- pag
- ilo
- war
- tam
- tel
- kan
- mal
- tur
- azj
- uzn
- kaz
- bak
- tat
- tha
- lao
- fin
- est
- hun
- vie
- khm
- jpn
- kor
- kat
- eus
- hat
- pap
- kea
- tpi
- swa
---
# Qwen3-30B-A3B

## Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:

- **Expanded Higher-Quality Pre-training Corpus:** Qwen3 is pre-trained on 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5, with a much richer mix of high-quality data, including coding, STEM, reasoning, book, multilingual, and synthetic data.
- **Training Techniques and Model Architecture:** Qwen3 incorporates a series of training techniques and architectural refinements, including global-batch load-balancing loss for MoE models and QK layernorm for all models, leading to improved stability and overall performance.
- **Three-stage Pre-training:** Stage 1 focuses on broad language modeling and general knowledge acquisition, Stage 2 improves reasoning skills such as STEM, coding, and logical reasoning, and Stage 3 enhances long-context comprehension by extending training sequence lengths up to 32k tokens.
- **Scaling-Law-Guided Hyperparameter Tuning:** Through comprehensive scaling-law studies across the three-stage pre-training pipeline, Qwen3 systematically tunes critical hyperparameters, such as the learning rate scheduler and batch size, separately for dense and MoE models, resulting in better training dynamics and final performance across different model scales.

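The global-batch load-balancing loss mentioned above can be illustrated with a minimal sketch. This is the generic switch-transformer-style auxiliary loss, a simplification for intuition rather than Qwen3's exact formulation:

```python
# Generic switch-style load-balancing auxiliary loss (a simplified sketch,
# not Qwen3's exact formulation). It is minimized (value 1.0) when tokens
# are routed uniformly across experts, penalizing collapsed routing.
def load_balancing_loss(router_probs, assignments, num_experts):
    """router_probs: per-token softmax over experts; assignments: top-1 expert per token."""
    n = len(assignments)
    # f_e: fraction of tokens dispatched to expert e
    frac = [assignments.count(e) / n for e in range(num_experts)]
    # p_e: mean router probability mass placed on expert e
    prob = [sum(p[e] for p in router_probs) / n for e in range(num_experts)]
    return num_experts * sum(f * p for f, p in zip(frac, prob))

# Perfectly balanced toy routing over 4 experts gives the minimum value 1.0.
probs = [[0.25, 0.25, 0.25, 0.25]] * 4
print(load_balancing_loss(probs, [0, 1, 2, 3], 4))  # 1.0
```

Computing `frac` and `prob` over the global batch (rather than per device) is what makes the loss "global-batch": expert usage is balanced in aggregate instead of within every micro-batch.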
## Model Overview

**Qwen3-30B-A3B** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Parameters (Non-Embedding): 29.9B
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: 32,768

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

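The gap between total and activated parameters follows from the expert counts above: each token runs only 8 of the 128 expert FFNs. A back-of-the-envelope sketch (the always-on parameter figure below is a hypothetical split chosen for illustration, not a published breakdown):

```python
# Rough sketch of why a 30.5B-parameter MoE activates only ~3.3B per token:
# each token is routed to 8 of 128 experts, so most expert weights stay idle.
total_experts = 128
active_experts = 8

active_fraction = active_experts / total_experts
print(active_fraction)  # 0.0625 -> only 1/16 of expert parameters run per token

# Activated parameters = always-on parameters (attention, embeddings, router)
# plus the active slice of expert parameters. Illustrative numbers only:
total_params = 30.5e9
always_on = 1.5e9  # hypothetical always-on share, assumed for this sketch
expert_params = total_params - always_on
activated = always_on + expert_params * active_fraction
print(round(activated / 1e9, 1))  # ~3.3 (B), consistent with the figure above
```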
## Requirements

The code of Qwen3-MoE has been merged into the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.

With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3_moe'
```

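A minimal guard for the version requirement above can look like the following; `supports_qwen3_moe` is our own helper name, not part of `transformers`:

```python
# Minimal version guard for the requirement above: Qwen3-MoE support needs
# transformers >= 4.51.0. The helper name is ours, not a transformers API.
def supports_qwen3_moe(version: str) -> bool:
    # Compare the numeric "major.minor.patch" prefix lexicographically as a tuple.
    parts = tuple(int(p) for p in version.split(".")[:3] if p.isdigit())
    return parts >= (4, 51, 0)

print(supports_qwen3_moe("4.50.3"))  # False
print(supports_qwen3_moe("4.51.0"))  # True
```

In practice you would pass `transformers.__version__` to the check before attempting to load the model.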
## Evaluation & Performance

Detailed evaluation results are reported in this [blog](https://qwenlm.github.io/blog/qwen3/).

### Citation

If you find our work helpful, feel free to cite us.

```
@misc{qwen3,
    title = {Qwen3},
    url = {https://qwenlm.github.io/blog/qwen3/},
    author = {Qwen Team},
    month = {April},
    year = {2025}
}
```