lbourdois committed on
Commit a75bdde · verified · 1 Parent(s): 5723a16

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
README.md +47 -35
README.md CHANGED
@@ -1,36 +1,48 @@
- ---
- license: mit
- datasets:
- - isaiahbjork/chain-of-thought
- base_model:
- - Qwen/Qwen2.5-3B-Instruct
- library_name: mlx
- language:
- - en
- pipeline_tag: text-generation
- ---
-
- ## Model Overview
-
- This model is a fine-tuned version of the Qwen2.5-3B base model, enhanced using Low-Rank Adaptation (LoRA) techniques via the MLX framework. The fine-tuning process utilized the isaiahbjork/chain-of-thought dataset, comprising 7,143 examples, over 600 iterations. This enhancement aims to improve the model's performance in tasks requiring multi-step reasoning and problem-solving.
-
-
- ## Model Architecture
-
- - Base Model: Qwen2.5-3B
- - Model Type: Causal Language Model
- - Architecture: Transformer with Rotary Position Embedding (RoPE),
- SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
- - Parameters: 3.09 billion
- - Layers: 36
- - Attention Heads: 16 for query, 2 for key and value (GQA)
-
- ## Fine-Tuning Details
-
- - Technique: Low-Rank Adaptation (LoRA)
- - Framework: MLX
- - Dataset: isaiahbjork/chain-of-thought
- - Dataset Size: 7,143 examples
- - Iterations: 600
-
+ ---
+ license: mit
+ datasets:
+ - isaiahbjork/chain-of-thought
+ base_model:
+ - Qwen/Qwen2.5-3B-Instruct
+ library_name: mlx
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ pipeline_tag: text-generation
+ ---
+
+ ## Model Overview
+
+ This model is a fine-tuned version of the Qwen2.5-3B base model, enhanced using Low-Rank Adaptation (LoRA) techniques via the MLX framework. The fine-tuning process utilized the isaiahbjork/chain-of-thought dataset, comprising 7,143 examples, over 600 iterations. This enhancement aims to improve the model's performance in tasks requiring multi-step reasoning and problem-solving.
+
+
+ ## Model Architecture
+
+ - Base Model: Qwen2.5-3B
+ - Model Type: Causal Language Model
+ - Architecture: Transformer with Rotary Position Embedding (RoPE),
+ SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
+ - Parameters: 3.09 billion
+ - Layers: 36
+ - Attention Heads: 16 for query, 2 for key and value (GQA)
+
+ ## Fine-Tuning Details
+
+ - Technique: Low-Rank Adaptation (LoRA)
+ - Framework: MLX
+ - Dataset: isaiahbjork/chain-of-thought
+ - Dataset Size: 7,143 examples
+ - Iterations: 600
+
  LoRA was employed to efficiently fine-tune the model by adjusting a subset of parameters, reducing computational requirements while maintaining performance. The MLX framework facilitated this process, leveraging Apple silicon hardware for optimized training.
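
Since the card declares `library_name: mlx`, here is a minimal, hedged sketch of how a model like this could be loaded and prompted with the mlx-lm package. The repository id below is a placeholder (it is not taken from this PR), and the prompt is only an illustration.

```python
# Minimal sketch (assumption): load the MLX fine-tuned model with mlx-lm
# and run a single multi-step reasoning prompt.
# "your-username/qwen2.5-3b-cot-mlx" is a placeholder repo id, not the
# actual repository this PR targets.
from mlx_lm import load, generate

model, tokenizer = load("your-username/qwen2.5-3b-cot-mlx")

prompt = (
    "Solve step by step: if a train travels 60 km in 45 minutes, "
    "what is its average speed in km/h?"
)

# generate() returns the decoded completion as a string.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```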