atsuki-yamaguchi committed 4e927b9 (verified, 1 parent: be99789): Upload README.md with huggingface_hub

Files changed (1): README.md (+65 −0)

---
license: apache-2.0
datasets:
- allenai/MADLAD-400
language:
- si
base_model:
- Qwen/Qwen2.5-7B-Instruct
- atsuki-yamaguchi/Qwen2.5-7B-Instruct-si-madlad-mean-tuned
library_name: transformers
---
# Qwen2.5 7B Instruct for Sinhala: ElChat

This model is built on top of Qwen2.5 7B Instruct, adapted for Sinhala using 500M target-language tokens sampled from MADLAD-400. It has an additional target vocabulary of 10K tokens and was trained using the ElChat method.

## Model Details

* **Vocabulary**: This model has an additional target vocabulary of 10K tokens.
* **Target vocabulary initialization**: The target weights of the embedding and LM head were initialized using mean initialization.
* **Training**: This model was continually pre-trained on 500M target-language tokens sampled from MADLAD-400.
* **Post-processing**: The model was post-processed using the ElChat method.

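Mean initialization, as used for the target vocabulary here, sets each new embedding row to the mean of the existing (source) embedding vectors. A minimal sketch of the idea (illustrative only, not the repository's actual code; the function name and toy shapes are hypothetical):

```python
import torch

def mean_init_new_rows(embedding: torch.Tensor, num_new: int) -> torch.Tensor:
    """Append `num_new` rows to an embedding matrix, each initialized to the
    mean of the existing embedding vectors (mean initialization)."""
    mean_vec = embedding.mean(dim=0, keepdim=True)  # (1, hidden)
    new_rows = mean_vec.repeat(num_new, 1)          # (num_new, hidden)
    return torch.cat([embedding, new_rows], dim=0)  # (vocab + num_new, hidden)

# Toy example: a 4-token vocabulary with hidden size 3, extended by 2 new tokens.
emb = torch.randn(4, 3)
extended = mean_init_new_rows(emb, 2)
```

In practice the same scheme would be applied to both the input embedding matrix and the LM head before continued pre-training.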
## Model Description

- **Language:** Sinhala
- **License:** Apache 2.0
- **Fine-tuned from model:** Qwen/Qwen2.5-7B-Instruct

## Model Sources

- **Repository:** https://github.com/gucci-j/chat-cve
- **Paper:** https://arxiv.org/abs/2412.11704

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "atsuki-yamaguchi/Qwen2.5-7B-Instruct-si-madlad-mean-slerp0305-emb-special"
)
tokenizer = AutoTokenizer.from_pretrained(
    "atsuki-yamaguchi/Qwen2.5-7B-Instruct-si-madlad-mean-slerp0305-emb-special"
)
```
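Since this is an instruction-tuned model, prompts should go through the tokenizer's chat template. A usage sketch continuing from the loading code above (the prompt and generation settings are illustrative):

```python
# Build a chat prompt with the tokenizer's chat template and generate a reply.
messages = [
    {"role": "user", "content": "ශ්‍රී ලංකාවේ අගනුවර කුමක්ද?"}  # "What is the capital of Sri Lanka?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```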

## Citation

```
@misc{yamaguchi2024vocabularyexpansionchatmodels,
    title={{ElChat}: Adapting Chat Language Models Using Only Target Unlabeled Language Data},
    author={Atsuki Yamaguchi and Terufumi Morishita and Aline Villavicencio and Nikolaos Aletras},
    year={2024},
    eprint={2412.11704},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2412.11704}
}
```