haoranxu committed
Commit 2fe8e06 (verified) · 1 parent: 9cd2ee4

Create README.md

  1. README.md +96 -0
README.md ADDED
@@ -0,0 +1,96 @@
+ ---
+ license: mit
+ datasets:
+ - oscar-corpus/OSCAR-2301
+ - allenai/nllb
+ - Helsinki-NLP/opus-100
+ language:
+ - en
+ - az
+ - kk
+ - ky
+ - tr
+ - uz
+ - ar
+ - he
+ - fa
+ base_model:
+ - haoranxu/ALMA-13B-Pretrain
+ - meta-llama/Llama-2-13b-hf
+ ---
+
+
+ X-ALMA builds upon [ALMA-R](https://arxiv.org/pdf/2401.08417) by expanding support from 6 to 50 languages. It uses a plug-and-play architecture with language-specific modules, complemented by a carefully designed training recipe. This release includes the **language-specific X-ALMA LoRA module and a merged model that supports the languages in Group 8: English (en), Azerbaijani (az), Kazakh (kk), Kyrgyz (ky), Turkish (tr), Uzbek (uz), Arabic (ar), Hebrew (he), and Persian (fa)**.
+
+ X-ALMA model checkpoints are released on Hugging Face:
+ | Models | Model Link | Description |
+ |:-------------:|:---------------:|:---------------:|
+ | X-ALMA | [haoranxu/X-ALMA](https://huggingface.co/haoranxu/X-ALMA) | X-ALMA model with all its modules |
+ | X-ALMA-13B-Pretrain | [haoranxu/X-ALMA-13B-Pretrain](https://huggingface.co/haoranxu/X-ALMA-13B-Pretrain) | X-ALMA 13B multilingual pre-trained base model |
+ | X-ALMA-Group1 | [haoranxu/X-ALMA-13B-Group1](https://huggingface.co/haoranxu/X-ALMA-13B-Group1) | X-ALMA Group 1 language-specific module and the merged model |
+ | X-ALMA-Group2 | [haoranxu/X-ALMA-13B-Group2](https://huggingface.co/haoranxu/X-ALMA-13B-Group2) | X-ALMA Group 2 language-specific module and the merged model |
+ | X-ALMA-Group3 | [haoranxu/X-ALMA-13B-Group3](https://huggingface.co/haoranxu/X-ALMA-13B-Group3) | X-ALMA Group 3 language-specific module and the merged model |
+ | X-ALMA-Group4 | [haoranxu/X-ALMA-13B-Group4](https://huggingface.co/haoranxu/X-ALMA-13B-Group4) | X-ALMA Group 4 language-specific module and the merged model |
+ | X-ALMA-Group5 | [haoranxu/X-ALMA-13B-Group5](https://huggingface.co/haoranxu/X-ALMA-13B-Group5) | X-ALMA Group 5 language-specific module and the merged model |
+ | X-ALMA-Group6 | [haoranxu/X-ALMA-13B-Group6](https://huggingface.co/haoranxu/X-ALMA-13B-Group6) | X-ALMA Group 6 language-specific module and the merged model |
+ | X-ALMA-Group7 | [haoranxu/X-ALMA-13B-Group7](https://huggingface.co/haoranxu/X-ALMA-13B-Group7) | X-ALMA Group 7 language-specific module and the merged model |
+ | X-ALMA-Group8 | [haoranxu/X-ALMA-13B-Group8](https://huggingface.co/haoranxu/X-ALMA-13B-Group8) | X-ALMA Group 8 language-specific module and the merged model |
+
+ ## Quick start
+ There are three ways to load X-ALMA for translation. The examples below translate "我爱机器翻译。" into English (X-ALMA should also be able to do multilingual open-ended QA).
+
+ **The first way**: loading the merged model where the language-specific module has been merged into the base model **(Recommended)**:
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM
+ from transformers import AutoTokenizer
+ from peft import PeftModel  # only needed for the second loading method
+
+ # Language codes covered by each of the eight language groups
+ GROUP2LANG = {
+     1: ["da", "nl", "de", "is", "no", "sv", "af"],
+     2: ["ca", "ro", "gl", "it", "pt", "es"],
+     3: ["bg", "mk", "sr", "uk", "ru"],
+     4: ["id", "ms", "th", "vi", "mg", "fr"],
+     5: ["hu", "el", "cs", "pl", "lt", "lv"],
+     6: ["ka", "zh", "ja", "ko", "fi", "et"],
+     7: ["gu", "hi", "mr", "ne", "ur"],
+     8: ["az", "kk", "ky", "tr", "uz", "ar", "he", "fa"],
+ }
+ # Reverse lookup: language code -> group number (as a string)
+ LANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}
+ group_id = LANG2GROUP["zh"]
+
+ # Load the merged checkpoint for the group that covers "zh"
+ model = AutoModelForCausalLM.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", torch_dtype=torch.float16, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", padding_side='left')
+
+ # Add the source sentence into the prompt template
+ prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
+
+ # X-ALMA needs the chat template, but ALMA and ALMA-R don't need it.
+ chat_style_prompt = [{"role": "user", "content": prompt}]
+ prompt = tokenizer.apply_chat_template(chat_style_prompt, tokenize=False, add_generation_prompt=True)
+
+ input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
+
+ # Translation
+ with torch.no_grad():
+     generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
+ outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
+ print(outputs)
+ ```
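The example above hard-codes the Chinese-to-English prompt and the `"zh"` group lookup. A minimal sketch of generalizing both steps is shown below. `LANG_NAMES`, `repo_for_pair`, and `build_prompt` are hypothetical helpers, not part of the X-ALMA release. The sketch assumes that the checkpoint is chosen by the non-English side of the pair (as the `LANG2GROUP["zh"]` lookup suggests) and that other language pairs use the same prompt pattern as the Chinese example.

```python
# Group table copied from the first loading example.
GROUP2LANG = {
    1: ["da", "nl", "de", "is", "no", "sv", "af"],
    2: ["ca", "ro", "gl", "it", "pt", "es"],
    3: ["bg", "mk", "sr", "uk", "ru"],
    4: ["id", "ms", "th", "vi", "mg", "fr"],
    5: ["hu", "el", "cs", "pl", "lt", "lv"],
    6: ["ka", "zh", "ja", "ko", "fi", "et"],
    7: ["gu", "hi", "mr", "ne", "ur"],
    8: ["az", "kk", "ky", "tr", "uz", "ar", "he", "fa"],
}
LANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}

# Hypothetical display-name table: only the languages named in this README
# are listed. Extend it before using other language codes.
LANG_NAMES = {
    "en": "English", "zh": "Chinese",
    "az": "Azerbaijani", "kk": "Kazakh", "ky": "Kyrgyz", "tr": "Turkish",
    "uz": "Uzbek", "ar": "Arabic", "he": "Hebrew", "fa": "Persian",
}


def repo_for_pair(src: str, tgt: str) -> str:
    """Return the merged-model repo id for a pair with exactly one English side.

    Assumption: the group is decided by the non-English language.
    """
    others = [code for code in (src, tgt) if code != "en"]
    if len(others) != 1:
        raise ValueError("this sketch only handles pairs with exactly one English side")
    return f"haoranxu/X-ALMA-13B-Group{LANG2GROUP[others[0]]}"


def build_prompt(src: str, tgt: str, text: str) -> str:
    """Fill the prompt pattern used in the Chinese-to-English example."""
    src_name, tgt_name = LANG_NAMES[src], LANG_NAMES[tgt]
    return f"Translate this from {src_name} to {tgt_name}:\n{src_name}: {text}\n{tgt_name}:"


print(repo_for_pair("tr", "en"))
print(build_prompt("zh", "en", "我爱机器翻译。"))
```

The returned repo id can be passed to `from_pretrained`, and the prompt string still has to go through `tokenizer.apply_chat_template` as in the example above.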
+
+ **The second way**: loading the base model and language-specific module **(Recommended)**:
+ ```
+ # Reuses the imports and `group_id` from the first example.
+ # Load the shared base model, then attach the group's LoRA module on top of it.
+ model = AutoModelForCausalLM.from_pretrained("haoranxu/X-ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto")
+ model = PeftModel.from_pretrained(model, f"haoranxu/X-ALMA-13B-Group{group_id}")
+ tokenizer = AutoTokenizer.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", padding_side='left')
+ ```
+
+ **The third way**: loading the base model with all language-specific modules like MoE (requires large GPU memory):
+ ```
+ # Reuses `torch` and `AutoTokenizer` from the first example; `input_ids` is built the same way.
+ from modeling_xalma import XALMAForCausalLM
+ model = XALMAForCausalLM.from_pretrained("haoranxu/X-ALMA", torch_dtype=torch.float16, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("haoranxu/X-ALMA", padding_side='left')
+
+ # Pass `lang="zh"` so the model knows which language group's module to use during generation.
+ generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9, lang="zh")
+ ```
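With decoder-only models, `generate` typically returns the prompt tokens followed by the newly generated tokens, so the decoded strings in the examples above are likely to repeat the prompt before the translation. A minimal sketch of trimming the prompt before decoding is shown below. `strip_prompt` is a hypothetical helper, written against plain lists so it runs without a model. Because the tokenizer pads on the left, every prompt in a batch has the same length, which is why a single `prompt_len` is enough.

```python
from typing import List, Sequence


def strip_prompt(generated_ids: Sequence[Sequence[int]], prompt_len: int) -> List[List[int]]:
    """Drop the first `prompt_len` token ids of every sequence in the batch."""
    return [list(seq[prompt_len:]) for seq in generated_ids]


# Toy batch: two sequences whose first three ids stand in for the (left-padded) prompt.
prompt_len = 3
generated = [[0, 11, 12, 101, 102], [11, 12, 13, 201, 2]]
new_tokens = strip_prompt(generated, prompt_len)
print(new_tokens)  # [[101, 102], [201, 2]]
```

With the tensors from the examples above, the equivalent slice would be `generated_ids[:, input_ids.shape[1]:]`, passed to `tokenizer.batch_decode` in place of `generated_ids`.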