Mxode committed on
Commit 7673309 · verified · 1 Parent(s): 7ddb6db

Create README_zh-CN.md

Files changed (1):
  1. README_zh-CN.md +120 -0
README_zh-CN.md ADDED
@@ -0,0 +1,120 @@
# **NanoTranslator-XXL2**

[English](README.md) | 简体中文

## Introduction

This is the **XX-Large-2** model of NanoTranslator, and it currently only supports **English-to-Chinese** translation. An ONNX version of the model is also provided in this repository.

All models are collected in the [NanoTranslator Collection](https://huggingface.co/collections/Mxode/nanotranslator-66e1de2ba352e926ae865bd2).

| | P. | Arch. | Act. | V. | H. | I. | L. | A.H. | K.H. | Tie |
| :--: | :-----: | :--: | :--: | :--: | :-----: | :---: | :------: | :--: | :--: | :--: |
| [XXL2](https://huggingface.co/Mxode/NanoTranslator-XXL2) | 102 | LLaMA | SwiGLU | 16K | 1120 | 3072 | 6 | 16 | 8 | True |
| [XXL](https://huggingface.co/Mxode/NanoTranslator-XXL) | 100 | LLaMA | SwiGLU | 16K | 768 | 4096 | 8 | 24 | 8 | True |
| [XL](https://huggingface.co/Mxode/NanoTranslator-XL) | 78 | LLaMA | GeGLU | 16K | 768 | 4096 | 6 | 24 | 8 | True |
| [L](https://huggingface.co/Mxode/NanoTranslator-L) | 49 | LLaMA | GeGLU | 16K | 512 | 2816 | 8 | 16 | 8 | True |
| [M2](https://huggingface.co/Mxode/NanoTranslator-M2) | 22 | Qwen2 | GeGLU | 4K | 432 | 2304 | 6 | 24 | 8 | True |
| [M](https://huggingface.co/Mxode/NanoTranslator-M) | 22 | LLaMA | SwiGLU | 8K | 256 | 1408 | 16 | 16 | 4 | True |
| [S](https://huggingface.co/Mxode/NanoTranslator-S) | 9 | LLaMA | SwiGLU | 4K | 168 | 896 | 16 | 12 | 4 | True |
| [XS](https://huggingface.co/Mxode/NanoTranslator-XS) | 2 | LLaMA | SwiGLU | 2K | 96 | 512 | 12 | 12 | 4 | True |

- **P.** - Parameters (in millions)
- **Arch.** - architecture
- **Act.** - activation function
- **V.** - vocab size
- **H.** - hidden size
- **I.** - intermediate size
- **L.** - num layers
- **A.H.** - num attention heads
- **K.H.** - num kv heads
- **Tie** - tie word embeddings
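
If you want to check these hyperparameters programmatically, they can be read from the model config. This is a minimal sketch, assuming the field names of the standard `transformers` LLaMA/Qwen2 configs:

```python
# Minimal sketch: inspect the hyperparameters from the table via the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Mxode/NanoTranslator-XXL2")
print(config.vocab_size)           # V.
print(config.hidden_size)          # H.
print(config.intermediate_size)    # I.
print(config.num_hidden_layers)    # L.
print(config.num_attention_heads)  # A.H.
print(config.num_key_value_heads)  # K.H.
print(config.tie_word_embeddings)  # Tie
```
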
## How to use

The prompt format is as follows:

```
<|im_start|> {English Text} <|endoftext|>
```

### Directly using transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-XXL2'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    # Wrap the English text in the prompt format described above.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    # Strip the prompt tokens so that only the generated translation is decoded.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```

### ONNX

In actual tests, inference with the ONNX model is **2-10x faster** than inference directly with transformers.

If you want to use the ONNX model, you need to manually switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-XXL2/tree/onnx) and load it from a local folder, for example as sketched below.

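One way to fetch the onnx branch is `snapshot_download` from `huggingface_hub`. This is a minimal sketch; the local folder name is only an illustration:

```python
# Minimal sketch: download the onnx branch of the repo into a local folder.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Mxode/NanoTranslator-XXL2",
    revision="onnx",                       # branch that holds the ONNX model
    local_dir="NanoTranslator-XXL2-onnx",  # any local folder name works
)
print(local_path)  # use this folder as model_path in the examples below
```
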
Reference documentation:

- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)

**Using ORTModelForCausalLM**

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "I love to watch my favorite TV series."

# Reuses the translate() helper defined in the transformers example above.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
print(response)
```

**Using pipeline**

```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "I love to watch my favorite TV series."

response = pipe(text, max_new_tokens=64, do_sample=False)
print(response)
```