Mxode committed (verified)
Commit 7ddb6db · 1 parent: fc26079

Update README.md

Files changed (1): README.md (+132 -3)
README.md CHANGED
@@ -1,3 +1,132 @@
Removed:

- ---
- license: gpl-3.0
- ---

Added:

---
license: gpl-3.0
datasets:
- Mxode/BiST
language:
- en
- zh
pipeline_tag: translation
library_name: transformers
---
# **NanoTranslator-XXL2**

English | [简体中文](README_zh-CN.md)

## Introduction

This is the **xx-large-2** model of NanoTranslator; it currently supports only **English to Chinese** translation.

The ONNX version of the model is also available in this repository.

All models are collected in the [NanoTranslator Collection](https://huggingface.co/collections/Mxode/nanotranslator-66e1de2ba352e926ae865bd2).

| | P. | Arch. | Act. | V. | H. | I. | L. | A.H. | K.H. | Tie |
| :--: | :-----: | :--: | :--: | :--: | :-----: | :---: | :------: | :--: | :--: | :--: |
| [XXL2](https://huggingface.co/Mxode/NanoTranslator-XXL2) | 102 | LLaMA | SwiGLU | 16K | 1120 | 3072 | 6 | 16 | 8 | True |
| [XXL](https://huggingface.co/Mxode/NanoTranslator-XXL) | 100 | LLaMA | SwiGLU | 16K | 768 | 4096 | 8 | 24 | 8 | True |
| [XL](https://huggingface.co/Mxode/NanoTranslator-XL) | 78 | LLaMA | GeGLU | 16K | 768 | 4096 | 6 | 24 | 8 | True |
| [L](https://huggingface.co/Mxode/NanoTranslator-L) | 49 | LLaMA | GeGLU | 16K | 512 | 2816 | 8 | 16 | 8 | True |
| [M2](https://huggingface.co/Mxode/NanoTranslator-M2) | 22 | Qwen2 | GeGLU | 4K | 432 | 2304 | 6 | 24 | 8 | True |
| [M](https://huggingface.co/Mxode/NanoTranslator-M) | 22 | LLaMA | SwiGLU | 8K | 256 | 1408 | 16 | 16 | 4 | True |
| [S](https://huggingface.co/Mxode/NanoTranslator-S) | 9 | LLaMA | SwiGLU | 4K | 168 | 896 | 16 | 12 | 4 | True |
| [XS](https://huggingface.co/Mxode/NanoTranslator-XS) | 2 | LLaMA | SwiGLU | 2K | 96 | 512 | 12 | 12 | 4 | True |

- **P.** - Parameters (in millions)
- **V.** - vocab size
- **H.** - hidden size
- **I.** - intermediate size
- **L.** - number of layers
- **A.H.** - number of attention heads
- **K.H.** - number of KV heads
- **Tie** - tie word embeddings

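A quick way to double-check these numbers is to read them back from the published model config. This is a minimal sketch, assuming only that `transformers` is installed and the checkpoint above is reachable:

```python
from transformers import AutoConfig

# Load just the configuration (no model weights are downloaded).
config = AutoConfig.from_pretrained("Mxode/NanoTranslator-XXL2")

# These fields correspond to the V. / H. / I. / L. / A.H. / K.H. / Tie columns above.
print(config.vocab_size)           # V.
print(config.hidden_size)          # H.
print(config.intermediate_size)    # I.
print(config.num_hidden_layers)    # L.
print(config.num_attention_heads)  # A.H.
print(config.num_key_value_heads)  # K.H.
print(config.tie_word_embeddings)  # Tie
```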

## How to use

The prompt format is as follows:

```
<|im_start|> {English Text} <|endoftext|>
```

### Directly using transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-XXL2'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    # Wrap the English input in the prompt format shown above.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    # Drop the prompt tokens so only the generated translation remains.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```


### ONNX

Inference with the ONNX model has been measured to be **2-10 times faster** than running the model directly with transformers.

You need to switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-XXL2/tree/onnx) manually and download the files to a local folder.
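
One way to fetch that branch programmatically is with the `huggingface_hub` client. A minimal sketch, assuming `huggingface_hub` is installed; the `local_dir` value is only an example path:

```python
from huggingface_hub import snapshot_download

# Download the onnx branch (revision) of the repo into a local folder.
onnx_dir = snapshot_download(
    repo_id="Mxode/NanoTranslator-XXL2",
    revision="onnx",          # the onnx branch of this repo
    local_dir="onnx_model",   # example path; choose any local folder
)
print(onnx_dir)
```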

Reference docs:

- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)

**Using ORTModelForCausalLM**

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "I love to watch my favorite TV series."

# Reuses the translate() helper defined in the transformers example above.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
print(response)
```

**Using pipeline**

```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "I love to watch my favorite TV series."

response = pipe(text, max_new_tokens=64, do_sample=False)
print(response)
```