stepping1st committed on
Commit b014b3b · verified · 1 Parent(s): a6cc34d

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +28 -0
  2. tokenizer.json +0 -0
README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: apache-2.0
+ ---
+
+ Upstage `solar-pro` tokenizer
+ - Vocab size: 64,000
+
+ Use this tokenizer to tokenize inputs for the Upstage [solar-pro](https://developers.upstage.ai/docs/apis/chat) model.
+
+ You can load it with the `tokenizers` library like this:
+
+ ```python
+ from tokenizers import Tokenizer
+
+ tokenizer = Tokenizer.from_pretrained("upstage/solar-pro-tokenizer")  # downloads tokenizer.json from the Hub
+ text = "Hi, how are you?"
+ enc = tokenizer.encode(text)  # returns an Encoding object
+ print("Encoded input:")
+ print(enc.ids)  # token IDs for the input text
+
+ inv_vocab = {v: k for k, v in tokenizer.get_vocab().items()}  # map token ID -> token string
+ tokens = [inv_vocab[token_id] for token_id in enc.ids]
+ print("Tokens:")
+ print(tokens)
+
+ number_of_tokens = len(enc.ids)
+ print("Number of tokens:", number_of_tokens)
+ ```
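The token-counting step at the end of the README example can be wrapped in a small helper, for instance to check a prompt against a token budget before sending it to the model. The following is a minimal sketch and is not part of this commit: `count_tokens`, `fits_budget`, and `demo` are hypothetical names, the 4,096-token budget is an arbitrary placeholder rather than a documented solar-pro limit, and `demo` assumes the `tokenizers` package is installed and the Hugging Face Hub is reachable.

```python
from typing import Callable, Iterable, List

# Any function that maps a string to a list of token IDs.
EncodeFn = Callable[[str], List[int]]


def count_tokens(encode_ids: EncodeFn, texts: Iterable[str]) -> List[int]:
    """Return the number of token IDs produced for each text."""
    return [len(encode_ids(text)) for text in texts]


def fits_budget(encode_ids: EncodeFn, text: str, budget: int) -> bool:
    """Return True if the text encodes to at most `budget` tokens."""
    return len(encode_ids(text)) <= budget


def demo() -> None:
    """Illustrative use with the tokenizer from this repository (needs network access)."""
    # Imported here so the helpers above work without the package installed.
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_pretrained("upstage/solar-pro-tokenizer")

    def encode_ids(text: str) -> List[int]:
        return tokenizer.encode(text).ids

    prompts = ["Hi, how are you?", "Tell me about tokenizers."]
    print("Token counts:", count_tokens(encode_ids, prompts))
    # 4096 is a placeholder budget, not a documented model limit.
    print("Fits budget:", fits_budget(encode_ids, prompts[0], 4096))
```

Calling `demo()` downloads the tokenizer and prints the counts. The helpers take an encoding function rather than a `Tokenizer` instance, so they can also be exercised offline with a stand-in encoder.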
tokenizer.json ADDED
The diff for this file is too large to render.