TheBloke committed
Commit b20041c
1 Parent(s): cc97ea0

Initial merged FP16 model commit

Files changed (1): README.md +66 -1
README.md CHANGED
@@ -28,9 +28,74 @@ Note that `config.json` has been set to a sequence length of 8192. This can be m
  ## Repositories available

  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-GPTQ)
  * [Unquantised SuperHOT fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16)
  * [Unquantised base fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/Monero/Manticore-13b-Chat-Pyg-Guanaco)
  <!-- footer start -->
  ## Discord

@@ -53,7 +118,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
  **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

- **Patreon special mentions**: Pyrater, WelcomeToTheClub, Kalila, Mano Prime, Trenton Dambrowitz, Spiking Neurons AB, Pierre Kircher, Fen Risland, Kevin Schuppel, Luke, Rainer Wilmers, vamX, Gabriel Puliatti, Alex , Karl Bernard, Ajan Kanaga, Talal Aujan, Space Cruiser, ya boyyy, biorpg, Johann-Peter Hartmann, Asp the Wyvern, Ai Maven, Ghost , Preetika Verma, Nikolai Manek, trip7s trip, John Detwiler, Fred von Graf, Artur Olbinski, subjectnull, John Villwock, Junyu Yang, Rod A, Lone Striker, Chris McCloskey, Iucharbius , Matthew Berman, Illia Dulskyi, Khalefa Al-Ahmad, Imad Khwaja, chris gileta, Willem Michiel, Greatston Gnanesh, Derek Yates, K, Alps Aficionado, Oscar Rangel, David Flickinger, Luke Pendergrass, Deep Realms, Eugene Pentland, Cory Kujawski, terasurfer , Jonathan Leane, senxiiz, Joseph William Delisle, Sean Connelly, webtim, zynix , Nathan LeClaire.

  Thank you to all my generous patrons and donaters!
 
  ## Repositories available

  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-GPTQ)
+ * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-GGML)
  * [Unquantised SuperHOT fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16)
  * [Unquantised base fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/Monero/Manticore-13b-Chat-Pyg-Guanaco)
 
+ ## How to use this model from Python code
+
+ First make sure you have Einops installed:
+
+ ```
+ pip3 install einops
+ ```
+
+ Then run the following code. `config.json` defaults to a sequence length of 8192, but you can also configure this in your Python code.
+
+ The provided modelling code, activated with `trust_remote_code=True`, will automatically set the `scale` parameter from the configured `max_position_embeddings`. E.g. for 8192, `scale` is set to `4`.
+
+ ```python
+ from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM, pipeline
+
+ model_name_or_path = "TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+
+ config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
+ # Change this to the sequence length you want
+ config.max_position_embeddings = 8192
+
+ model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
+                                              config=config,
+                                              trust_remote_code=True,
+                                              device_map='auto')
+
+ # Note: check to confirm that this prompt template is correct for this model!
+ prompt = "Tell me about AI"
+ prompt_template=f'''USER: {prompt}
+ ASSISTANT:'''
+
+ print("\n\n*** Generate:")
+
+ input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+ output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
+ print(tokenizer.decode(output[0]))
+
+ # Inference can also be done using transformers' pipeline
+ print("*** Pipeline:")
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+     max_new_tokens=512,
+     temperature=0.7,
+     top_p=0.95,
+     repetition_penalty=1.15
+ )
+
+ print(pipe(prompt_template)[0]['generated_text'])
+ ```
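The relationship between `max_position_embeddings` and `scale` described above can be sketched as a one-line calculation. This is a minimal illustration, not the repo's actual modelling code: the names `BASE_CONTEXT` and `rope_scale` are my own, and it assumes LLaMA-13B's original training context of 2048 tokens.

```python
# Illustrative sketch (assumption: base LLaMA context is 2048 tokens).
BASE_CONTEXT = 2048

def rope_scale(max_position_embeddings: int) -> float:
    """Position-interpolation factor for a target context length."""
    return max_position_embeddings / BASE_CONTEXT

print(rope_scale(8192))  # 4.0, matching the example in the text
```

Under this assumption, setting `config.max_position_embeddings = 4096` instead would give a `scale` of `2`.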
+
+ ## Using other UIs: monkey patch
+
+ Provided in the repo is `llama_rope_scaled_monkey_patch.py`, written by @kaiokendev.
+
+ It can theoretically be added to any Python UI or custom code to enable the same result as `trust_remote_code=True`. I have not tested this, and it should be superseded by using `trust_remote_code=True`, but I include it for completeness and for interest.
+
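To give a sense of what such a RoPE-scaling patch does, here is a minimal sketch in the spirit of position interpolation. This is not the contents of `llama_rope_scaled_monkey_patch.py`: the class name, the hard-coded scale, and the patching target are illustrative assumptions only.

```python
# Illustrative sketch of RoPE position interpolation (not the actual patch file).
import torch

SCALE = 4.0  # assumed: 8192 target context / 2048 base context


class ScaledRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=8192, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # Divide positions by SCALE, squeezing the extended range of
        # positions into the span the model was originally trained on.
        t = torch.arange(max_position_embeddings).float() / SCALE
        freqs = torch.einsum("i,j->ij", t, inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos())
        self.register_buffer("sin_cached", emb.sin())

    def forward(self, x, seq_len):
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]


# A monkey patch would then swap this class in before loading the model, e.g.:
# transformers.models.llama.modeling_llama.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```

The design point is that scaling happens at embedding construction time, which is why the patch must be applied before the model is instantiated.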
  <!-- footer start -->
  ## Discord
 
 
  **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

+ **Patreon special mentions**: zynix , ya boyyy, Trenton Dambrowitz, Imad Khwaja, Alps Aficionado, chris gileta, John Detwiler, Willem Michiel, RoA, Mano Prime, Rainer Wilmers, Fred von Graf, Matthew Berman, Ghost , Nathan LeClaire, Iucharbius , Ai Maven, Illia Dulskyi, Joseph William Delisle, Space Cruiser, Lone Striker, Karl Bernard, Eugene Pentland, Greatston Gnanesh, Jonathan Leane, Randy H, Pierre Kircher, Willian Hasse, Stephen Murray, Alex , terasurfer , Edmond Seymore, Oscar Rangel, Luke Pendergrass, Asp the Wyvern, Junyu Yang, David Flickinger, Luke, Spiking Neurons AB, subjectnull, Pyrater, Nikolai Manek, senxiiz, Ajan Kanaga, Johann-Peter Hartmann, Artur Olbinski, Kevin Schuppel, Derek Yates, Kalila, K, Talal Aujan, Khalefa Al-Ahmad, Gabriel Puliatti, John Villwock, WelcomeToTheClub, Daniel P. Andersen, Preetika Verma, Deep Realms, Fen Risland, trip7s trip, webtim, Sean Connelly, Michael Levine, Chris McCloskey, biorpg, vamX, Viktor Bowallius, Cory Kujawski.

  Thank you to all my generous patrons and donaters!