JustJaro committed · Commit 85e37df · verified · 1 Parent(s): 6b17329

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,455 @@
---
tags:
- gptq
- quantization
- 4bit
- confidentialmind
- text-generation
- apache2.0
- mistral-small-24b
---
# 🔥 Quantized Model: Virtuoso-Medium-v2_gptq_g128_4bit 🔥

This is a 4-bit quantized version of the [arcee-ai/Virtuoso-Medium-v2](https://huggingface.co/arcee-ai/Virtuoso-Medium-v2) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
It uses the open-source GPTQModel library to quantize the model to 4-bit precision with a group size of 128, resulting in a smaller, faster model with minimal performance degradation.

Quantization ran on a single NVIDIA A100 GPU with 80 GB of VRAM.

*Note:* `batch_size` is set quite high as the model is small; you may need to adjust it to fit your GPU VRAM.
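
If quantization runs out of VRAM, the knob to turn is the `batch_size` argument passed to `model.quantize(...)`. In the quantization script below it is derived from the calibration sample count; a minimal sketch (the smaller value is illustrative, not a tuned recommendation):

```python
# quantize.py below derives the calibration batch size from nsamples:
nsamples = 512
batch_size = int(nsamples * 0.1)  # -> 51, the value used for this quantization
# On a smaller GPU, pass an explicit lower value instead, e.g.:
# model.quantize(calibration_data, auto_gc=False, batch_size=8)
```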

## Model Details
- **Original Model:** [arcee-ai/Virtuoso-Medium-v2](https://huggingface.co/arcee-ai/Virtuoso-Medium-v2)
- **Quantized Model:** Virtuoso-Medium-v2_gptq_g128_4bit (this repository)
- **Quantization Method:** GPTQ (4-bit, group size 128)
- **Quantization Library:** [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
- **Calibration Dataset:** neuralmagic/LLM_compression_calibration (using 512 samples with seq len 4096)
- **Quantized by:** [ConfidentialMind.com](https://www.confidentialmind.com)

## Usage

```python
from gptqmodel import GPTQModel
from transformers import AutoTokenizer

# Use the local directory or JustJaro/Virtuoso-Medium-v2_gptq_g128_4bit after upload
quantized_model_id = "/home/jaro/models/quantized/Virtuoso-Medium-v2_gptq_g128_4bit"  # or "JustJaro/Virtuoso-Medium-v2_gptq_g128_4bit"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_id)
model = GPTQModel.load(quantized_model_id, device="cuda:0")  # or "cpu"

input_text = "This is a test prompt"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
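
The call to `generate()` above uses default generation settings and emits only a short continuation. A hedged example with standard `transformers` generation arguments (these are regular `generate()` kwargs, nothing specific to this repository):

```python
# Longer, sampled generation; max_new_tokens and temperature are standard
# transformers generate() kwargs.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```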

## Package Versions and Installation Instructions

See `pyproject.toml` for the exact UV project file, and the [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main) repo for details on how to install the package.

Use the provided pyproject.toml:

```bash
uv venv
source .venv/bin/activate
uv sync
```

### Environment Variables

```bash
HF_TOKEN=<YOUR_HF_TOKEN>
TOKENIZERS_PARALLELISM="true"
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
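
The quantization script below picks these up from a `.env` file via `python-dotenv` (already pinned in `pyproject.toml`); a minimal sketch of that pattern:

```python
import os

from dotenv import find_dotenv, load_dotenv

# Load HF_TOKEN (and any other variables) from the nearest .env file,
# exactly as quantize.py does at import time.
load_dotenv(find_dotenv())
HF_TOKEN = os.getenv("HF_TOKEN")
assert HF_TOKEN, "Set HF_TOKEN in your environment or .env file"
```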

## Quantization Script
Below is the exact quantize.py script used to generate this model (dependency versions are pinned in the included pyproject.toml):

````python
#!/usr/bin/env python3
"""
This script loads a source Hugging Face model and a calibration dataset,
quantizes the model using GPTQModel (with 4-bit precision and group size 128),
saves the quantized model using the Transformers API with safetensors (safe serialization)
under ~/models/quantized/, and then creates/updates a Hugging Face repository (with the
_gptq_g128_4bit suffix) by uploading the model, tokenizer, and an auto-generated README.md.

Usage example:
    python quantize.py --source-model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
        --calibration-dataset wikitext/wikitext-2-raw-v1 \
        --seq-len 1024 --nsamples 256 --hf-token <YOUR_HF_TOKEN>
"""

import os
import shutil
import subprocess
from pathlib import Path
from typing import List

import torch
import typer
from datasets import load_dataset
from dotenv import load_dotenv, find_dotenv
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.utils import Perplexity
# For later pushing to the model hub
from huggingface_hub import HfApi
from transformers import AutoTokenizer, PreTrainedTokenizerBase

load_dotenv(find_dotenv())
HF_TOKEN = os.getenv("HF_TOKEN")

app = typer.Typer()


def get_text_from_example(example: dict) -> str:
    """
    Returns text from a dataset example.
    If the example contains a "text" field, and it is nonempty, that text is used.
    Otherwise, if it has a "messages" field (a list of dicts with a "content" key),
    the function returns the concatenation of all non-empty message contents.
    """
    if "text" in example and example["text"]:
        return example["text"]
    elif "messages" in example:
        contents = [msg.get("content", "").strip() for msg in example["messages"]]
        return " ".join([s for s in contents if s])
    else:
        return ""


def get_calibration_dataset(
        tokenizer: PreTrainedTokenizerBase,
        nsamples: int,
        seqlen: int,
        calibration_dataset: str
) -> List[dict]:
    """
    Loads a calibration dataset from the Hugging Face Hub (or from a local file).
    It accepts datasets with a single "text" field (like wikitext)
    or with a "messages" field (as in the Neural Magic LLM Compression Calibration dataset).
    Only examples whose extracted text length is at least 80% of 'seqlen' are kept.
    Each chosen example is tokenized (with truncation up to 'seqlen') and returned as a dict.
    """
    ds = None
    try:
        # Attempt to load from the HF Hub, treating "dataset/config" identifiers specially.
        if "/" in calibration_dataset:
            parts = calibration_dataset.split("/", 1)
            ds = load_dataset(parts[0], parts[1], split="train")
        else:
            ds = load_dataset(calibration_dataset, split="train")
        print(f"Loaded calibration dataset {calibration_dataset}.")
    except Exception as e:
        print(f"Error loading dataset '{calibration_dataset}' via load_dataset: {e}")
        # Fallback: if the supplied calibration_dataset is a local path, try to load it as JSON-lines.
        if os.path.exists(calibration_dataset):
            try:
                ds = load_dataset("json", data_files=calibration_dataset, split="train")
                print(f"Loaded calibration dataset from local file {calibration_dataset}.")
            except Exception as e2:
                print(f"Error loading local json dataset from '{calibration_dataset}': {e2}")
                return []
        else:
            return []

    print(f"Dataset features: {ds.features}")

    # Keep only examples whose extracted text is at least 80% of 'seqlen' characters long.
    ds = ds.filter(lambda x: len(get_text_from_example(x)) >= int(seqlen * 0.8))
    sample_range = min(nsamples, len(ds))
    calibration_data = []
    for i in range(sample_range):
        example = ds[i]
        text = get_text_from_example(example)
        tokenized = tokenizer(text, truncation=True, max_length=seqlen, return_tensors="pt")
        tokenized = {k: v.squeeze(0) for k, v in tokenized.items()}
        calibration_data.append(tokenized)
    return calibration_data


def calculate_avg_ppl(model, tokenizer):
    """
    Computes the average perplexity on the wikitext-2-raw-v1 train split using GPTQModel's Perplexity utility.
    """
    ppl = Perplexity(
        model=model,
        tokenizer=tokenizer,
        dataset_path="wikitext",
        dataset_name="wikitext-2-raw-v1",
        split="train",
        text_column="text",
    )
    ppl_values = ppl.calculate(n_ctx=512, n_batch=512)
    avg = sum(ppl_values) / len(ppl_values)
    return avg


def get_pinned_package_versions():
    """
    Retrieves pinned package versions using 'uv pip freeze'.
    Returns a dictionary mapping lowercased package names to their versions.
    """
    try:
        result = subprocess.run(["uv", "pip", "freeze"], capture_output=True, text=True, check=True)
        packages_output = result.stdout.strip()
        versions = {}
        for line in packages_output.splitlines():
            if "==" in line:
                package_name, package_version = line.split("==", 1)
                versions[package_name.lower()] = package_version
        return versions
    except subprocess.CalledProcessError as e:
        typer.echo(f"Error running 'uv pip freeze': {e}", err=True)
        return {}
    except FileNotFoundError:
        typer.echo("uv command not found. Make sure uv is installed and in your PATH.", err=True)
        return {}


@app.command()
def main(
        seq_len: int = typer.Option(4096, help="Sequence length for tokenization and calibration."),
        nsamples: int = typer.Option(512, help="Number of samples to use for calibration."),
        source_model: str = typer.Option("arcee-ai/Virtuoso-Medium-v2",
                                         help="Source model HF repository identifier."),
        calibration_dataset: str = typer.Option("wikitext/wikitext-2-raw-v1",
                                                help="Calibration dataset identifier (in 'dataset/config' format) or local file path."),
        hf_token: str = typer.Option(HF_TOKEN,
                                     help="Hugging Face token for creating/updating your repo."),
        upload_only: bool = typer.Option(False, help="Only upload the quantized model to the Hugging Face Hub."),
):
    # Prepare destination directory and model names.
    model_name = source_model.split("/")[-1]
    quantized_model_name = f"{model_name}_gptq_g128_4bit"
    quantized_model_dir = os.path.expanduser(os.path.join("~/models/quantized", quantized_model_name))
    if not os.path.exists(quantized_model_dir) or not upload_only:
        os.makedirs(quantized_model_dir, exist_ok=True)

        typer.echo("Loading tokenizer from source model...")
        tokenizer_obj = AutoTokenizer.from_pretrained(source_model, use_fast=True)

        typer.echo("Loading calibration dataset...")
        typer.echo(f"Calibration dataset: {calibration_dataset}")
        calibration_data = get_calibration_dataset(tokenizer_obj, nsamples, seq_len, calibration_dataset)
        if not calibration_data:
            typer.echo("Calibration dataset is empty. Aborting.", err=True)
            raise typer.Exit(code=1)

        quantize_config = QuantizeConfig(bits=4, group_size=128, damp_percent=0.01)
        device = "cuda:0" if torch.cuda.is_available() else "cpu"
        typer.echo(f"Loading model in {device} mode...")
        model = GPTQModel.load(source_model, quantize_config)

        typer.echo("Quantizing model...")
        model.quantize(calibration_data, auto_gc=False, batch_size=int(nsamples * 0.1))

        # Retrieve pinned dependency versions and Hugging Face user info for README generation.
        package_versions = get_pinned_package_versions()
        username = get_my_user(hf_token)
        script_content = self_read_script()

        typer.echo(f"Saving quantized model to {quantized_model_dir} using Transformers safe serialization...")
        try:
            model.save_pretrained(quantized_model_dir)
            tokenizer_obj.save_pretrained(quantized_model_dir)
        except Exception as ex:
            typer.echo(f"Error during saving with safe_serialization: {ex}. Aborting.")
            raise
        typer.echo(f"Quantized model saved to: {quantized_model_dir}")
    else:
        tokenizer_obj = AutoTokenizer.from_pretrained(source_model, use_fast=True)
        package_versions = get_pinned_package_versions()
        username = get_my_user(hf_token)
        script_content = self_read_script()

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = GPTQModel.load(quantized_model_dir, device=device)
    avg_ppl = calculate_avg_ppl(model, tokenizer_obj)
    typer.echo(f"Average perplexity (PPL) on wikitext v2 dataset: {avg_ppl}")
    deps = Path("./pyproject.toml")
    shutil.copy(deps, quantized_model_dir)
    generate_readme(calibration_dataset, nsamples, quantized_model_dir,
                    quantized_model_name, script_content, seq_len, source_model, username, avg_ppl)
    GPTQModel.push_to_hub(quantized_path=quantized_model_dir, private=False, repo_id=quantized_model_name,
                          token=HF_TOKEN)
    typer.echo(f"Model uploaded to Hugging Face repo: {quantized_model_name}")
    demo_input = tokenizer_obj("test is", return_tensors="pt").to(device)
    generated_ids = model.generate(**demo_input)
    output_text = tokenizer_obj.decode(generated_ids[0])
    typer.echo(f"Inference demo output: {output_text}")
    typer.echo(f"Average perplexity (PPL) on calibration dataset: {avg_ppl}")


def self_read_script():
    # Read this script's own source so it can be embedded in the generated README.
    try:
        script_path = os.path.abspath(__file__)
        with open(script_path, "r") as f:
            script_content = f.read()
    except Exception as e:
        script_content = "Error reading script content: " + str(e)
    return script_content


def get_my_user(hf_token):
    api = HfApi(token=hf_token)
    try:
        user_info = api.whoami()
        username = user_info.get("name") or user_info.get("username")
    except Exception as e:
        typer.echo(f"Error retrieving username from Hugging Face API: {e}. Using default username.")
        username = None
    if not username:
        typer.echo("Could not determine your Hugging Face username from the token, defaulting to hard coded username.",
                   err=True)
        username = "JustJaro"
    return username


def generate_readme(calibration_dataset, nsamples, quantized_model_dir,
                    quantized_model_name, script_content, seq_len, source_model, username, avg_ppl):
    readme_content = f"""---
tags:
- gptq
- quantization
- 4bit
- confidentialmind
- text-generation
- apache2.0
- mistral-small-24b
---
# 🔥 Quantized Model: {quantized_model_name} 🔥

This is a 4-bit quantized version of the [{source_model}](https://huggingface.co/{source_model}) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
It uses the open-source GPTQModel library to quantize the model to 4-bit precision with a group size of 128, resulting in a smaller, faster model with minimal performance degradation.

Quantization ran on a single NVIDIA A100 GPU with 80 GB of VRAM.

*Note:* `batch_size` is set quite high as the model is small; you may need to adjust it to fit your GPU VRAM.

## Model Details
- **Original Model:** [{source_model}](https://huggingface.co/{source_model})
- **Quantized Model:** {quantized_model_name} (this repository)
- **Quantization Method:** GPTQ (4-bit, group size 128)
- **Quantization Library:** [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
- **Calibration Dataset:** {calibration_dataset} (using {nsamples} samples with seq len {seq_len})
- **Quantized by:** [ConfidentialMind.com](https://www.confidentialmind.com)

## Usage

```python
from gptqmodel import GPTQModel
from transformers import AutoTokenizer

# Use the local directory or {username}/{quantized_model_name} after upload
quantized_model_id = "{quantized_model_dir}"  # or "{username}/{quantized_model_name}"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_id)
model = GPTQModel.load(quantized_model_id, device="cuda:0")  # or "cpu"

input_text = "This is a test prompt"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Package Versions and Installation Instructions

See `pyproject.toml` for the exact UV project file, and the [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main) repo for details on how to install the package.

Use the provided pyproject.toml:

```bash
uv venv
source .venv/bin/activate
uv sync
```

### Environment Variables

```bash
HF_TOKEN=<YOUR_HF_TOKEN>
TOKENIZERS_PARALLELISM="true"
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```

## Quantization Script
Below is the exact quantize.py script used to generate this model (dependency versions are pinned in the included pyproject.toml):

```python
{script_content}
```

## Quantization Performance

Average perplexity (PPL) on wikitext v2 dataset: {avg_ppl}

## Disclaimer
This model is for research purposes only. It may inherit limitations and biases from the original model and the quantization process. Please use responsibly and refer to the original model card for more details.

## Contact
For any questions or support, please visit [ConfidentialMind.com](https://www.confidentialmind.com) or contact us directly.

## License
This model inherits the license from the original model. Please refer to the original model card for more details.
Original model card: `{source_model}`

## Author
This model was quantized by [Jaro](https://www.linkedin.com/in/jaroai/)

## Acknowledgements
Quantization performed using the GPTQModel pipeline.

TODO: Add `gptqmodel.utils.eval` integration and auto-generation of eval table.

---
*Generated and quantized using GPTQModel.*
"""
    readme_path = os.path.join(quantized_model_dir, "README.md")
    with open(readme_path, "w") as f:
        f.write(readme_content)
    typer.echo("README.md created with detailed information.")


if __name__ == "__main__":
    app()
````

## Quantization Performance

Average perplexity (PPL) on wikitext v2 dataset: 6.455169136972343
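
This figure can be re-measured with the same `gptqmodel.utils.Perplexity` settings that `calculate_avg_ppl` uses in the script above; a sketch (the exact value may drift slightly across runs and library versions):

```python
from gptqmodel import GPTQModel
from gptqmodel.utils import Perplexity
from transformers import AutoTokenizer

model_id = "JustJaro/Virtuoso-Medium-v2_gptq_g128_4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GPTQModel.load(model_id, device="cuda:0")

# Same dataset and context settings as calculate_avg_ppl() in the script above.
ppl = Perplexity(model=model, tokenizer=tokenizer, dataset_path="wikitext",
                 dataset_name="wikitext-2-raw-v1", split="train", text_column="text")
values = ppl.calculate(n_ctx=512, n_batch=512)
print(sum(values) / len(values))  # ~6.46 reported for this checkpoint
```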

## Disclaimer
This model is for research purposes only. It may inherit limitations and biases from the original model and the quantization process. Please use responsibly and refer to the original model card for more details.

## Contact
For any questions or support, please visit [ConfidentialMind.com](https://www.confidentialmind.com) or contact us directly.

## License
This model inherits the license from the original model. Please refer to the original model card for more details.
Original model card: `arcee-ai/Virtuoso-Medium-v2`

## Author
This model was quantized by [Jaro](https://www.linkedin.com/in/jaroai/)

## Acknowledgements
Quantization performed using the GPTQModel pipeline.

TODO: Add `gptqmodel.utils.eval` integration and auto-generation of eval table.

---
*Generated and quantized using GPTQModel.*
added_tokens.json ADDED
@@ -0,0 +1,24 @@
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
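
These mappings extend the base Qwen2 vocabulary; a quick sanity-check sketch (not part of the quantization pipeline) confirming the uploaded tokenizer resolves the chat-control tokens to the same IDs:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("JustJaro/Virtuoso-Medium-v2_gptq_g128_4bit")
assert tokenizer.convert_tokens_to_ids("<|im_start|>") == 151644
assert tokenizer.convert_tokens_to_ids("<|im_end|>") == 151645
assert tokenizer.convert_tokens_to_ids("<|endoftext|>") == 151643
```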
config.json ADDED
@@ -0,0 +1,51 @@
{
  "_attn_implementation_autoset": true,
  "_name_or_path": "/home/jaro/.cache/huggingface/hub/models--arcee-ai--Virtuoso-Medium-v2/snapshots/bb67df2fbf6c4819b139f3fbd22de9615ea49949",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 131072,
  "max_window_layers": 70,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "quantization_config": {
    "bits": 4,
    "checkpoint_format": "gptq",
    "desc_act": true,
    "group_size": 128,
    "lm_head": false,
    "meta": {
      "damp_auto_increment": 0.0025,
      "damp_percent": 0.01,
      "mse": 0.0,
      "quantizer": [
        "gptqmodel:1.9.0"
      ],
      "static_groups": false,
      "true_sequential": true,
      "uri": "https://github.com/modelcloud/gptqmodel"
    },
    "pack_dtype": "int32",
    "quant_method": "gptq",
    "sym": true
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151665
}
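
The embedded `quantization_config` block is what lets `transformers` and GPTQModel recognize this as a 4-bit GPTQ checkpoint; a sketch reading it back with the standard `transformers` API:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("JustJaro/Virtuoso-Medium-v2_gptq_g128_4bit")
qcfg = config.quantization_config
print(qcfg["bits"], qcfg["group_size"], qcfg["quant_method"])  # 4 128 gptq
```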
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:65e96979a3270d419a9eaa70526756e8ef0d96a88c59205381bfec5abebdb241
size 3941700568
model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f69601ee3a046a93468c43cab3c80667c0f275d5770c3564e7b96f71a862c0c1
size 3983861408
model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66fce05532989d6be9ab4c62d33fa0df36fd8303f9a43b005bd1d76df48fe875
size 3950966128
model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cace3af65e4282cec81180ad38e1fc8988c2ddfab9b587941ccd5a167e921cfa
size 3983861456
model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b8ffed24887ed2c755d765e93c2c664e0d28e306f41dd637b3c3e74f2b1739a1
size 3475423888
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
pyproject.toml ADDED
@@ -0,0 +1,29 @@
[build-system]
requires = ["uv", "setuptools>=61.0", "wheel"]  # uv for uv-aware builds, setuptools for packaging
build-backend = "setuptools.build_meta"

[project]
name = "cquantize"
version = "0.1.0"
description = "Quantization script module for confidentialmind-graph project for 4bit GPTQ quantizations (so far)"
readme = "README.md"
requires-python = ">=3.11,<=3.13.10"  # 3.13.8 is used in the main project

dependencies = [
    "python-dotenv>=1.0.1",
    "gptqmodel>=1.9.0",
    "threadpoolctl>=3.5.0",
    "tokenicer>=0.0.2",
    "device-smi>=0.3.3",
    "pillow>=11.1.0",
    "torch>=2.6.0",
    "accelerate>=1.3.0",
    "safetensors>=0.5.2",
    "transformers>=4.48.3",
    "datasets>=3.3.0",
    "huggingface-hub>=0.28.1",
    "typer>=0.15.1",
]

[tool.setuptools.package-data]
quantize = ["README.md", "*.py"]  # Include README and Python files if packaged
quant_log.csv ADDED
@@ -0,0 +1,449 @@
layer,module,loss,damp,time
0,self_attn.k_proj,0.68429,0.01000,3.673
0,self_attn.v_proj,0.22175,0.01000,2.039
0,self_attn.q_proj,1.96869,0.01000,2.136
0,self_attn.o_proj,3.77693,0.01150,2.910
0,mlp.up_proj,0.99279,0.01000,3.604
0,mlp.gate_proj,1.07921,0.01000,2.528
0,mlp.down_proj,1.35582,0.01000,24.237
1,self_attn.k_proj,0.03837,0.01000,3.430
1,self_attn.v_proj,0.01894,0.01000,2.031
1,self_attn.q_proj,0.16220,0.01000,2.131
1,self_attn.o_proj,0.03780,0.01000,2.936
1,mlp.up_proj,8.74470,0.01000,3.627
1,mlp.gate_proj,28.60597,0.01000,2.514
1,mlp.down_proj,0.44505,0.01000,23.831
2,self_attn.k_proj,0.16713,0.01000,3.350
2,self_attn.v_proj,0.06144,0.01000,1.951
2,self_attn.q_proj,0.49034,0.01000,2.032
2,self_attn.o_proj,0.25572,0.01000,2.846
2,mlp.up_proj,15.69767,0.01000,3.508
2,mlp.gate_proj,36.18500,0.01000,2.435
2,mlp.down_proj,1.68527,0.01000,23.804
3,self_attn.k_proj,0.85578,0.01000,3.324
3,self_attn.v_proj,0.26647,0.01000,1.911
3,self_attn.q_proj,2.48419,0.01000,2.033
3,self_attn.o_proj,0.82004,0.01000,2.826
3,mlp.up_proj,17.33395,0.01000,3.542
3,mlp.gate_proj,39.69993,0.01000,2.442
3,mlp.down_proj,3.59631,0.01000,23.697
4,self_attn.k_proj,0.78995,0.01000,3.319
4,self_attn.v_proj,0.37706,0.01000,1.936
4,self_attn.q_proj,2.42253,0.01000,2.047
4,self_attn.o_proj,1.69794,0.01000,2.828
4,mlp.up_proj,44.51686,0.01000,3.516
4,mlp.gate_proj,85.78052,0.01000,2.448
4,mlp.down_proj,34.84994,0.01000,23.718
5,self_attn.k_proj,2.75266,0.01000,3.312
5,self_attn.v_proj,1.52558,0.01000,1.923
5,self_attn.q_proj,9.42790,0.01000,2.020
5,self_attn.o_proj,2.21958,0.01000,2.828
5,mlp.up_proj,80.44930,0.01000,3.534
5,mlp.gate_proj,146.69415,0.01000,2.422
5,mlp.down_proj,155.78115,0.01000,23.645
6,self_attn.k_proj,2.81902,0.01000,3.320
6,self_attn.v_proj,2.04968,0.01000,1.915
6,self_attn.q_proj,10.05766,0.01000,2.023
6,self_attn.o_proj,2.02901,0.01000,2.806
6,mlp.up_proj,138.50996,0.01000,3.527
6,mlp.gate_proj,239.54771,0.01000,2.442
6,mlp.down_proj,17.99062,0.01000,23.722
7,self_attn.k_proj,3.87859,0.01000,3.309
7,self_attn.v_proj,3.12427,0.01000,1.914
7,self_attn.q_proj,14.01700,0.01000,2.012
7,self_attn.o_proj,2.40274,0.01000,2.809
7,mlp.up_proj,182.27310,0.01000,3.520
7,mlp.gate_proj,314.12385,0.01000,2.411
7,mlp.down_proj,8.08519,0.01000,23.577
8,self_attn.k_proj,4.84397,0.01000,3.310
8,self_attn.v_proj,2.74577,0.01000,1.906
8,self_attn.q_proj,17.26296,0.01000,2.014
8,self_attn.o_proj,1.65540,0.01000,2.807
8,mlp.up_proj,120.39232,0.01000,3.511
8,mlp.gate_proj,203.03354,0.01000,2.425
8,mlp.down_proj,10.47238,0.01000,23.598
9,self_attn.k_proj,4.16787,0.01000,3.311
9,self_attn.v_proj,3.41207,0.01000,1.918
9,self_attn.q_proj,14.64457,0.01000,2.020
9,self_attn.o_proj,3.06484,0.01000,2.812
9,mlp.up_proj,67.20688,0.01000,3.514
9,mlp.gate_proj,72.71201,0.01000,2.415
9,mlp.down_proj,13.09478,0.01000,23.613
10,self_attn.k_proj,6.04928,0.01000,3.304
10,self_attn.v_proj,4.86239,0.01000,1.910
10,self_attn.q_proj,22.36867,0.01000,2.013
10,self_attn.o_proj,3.30104,0.01000,2.802
10,mlp.up_proj,80.23578,0.01000,3.516
10,mlp.gate_proj,86.89879,0.01000,2.433
10,mlp.down_proj,16.23887,0.01000,23.581
11,self_attn.k_proj,4.21718,0.01000,3.300
11,self_attn.v_proj,3.04058,0.01000,1.907
11,self_attn.q_proj,15.69358,0.01000,2.012
11,self_attn.o_proj,4.96679,0.01000,2.807
11,mlp.up_proj,91.37010,0.01000,3.516
11,mlp.gate_proj,110.56358,0.01000,2.408
11,mlp.down_proj,14.46899,0.01000,23.621
12,self_attn.k_proj,5.12722,0.01000,3.317
12,self_attn.v_proj,3.43182,0.01000,1.923
12,self_attn.q_proj,19.18940,0.01000,2.025
12,self_attn.o_proj,4.86182,0.01000,2.821
12,mlp.up_proj,94.84682,0.01000,3.508
12,mlp.gate_proj,101.46472,0.01000,2.407
12,mlp.down_proj,17.35850,0.01000,23.601
13,self_attn.k_proj,5.84547,0.01000,3.314
13,self_attn.v_proj,4.18078,0.01000,1.917
13,self_attn.q_proj,20.81503,0.01000,2.032
13,self_attn.o_proj,5.00791,0.01000,2.819
13,mlp.up_proj,106.81878,0.01000,3.540
13,mlp.gate_proj,115.32652,0.01000,2.438
13,mlp.down_proj,19.56649,0.01000,23.678
14,self_attn.k_proj,7.41857,0.01000,3.309
14,self_attn.v_proj,4.78736,0.01000,1.925
14,self_attn.q_proj,26.43606,0.01000,2.027
14,self_attn.o_proj,5.53114,0.01000,2.822
14,mlp.up_proj,112.08216,0.01000,3.518
14,mlp.gate_proj,122.21570,0.01000,2.405
14,mlp.down_proj,21.00375,0.01000,23.632
15,self_attn.k_proj,6.29604,0.01000,3.336
15,self_attn.v_proj,4.85133,0.01000,1.906
15,self_attn.q_proj,22.34891,0.01000,2.022
15,self_attn.o_proj,6.25405,0.01000,2.811
15,mlp.up_proj,114.44092,0.01000,3.501
15,mlp.gate_proj,129.78032,0.01000,2.406
15,mlp.down_proj,21.92258,0.01000,23.620
16,self_attn.k_proj,5.75072,0.01000,3.312
16,self_attn.v_proj,3.36871,0.01000,1.910
16,self_attn.q_proj,19.35139,0.01000,2.018
16,self_attn.o_proj,4.97404,0.01000,2.795
16,mlp.up_proj,109.24636,0.01000,3.513
16,mlp.gate_proj,117.69237,0.01000,2.422
16,mlp.down_proj,19.93951,0.01000,23.677
17,self_attn.k_proj,7.39674,0.01000,3.303
17,self_attn.v_proj,4.26947,0.01000,1.914
17,self_attn.q_proj,25.22288,0.01000,2.012
17,self_attn.o_proj,4.67647,0.01000,2.820
17,mlp.up_proj,107.66575,0.01000,3.514
17,mlp.gate_proj,114.25159,0.01000,2.422
17,mlp.down_proj,19.09546,0.01000,23.624
18,self_attn.k_proj,7.44274,0.01000,3.315
18,self_attn.v_proj,4.28236,0.01000,1.921
18,self_attn.q_proj,25.45588,0.01000,2.012
18,self_attn.o_proj,4.71608,0.01000,2.834
18,mlp.up_proj,105.33404,0.01000,3.517
18,mlp.gate_proj,110.93895,0.01000,2.431
18,mlp.down_proj,18.90383,0.01000,23.600
19,self_attn.k_proj,6.62320,0.01000,3.320
19,self_attn.v_proj,4.40162,0.01000,1.916
19,self_attn.q_proj,24.01939,0.01000,2.018
19,self_attn.o_proj,3.40690,0.01000,2.812
19,mlp.up_proj,106.48657,0.01000,3.523
19,mlp.gate_proj,111.95718,0.01000,2.423
19,mlp.down_proj,18.99875,0.01000,23.604
20,self_attn.k_proj,8.01747,0.01000,3.320
20,self_attn.v_proj,4.32686,0.01000,1.926
20,self_attn.q_proj,27.61566,0.01000,2.022
20,self_attn.o_proj,5.08997,0.01000,2.827
20,mlp.up_proj,102.25371,0.01000,3.520
20,mlp.gate_proj,105.85433,0.01000,2.428
20,mlp.down_proj,19.00534,0.01000,23.687
21,self_attn.k_proj,6.44339,0.01000,3.313
21,self_attn.v_proj,3.84395,0.01000,1.914
21,self_attn.q_proj,22.02793,0.01000,2.019
21,self_attn.o_proj,5.63995,0.01000,2.810
21,mlp.up_proj,100.82385,0.01000,3.526
21,mlp.gate_proj,104.78507,0.01000,2.429
21,mlp.down_proj,18.76496,0.01000,23.633
22,self_attn.k_proj,7.03357,0.01000,3.314
22,self_attn.v_proj,5.50739,0.01000,1.930
22,self_attn.q_proj,24.99235,0.01000,2.017
22,self_attn.o_proj,7.64190,0.01000,2.823
22,mlp.up_proj,106.12436,0.01000,3.532
22,mlp.gate_proj,111.44365,0.01000,2.417
22,mlp.down_proj,20.60899,0.01000,23.662
23,self_attn.k_proj,6.91308,0.01000,3.319
23,self_attn.v_proj,5.71615,0.01000,1.908
23,self_attn.q_proj,25.19551,0.01000,2.015
23,self_attn.o_proj,8.81081,0.01000,2.830
23,mlp.up_proj,114.01709,0.01000,3.512
23,mlp.gate_proj,120.36119,0.01000,2.407
23,mlp.down_proj,22.46280,0.01000,23.675
24,self_attn.k_proj,9.23080,0.01000,3.313
24,self_attn.v_proj,5.80098,0.01000,1.906
24,self_attn.q_proj,32.09112,0.01000,2.021
24,self_attn.o_proj,6.87168,0.01000,2.827
24,mlp.up_proj,116.23648,0.01000,3.527
24,mlp.gate_proj,120.92004,0.01000,2.419
24,mlp.down_proj,22.61730,0.01000,23.678
25,self_attn.k_proj,9.93150,0.01000,3.314
25,self_attn.v_proj,7.39638,0.01000,1.902
25,self_attn.q_proj,36.10401,0.01000,2.033
25,self_attn.o_proj,6.79195,0.01000,2.811
25,mlp.up_proj,118.64264,0.01000,3.508
25,mlp.gate_proj,121.46796,0.01000,2.414
25,mlp.down_proj,24.84690,0.01000,23.661
26,self_attn.k_proj,8.65966,0.01000,3.313
26,self_attn.v_proj,4.98542,0.01000,1.920
26,self_attn.q_proj,31.55737,0.01000,2.004
26,self_attn.o_proj,7.98239,0.01000,2.816
26,mlp.up_proj,121.85531,0.01000,3.524
26,mlp.gate_proj,125.33771,0.01000,2.406
26,mlp.down_proj,26.58099,0.01000,23.653
27,self_attn.k_proj,8.58310,0.01000,3.301
27,self_attn.v_proj,5.21801,0.01000,1.919
27,self_attn.q_proj,28.61016,0.01000,2.020
27,self_attn.o_proj,11.54781,0.01000,2.823
27,mlp.up_proj,128.94634,0.01000,3.527
27,mlp.gate_proj,130.93888,0.01000,2.417
27,mlp.down_proj,30.78769,0.01000,23.658
28,self_attn.k_proj,8.23528,0.01000,3.321
28,self_attn.v_proj,7.83469,0.01000,1.919
28,self_attn.q_proj,30.92525,0.01000,2.030
28,self_attn.o_proj,13.63801,0.01000,2.822
28,mlp.up_proj,143.61916,0.01000,3.530
28,mlp.gate_proj,144.40119,0.01000,2.438
28,mlp.down_proj,36.58346,0.01000,23.664
29,self_attn.k_proj,13.03879,0.01000,3.314
29,self_attn.v_proj,10.05327,0.01000,1.917
29,self_attn.q_proj,44.58715,0.01000,2.016
29,self_attn.o_proj,16.30329,0.01000,2.826
29,mlp.up_proj,158.88219,0.01000,3.531
29,mlp.gate_proj,156.18305,0.01000,2.439
29,mlp.down_proj,40.20440,0.01000,23.645
30,self_attn.k_proj,10.77216,0.01000,3.305
30,self_attn.v_proj,10.38958,0.01000,1.926
30,self_attn.q_proj,40.28143,0.01000,2.161
30,self_attn.o_proj,23.21788,0.01000,2.835
30,mlp.up_proj,170.46665,0.01000,3.518
30,mlp.gate_proj,173.49276,0.01000,2.398
30,mlp.down_proj,44.82191,0.01000,23.648
31,self_attn.k_proj,11.63493,0.01000,3.319
31,self_attn.v_proj,10.50318,0.01000,1.926
31,self_attn.q_proj,42.10770,0.01000,2.022
31,self_attn.o_proj,17.96640,0.01000,2.804
31,mlp.up_proj,191.48129,0.01000,3.537
31,mlp.gate_proj,187.92781,0.01000,2.399
31,mlp.down_proj,50.03851,0.01000,23.631
32,self_attn.k_proj,11.51235,0.01000,3.332
32,self_attn.v_proj,7.84144,0.01000,1.937
32,self_attn.q_proj,39.07314,0.01000,2.030
32,self_attn.o_proj,16.07241,0.01000,2.801
32,mlp.up_proj,216.05202,0.01000,3.525
32,mlp.gate_proj,222.60847,0.01000,2.410
32,mlp.down_proj,52.91387,0.01000,23.675
33,self_attn.k_proj,11.80478,0.01000,3.319
33,self_attn.v_proj,8.69015,0.01000,1.925
33,self_attn.q_proj,42.76728,0.01000,2.038
33,self_attn.o_proj,17.84222,0.01000,2.832
33,mlp.up_proj,201.68671,0.01000,3.534
33,mlp.gate_proj,206.43596,0.01000,2.452
33,mlp.down_proj,48.19399,0.01000,23.679
34,self_attn.k_proj,11.43841,0.01000,3.322
34,self_attn.v_proj,9.44701,0.01000,1.909
34,self_attn.q_proj,42.19950,0.01000,2.031
34,self_attn.o_proj,23.21826,0.01000,2.813
34,mlp.up_proj,197.22599,0.01000,3.549
34,mlp.gate_proj,198.69668,0.01000,2.425
34,mlp.down_proj,47.08673,0.01000,23.684
35,self_attn.k_proj,12.18994,0.01000,3.314
35,self_attn.v_proj,9.17933,0.01000,1.909
35,self_attn.q_proj,48.97500,0.01000,2.008
35,self_attn.o_proj,14.92670,0.01000,2.828
35,mlp.up_proj,203.08513,0.01000,3.495
35,mlp.gate_proj,199.72694,0.01000,2.409
35,mlp.down_proj,47.83594,0.01000,23.627
36,self_attn.k_proj,12.86573,0.01000,3.313
36,self_attn.v_proj,9.14343,0.01000,1.930
36,self_attn.q_proj,48.02929,0.01000,2.031
36,self_attn.o_proj,19.71742,0.01000,2.822
36,mlp.up_proj,188.81137,0.01000,3.518
36,mlp.gate_proj,181.60324,0.01000,2.430
36,mlp.down_proj,46.46059,0.01000,23.682
37,self_attn.k_proj,11.23444,0.01000,3.307
37,self_attn.v_proj,8.68904,0.01000,1.913
37,self_attn.q_proj,41.53437,0.01000,2.013
37,self_attn.o_proj,17.14158,0.01000,2.830
37,mlp.up_proj,191.38220,0.01000,3.503
37,mlp.gate_proj,180.87342,0.01000,2.413
37,mlp.down_proj,45.87174,0.01000,23.720
38,self_attn.k_proj,12.95493,0.01000,3.321
38,self_attn.v_proj,13.34433,0.01000,1.947
38,self_attn.q_proj,47.03329,0.01000,2.028
38,self_attn.o_proj,24.59711,0.01000,2.826
38,mlp.up_proj,198.80115,0.01000,3.533
38,mlp.gate_proj,192.11092,0.01000,2.449
38,mlp.down_proj,52.32773,0.01000,23.665
39,self_attn.k_proj,12.81120,0.01000,3.321
39,self_attn.v_proj,14.26659,0.01000,1.919
39,self_attn.q_proj,51.02459,0.01000,2.014
39,self_attn.o_proj,26.17471,0.01000,2.820
39,mlp.up_proj,213.29043,0.01000,3.544
39,mlp.gate_proj,210.95182,0.01000,2.430
39,mlp.down_proj,54.58284,0.01000,23.705
40,self_attn.k_proj,16.96652,0.01000,3.335
40,self_attn.v_proj,12.92616,0.01000,1.948
40,self_attn.q_proj,59.58385,0.01000,2.031
40,self_attn.o_proj,23.56769,0.01000,2.850
40,mlp.up_proj,213.04031,0.01000,3.536
40,mlp.gate_proj,209.94276,0.01000,2.452
40,mlp.down_proj,52.87887,0.01000,23.594
41,self_attn.k_proj,16.80811,0.01000,3.312
41,self_attn.v_proj,17.76251,0.01000,1.937
41,self_attn.q_proj,65.20547,0.01000,2.026
41,self_attn.o_proj,25.04289,0.01000,2.817
41,mlp.up_proj,220.87492,0.01000,3.520
41,mlp.gate_proj,211.67563,0.01000,2.423
41,mlp.down_proj,59.25484,0.01000,23.661
42,self_attn.k_proj,15.02459,0.01000,3.320
42,self_attn.v_proj,10.88788,0.01000,1.911
42,self_attn.q_proj,57.54241,0.01000,2.035
42,self_attn.o_proj,20.55937,0.01000,2.821
42,mlp.up_proj,236.70117,0.01000,3.520
42,mlp.gate_proj,221.29241,0.01000,2.403
42,mlp.down_proj,66.95299,0.01000,23.756
43,self_attn.k_proj,15.22730,0.01000,3.326
43,self_attn.v_proj,13.28080,0.01000,1.941
43,self_attn.q_proj,55.10066,0.01000,2.040
43,self_attn.o_proj,27.03987,0.01000,2.830
43,mlp.up_proj,255.56507,0.01000,3.506
43,mlp.gate_proj,243.61768,0.01000,2.411
43,mlp.down_proj,95.69682,0.01000,23.712
44,self_attn.k_proj,13.45872,0.01000,3.315
44,self_attn.v_proj,19.12148,0.01000,1.903
44,self_attn.q_proj,56.92778,0.01000,2.011
44,self_attn.o_proj,34.21928,0.01000,2.825
44,mlp.up_proj,273.97650,0.01000,3.531
44,mlp.gate_proj,260.11922,0.01000,2.435
44,mlp.down_proj,93.20020,0.01000,23.694
45,self_attn.k_proj,17.86078,0.01000,3.321
45,self_attn.v_proj,20.56466,0.01000,1.925
45,self_attn.q_proj,67.85091,0.01000,2.018
45,self_attn.o_proj,40.36663,0.01000,2.823
45,mlp.up_proj,290.53839,0.01000,3.536
45,mlp.gate_proj,274.27193,0.01000,2.425
45,mlp.down_proj,104.36816,0.01000,23.659
46,self_attn.k_proj,15.56675,0.01000,3.327
46,self_attn.v_proj,22.97811,0.01000,1.921
46,self_attn.q_proj,64.33953,0.01000,2.035
46,self_attn.o_proj,52.65483,0.01000,2.843
46,mlp.up_proj,308.17915,0.01000,3.537
46,mlp.gate_proj,288.95987,0.01000,2.427
46,mlp.down_proj,114.66652,0.01000,23.659
47,self_attn.k_proj,16.12798,0.01000,3.317
47,self_attn.v_proj,20.42561,0.01000,1.922
47,self_attn.q_proj,65.25035,0.01000,1.995
47,self_attn.o_proj,36.59410,0.01000,2.823
47,mlp.up_proj,343.30617,0.01000,3.525
47,mlp.gate_proj,329.58792,0.01000,2.427
47,mlp.down_proj,132.65701,0.01000,23.655
48,self_attn.k_proj,17.20556,0.01000,3.329
48,self_attn.v_proj,27.96892,0.01000,1.926
48,self_attn.q_proj,69.40354,0.01000,2.016
48,self_attn.o_proj,41.50221,0.01000,2.827
48,mlp.up_proj,368.98319,0.01000,3.513
48,mlp.gate_proj,357.69691,0.01000,2.428
48,mlp.down_proj,152.02718,0.01000,23.796
49,self_attn.k_proj,17.55639,0.01000,3.308
49,self_attn.v_proj,25.41515,0.01000,1.921
49,self_attn.q_proj,73.97384,0.01000,2.021
49,self_attn.o_proj,45.05861,0.01000,2.817
49,mlp.up_proj,424.21883,0.01000,3.527
49,mlp.gate_proj,411.17935,0.01000,2.424
49,mlp.down_proj,206.88105,0.01000,23.645
50,self_attn.k_proj,18.10877,0.01000,3.337
50,self_attn.v_proj,30.56898,0.01000,1.949
50,self_attn.q_proj,78.36073,0.01000,2.049
50,self_attn.o_proj,38.76283,0.01000,2.815
50,mlp.up_proj,474.55079,0.01000,3.524
50,mlp.gate_proj,475.99778,0.01000,2.422
50,mlp.down_proj,227.39140,0.01000,23.676
51,self_attn.k_proj,17.18596,0.01000,3.320
51,self_attn.v_proj,25.70684,0.01000,1.930
51,self_attn.q_proj,69.60019,0.01000,2.015
51,self_attn.o_proj,61.42497,0.01000,2.811
51,mlp.up_proj,509.62421,0.01000,3.536
51,mlp.gate_proj,521.04976,0.01000,2.440
51,mlp.down_proj,252.19549,0.01000,23.688
52,self_attn.k_proj,20.03325,0.01000,3.327
52,self_attn.v_proj,44.92303,0.01000,1.918
52,self_attn.q_proj,83.38118,0.01000,2.040
52,self_attn.o_proj,52.90344,0.01000,2.836
52,mlp.up_proj,545.77302,0.01000,3.528
52,mlp.gate_proj,545.53809,0.01000,2.437
52,mlp.down_proj,296.30906,0.01000,23.676
53,self_attn.k_proj,23.01326,0.01000,3.325
53,self_attn.v_proj,44.71671,0.01000,1.925
53,self_attn.q_proj,89.46515,0.01000,2.028
53,self_attn.o_proj,71.05999,0.01000,2.830
53,mlp.up_proj,592.97445,0.01000,3.523
53,mlp.gate_proj,597.97927,0.01000,2.455
53,mlp.down_proj,330.54104,0.01000,23.643
54,self_attn.k_proj,21.23220,0.01000,3.305
54,self_attn.v_proj,40.19472,0.01000,1.904
54,self_attn.q_proj,82.89218,0.01000,2.013
54,self_attn.o_proj,52.05958,0.01000,2.808
54,mlp.up_proj,644.24573,0.01000,3.539
54,mlp.gate_proj,641.55237,0.01000,2.416
54,mlp.down_proj,350.10202,0.01000,23.670
55,self_attn.k_proj,19.44832,0.01000,3.319
55,self_attn.v_proj,39.86784,0.01000,1.927
55,self_attn.q_proj,84.99967,0.01000,2.042
55,self_attn.o_proj,61.12057,0.01000,2.815
55,mlp.up_proj,694.21804,0.01000,3.510
55,mlp.gate_proj,678.76851,0.01000,2.435
55,mlp.down_proj,401.86646,0.01000,23.683
56,self_attn.k_proj,23.57326,0.01000,3.332
56,self_attn.v_proj,58.79263,0.01000,1.935
56,self_attn.q_proj,95.75816,0.01000,2.019
56,self_attn.o_proj,52.32878,0.01000,2.833
56,mlp.up_proj,746.73790,0.01000,3.541
56,mlp.gate_proj,733.93956,0.01000,2.438
56,mlp.down_proj,438.57271,0.01000,23.702
57,self_attn.k_proj,21.81357,0.01000,3.333
57,self_attn.v_proj,58.80958,0.01000,1.916
57,self_attn.q_proj,96.38024,0.01000,2.035
57,self_attn.o_proj,51.38194,0.01000,2.820
57,mlp.up_proj,799.61796,0.01000,3.541
57,mlp.gate_proj,771.37184,0.01000,2.438
57,mlp.down_proj,481.00989,0.01000,23.747
58,self_attn.k_proj,23.75545,0.01000,3.332
58,self_attn.v_proj,70.84536,0.01000,1.934
58,self_attn.q_proj,96.86566,0.01000,2.032
58,self_attn.o_proj,49.27853,0.01000,2.848
58,mlp.up_proj,869.68038,0.01000,3.530
58,mlp.gate_proj,821.56930,0.01000,2.459
58,mlp.down_proj,578.50016,0.01000,23.705
59,self_attn.k_proj,26.05345,0.01000,3.340
59,self_attn.v_proj,93.06062,0.01000,1.932
59,self_attn.q_proj,116.08675,0.01000,2.030
59,self_attn.o_proj,114.26896,0.01000,2.843
59,mlp.up_proj,956.03853,0.01000,3.531
59,mlp.gate_proj,890.38750,0.01000,2.433
59,mlp.down_proj,692.34074,0.01000,23.720
60,self_attn.k_proj,22.77888,0.01000,3.334
60,self_attn.v_proj,108.71282,0.01000,1.922
60,self_attn.q_proj,113.01335,0.01000,2.040
60,self_attn.o_proj,148.25188,0.01000,2.830
60,mlp.up_proj,1042.81598,0.01000,3.531
60,mlp.gate_proj,954.13782,0.01000,2.446
60,mlp.down_proj,1130.72294,0.01000,23.736
61,self_attn.k_proj,26.17376,0.01000,3.321
61,self_attn.v_proj,147.72431,0.01000,1.925
61,self_attn.q_proj,127.80374,0.01000,2.025
61,self_attn.o_proj,190.04517,0.01000,2.822
61,mlp.up_proj,1187.25134,0.01000,3.543
61,mlp.gate_proj,1082.49106,0.01000,2.441
61,mlp.down_proj,1350.51108,0.01000,23.699
62,self_attn.k_proj,25.79631,0.01000,3.318
62,self_attn.v_proj,173.76084,0.01000,1.921
62,self_attn.q_proj,131.01425,0.01000,2.032
62,self_attn.o_proj,368.42813,0.01000,2.823
62,mlp.up_proj,1142.54573,0.01000,3.534
62,mlp.gate_proj,1070.04494,0.01000,2.420
62,mlp.down_proj,2184.37816,0.01000,23.732
63,self_attn.k_proj,22.13227,0.01000,3.319
63,self_attn.v_proj,119.49068,0.01000,1.925
63,self_attn.q_proj,89.71157,0.01000,2.028
63,self_attn.o_proj,141.02466,0.01000,2.822
63,mlp.up_proj,1223.05870,0.01000,3.519
63,mlp.gate_proj,1174.51392,0.01000,2.429
63,mlp.down_proj,3595.41361,0.01000,23.665
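
The log records per-module quantization loss, damping factor, and wall time for all 64 layers; losses grow sharply in the final layers (e.g. `mlp.down_proj` of layer 63). A sketch for summarizing it, assuming `pandas` (not among the pinned dependencies):

```python
import pandas as pd

log = pd.read_csv("quant_log.csv")
print(log.groupby("module")["loss"].mean().sort_values())  # which projections quantize hardest
print(log.groupby("layer")["loss"].sum().tail())           # loss concentrates in the last layers
print(f"total quantization time: {log['time'].sum() / 60:.1f} min")
```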
quantize_config.json ADDED
@@ -0,0 +1,21 @@
{
  "bits": 4,
  "group_size": 128,
  "desc_act": true,
  "sym": true,
  "lm_head": false,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "pack_dtype": "int32",
  "meta": {
    "quantizer": [
      "gptqmodel:1.9.0"
    ],
    "uri": "https://github.com/modelcloud/gptqmodel",
    "damp_percent": 0.01,
    "damp_auto_increment": 0.0025,
    "static_groups": false,
    "true_sequential": true,
    "mse": 0.0
  }
}
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:83396048d512ec1f3178af0d7c1f79a226bba041822614b0e26a4fd2d4b55bf7
size 11421995
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Virtuoso Medium, created by Arcee AI. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
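
The `chat_template` above is ChatML-style (`<|im_start|>`/`<|im_end|>` markers with optional tool-calling blocks); a sketch rendering a conversation through the standard `transformers` API:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("JustJaro/Virtuoso-Medium-v2_gptq_g128_4bit")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # <|im_start|>system ... <|im_start|>assistant
```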
vocab.json ADDED
The diff for this file is too large to render. See raw diff