JustJaro committed · verified
Commit 9252c6c · 1 Parent(s): ce05413

Add files using upload-large-folder tool
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,483 @@
1
+ ---
2
+ tags:
3
+ - gptq
4
+ - quantization
5
+ - 4bit
6
+ - confidentialmind
7
+ - text-generation
8
+ - apache2.0
9
+ - rombos-llm-v2.6-qwen-14b
10
+ ---
11
+ # 🔥 Quantized Model: Rombos-LLM-V2.6-Qwen-14b_gptq_g32_4bit 🔥
12
+
13
+ This is a 4-bit quantized version of the [rombodawg/Rombos-LLM-V2.6-Qwen-14b](https://huggingface.co/rombodawg/Rombos-LLM-V2.6-Qwen-14b) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
14
+ It leverages the open-source GPTQModel quantization library to achieve 4-bit precision with a group size of 32,
15
+ resulting in a smaller,
16
+ faster model with minimal performance degradation.
17
+
18
+ Quantization was run on a single NVIDIA A100 GPU with 80 GB of VRAM.
19
+
20
+ *Note:* `batch_size` is set quite high because the model is small; you may need to lower it to fit your GPU VRAM.
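+
+ For reference, the effective calibration batch size follows the formula in the quantize.py script below. A quick sanity check, assuming the run used `size_multi=3` (inferred from the recorded 1536 samples and 6144 sequence length, not a logged value):
+
+ ```python
+ # Hypothetical worked example of the batch-size formula from quantize.py below.
+ # size_multi=3 is an inference from nsamples=1536 (512 * 3) and seq_len=6144 (4096 * 1.5).
+ nsamples = 1536
+ group_size_factor = int(128 / 32)   # = 4 for group size 32
+ size_multiplier_len = 3 / 2         # = 1.5
+ batch_size = max(1, int(int((nsamples * 0.1) / group_size_factor) * int(size_multiplier_len)))
+ print(batch_size)  # -> 38
+ ```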
21
+
22
+ ## Model Details
23
+ - **Original Model:** [rombodawg/Rombos-LLM-V2.6-Qwen-14b](https://huggingface.co/rombodawg/Rombos-LLM-V2.6-Qwen-14b)
24
+ - **Quantized Model:** Rombos-LLM-V2.6-Qwen-14b_gptq_g32_4bit (this repository)
25
+ - **Quantization Method:** GPTQ (4-bit, group size 32)
26
+ - **Quantization Library:** [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
27
+ - **Calibration Dataset:** neuralmagic/LLM_compression_calibration (using 1536 samples with seq len 6144)
28
+ - **Quantized by:** [ConfidentialMind.com](https://www.confidentialmind.com)
29
+
30
+ ## Usage
31
+
32
+ ```python
33
+ from gptqmodel import GPTQModel
34
+ from transformers import AutoTokenizer
35
+
36
+ # Use the local directory or JustJaro/Rombos-LLM-V2.6-Qwen-14b_gptq_g32_4bit after upload
37
+ quantized_model_id = "/home/jaro/models/quantized/Rombos-LLM-V2.6-Qwen-14b_gptq_g32_4bit" # or "JustJaro/Rombos-LLM-V2.6-Qwen-14b_gptq_g32_4bit"
38
+ tokenizer = AutoTokenizer.from_pretrained(quantized_model_id)
39
+ model = GPTQModel.load(quantized_model_id, device="cuda:0") # or "cpu"
40
+
41
+ input_text = "This is a test prompt"
42
+ inputs = tokenizer(input_text, return_tensors="pt").to("cuda:0")
43
+ outputs = model.generate(**inputs)
44
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
45
+ ```
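+
+ Alternatively, the checkpoint should also load through the plain Transformers API. The sketch below is untested and assumes an environment with Optimum's GPTQ support (backed by GPTQModel or AutoGPTQ) installed:
+
+ ```python
+ # Untested sketch: loading the GPTQ checkpoint via plain Transformers.
+ # Assumes optimum plus a GPTQ backend (gptqmodel or auto-gptq) is installed.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "JustJaro/Rombos-LLM-V2.6-Qwen-14b_gptq_g32_4bit"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ inputs = tokenizer("This is a test prompt", return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```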
46
+
47
+ ## Package Versions and Installation Instructions
48
+
49
+ See pyproject.toml for the exact UV project file, and see the [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
50
+ repo for more details on how to install the package.
51
+
52
+ Use the provided pyproject.toml:
53
+
54
+ ```bash
55
+ uv venv
56
+ source .venv/bin/activate
57
+ uv sync
58
+ ```
59
+
60
+ ### Environment Variables
61
+
62
+ ```bash
63
+ HF_TOKEN=<YOUR_HF_TOKEN>
64
+ TOKENIZERS_PARALLELISM="true"
65
+ PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
66
+ ```
67
+
68
+ ## Quantization Script
69
+ Below is the exact quantize.py script used to generate this model (the exact dependency versions are pinned in pyproject.toml):
70
+
71
+ ```python
72
+ #!/usr/bin/env python3
73
+ """
74
+ This script loads a source Hugging Face model and a calibration dataset,
75
+ quantizes the model using GPTQModel (4-bit precision with a configurable group size),
76
+ saves the quantized model using the Transformers API with safetensors (safe serialization)
77
+ under ~/models/quantized/, and then creates/updates a Hugging Face repository (with the
78
+ _gptq_g<group_size>_4bit suffix) by uploading the model, tokenizer, and an auto-generated README.md.
79
+
80
+ Usage example:
81
+ python quantize.py --source-model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
82
+ --calibration-dataset wikitext/wikitext-2-raw-v1 \
83
+ --seq-len 1024 --nsamples 256 --hf-token <YOUR_HF_TOKEN>
84
+ """
85
+
86
+ import os
87
+ import shutil
88
+ import subprocess
89
+ from enum import Enum
90
+ from pathlib import Path
91
+ from typing import List
92
+
93
+ import torch
94
+ import typer
95
+ from datasets import load_dataset
96
+ from dotenv import load_dotenv, find_dotenv
97
+ from gptqmodel import GPTQModel, QuantizeConfig
98
+ from gptqmodel.utils import Perplexity
99
+ # For later pushing to the model hub
100
+ from huggingface_hub import HfApi
101
+ from transformers import AutoTokenizer, PreTrainedTokenizerBase
102
+
103
+ load_dotenv(find_dotenv())
104
+ HF_TOKEN = os.getenv("HF_TOKEN")
105
+
106
+ app = typer.Typer()
107
+
108
+ class GroupSize(str, Enum):
109
+ accurate = "32"
110
+ balanced = "64"
111
+ fast = "128"
112
+
113
+
114
+ def get_text_from_example(example: dict) -> str:
115
+ """
116
+ Returns text from a dataset example.
117
+ If the example contains a "text" field, and it is nonempty, that text is used.
118
+ Otherwise, if it has a "messages" field (a list of dicts with a "content" key),
119
+ the function returns the concatenation of all non-empty message contents.
120
+ """
121
+ if "text" in example and example["text"]:
122
+ return example["text"]
123
+ elif "messages" in example:
124
+ contents = [msg.get("content", "").strip() for msg in example["messages"]]
125
+ return " ".join([s for s in contents if s])
126
+ else:
127
+ return ""
128
+
129
+
130
+ def get_calibration_dataset(
131
+ tokenizer: PreTrainedTokenizerBase,
132
+ nsamples: int,
133
+ seqlen: int,
134
+ calibration_dataset: str
135
+ ) -> List[dict]:
136
+ """
137
+ Loads a calibration dataset from the Hugging Face Hub (or from a local file).
138
+ It accepts datasets with a single "text" field (like wikitext)
139
+ or with a "messages" field (as in the Neural Magic LLM Compression Calibration dataset).
140
+ Only examples whose extracted text length is at least 'seqlen' are kept.
141
+ Each chosen example is tokenized (with truncation up to 'seqlen') and returned as a dict.
142
+ """
143
+ ds = None
144
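+ # Resolution order: ids like "wikitext/wikitext-2-raw-v1" are split into (dataset, config) for the
+ # Hub; failing that, the raw id is tried as-is, and finally a local JSON-lines file path.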
+ try:
145
+ # Attempt to load from HF Hub.
146
+ try:
147
+ if "/" in calibration_dataset:
148
+ parts = calibration_dataset.split("/", 1)
149
+ ds = load_dataset(parts[0], parts[1], split="train")
150
+ else:
151
+ ds = load_dataset(calibration_dataset, split="train")
152
+ except Exception as e:
153
+ print(f"Error loading dataset '{calibration_dataset}' via load_dataset: {e}")
154
+ ds = load_dataset(calibration_dataset, split="train")
155
+ print(f"Loaded calibration dataset from full remote path {calibration_dataset}.")
156
+
157
+
158
+ except Exception as e:
159
+ print(f"Error loading dataset '{calibration_dataset}' via load_dataset: {e}")
160
+ # Fallback: if the supplied calibration_dataset is a local path, try to load it as JSON-lines.
161
+ if os.path.exists(calibration_dataset):
162
+ try:
163
+ ds = load_dataset("json", data_files=calibration_dataset, split="train")
164
+ print(f"Loaded calibration dataset from local file {calibration_dataset}.")
165
+ except Exception as e2:
166
+ print(f"Error loading local json dataset from '{calibration_dataset}': {e2}")
167
+ return []
168
+ else:
169
+ return []
170
+
171
+ print(f"Dataset features: {ds.features}")
172
+
173
+ # Keep only examples whose extracted text is at least 80% of 'seqlen' (wikitext-2-raw-v1 has many short examples).
174
+ ds = ds.filter(lambda x: len(get_text_from_example(x)) >= int(seqlen * 0.8))
175
+ sample_range = min(nsamples, len(ds))
176
+ calibration_data = []
177
+ for i in range(sample_range):
178
+ example = ds[i]
179
+ text = get_text_from_example(example)
180
+ tokenized = tokenizer(text, truncation=True, max_length=seqlen, return_tensors="pt")
181
+ tokenized = {k: v.squeeze(0) for k, v in tokenized.items()}
182
+ calibration_data.append(tokenized)
183
+ return calibration_data
184
+
185
+
186
+ def calculate_avg_ppl(model, tokenizer):
187
+ """
188
+ Computes the average perplexity on the wikitext-2-raw-v1 train split using GPTQModel's Perplexity utility.
189
+ """
190
+ ppl = Perplexity(
191
+ model=model,
192
+ tokenizer=tokenizer,
193
+ dataset_path="wikitext",
194
+ dataset_name="wikitext-2-raw-v1",
195
+ split="train",
196
+ text_column="text",
197
+ )
198
+ ppl_values = ppl.calculate(n_ctx=512, n_batch=512)
199
+ avg = sum(ppl_values) / len(ppl_values)
200
+ return avg
201
+
202
+
203
+ def get_pinned_package_versions():
204
+ """
205
+ Retrieves pinned package versions using 'uv pip freeze'.
206
+ Returns a dictionary mapping lowercased package names to their versions.
207
+ """
208
+ try:
209
+ result = subprocess.run(["uv", "pip", "freeze"], capture_output=True, text=True, check=True)
210
+ packages_output = result.stdout.strip()
211
+ versions = {}
212
+ for line in packages_output.splitlines():
213
+ if "==" in line:
214
+ package_name, package_version = line.split("==", 1)
215
+ versions[package_name.lower()] = package_version
216
+ return versions
217
+ except subprocess.CalledProcessError as e:
218
+ typer.echo(f"Error running 'uv pip freeze': {e}", err=True)
219
+ return {}
220
+ except FileNotFoundError:
221
+ typer.echo("uv command not found. Make sure uv is installed and in your PATH.", err=True)
222
+ return {}
223
+
224
+
225
+ @app.command()
226
+ def main(
227
+ seq_len: int = typer.Option(4096, help="Sequence length for tokenization and calibration."),
228
+ nsamples: int = typer.Option(512, help="Number of samples to use for calibration."),
229
+ source_model: str = typer.Option("rombodawg/Rombos-LLM-V2.6-Qwen-14b",
230
+ help="Source model HF repository identifier."),
231
+ calibration_dataset: str = typer.Option("wikitext/wikitext-2-raw-v1",
232
+ help="Calibration dataset identifier (in 'dataset/config' format) or local file path."),
233
+ hf_token: str = typer.Option(HF_TOKEN,
234
+ help="Hugging Face token for creating/updating your repo."),
235
+ upload_only: bool = typer.Option(False, help="Only upload the quantized model to the Hugging Face Hub."),
236
+ # Allow for 32, 64, 128 only using typer:
237
+ group_size: GroupSize = typer.Option(GroupSize.accurate, help="Group size for quantization accurate: 32, "
238
+ "balanced: 64, fast: 128. Default: accurate."),
239
+ mse: bool = typer.Option(True, help="Use mse instead of mae for the loss function."),
240
+ size_multi: float = typer.Option(3.5, help="Model size multiplier; depends on the source model."),
241
+ ):
242
+ # Prepare destination directory and model names.
243
+ model_name = source_model.split("/")[-1]
244
+ if size_multi != 1:
245
+ size_multiplier = size_multi
246
+ size_multiplier_len = size_multiplier / 2
247
+ else:
248
+ size_multiplier = 1
249
+ size_multiplier_len = 1
250
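+ # Scale the calibration set with the model-size multiplier: more samples and longer sequences for larger models.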
+ nsamples = int(nsamples * size_multiplier)
251
+ seq_len = int(seq_len * size_multiplier_len)
252
+ quantized_model_name = f"{model_name}_gptq_g{int(group_size.value)}_4bit"
253
+ quantized_model_dir = os.path.expanduser(os.path.join("~/models/quantized", quantized_model_name))
254
+ if not upload_only:
255
+ # Remove the directory if it already exists
256
+ if os.path.exists(quantized_model_dir):
257
+ shutil.rmtree(quantized_model_dir)
258
+ # Create directory for quantized model.
259
+ os.makedirs(quantized_model_dir, exist_ok=True)
260
+
261
+ typer.echo("Loading tokenizer from source model...")
262
+ tokenizer_obj = AutoTokenizer.from_pretrained(source_model, use_fast=True)
263
+
264
+ typer.echo("Loading calibration dataset...")
265
+ typer.echo(f"Calibration dataset: {calibration_dataset}")
266
+ calibration_data = get_calibration_dataset(tokenizer_obj, nsamples, seq_len, calibration_dataset)
267
+ if not calibration_data:
268
+ typer.echo("Calibration dataset is empty. Aborting.", err=True)
269
+ raise typer.Exit(code=1)
270
+ if mse:
271
+ # Fits mistral-small-24b particularly well, as well as the increased damp_percent
272
+ mse = 0.01
273
+ quantize_config = QuantizeConfig(bits=4, group_size=int(group_size.value), damp_percent=0.015, mse=mse)
274
+ else:
275
+ quantize_config = QuantizeConfig(bits=4, group_size=int(group_size.value), damp_percent=0.01)
276
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
277
+ typer.echo(f"Loading model in {device} mode...")
278
+ model = GPTQModel.load(source_model, quantize_config)
279
+
280
+ typer.echo("Quantizing model...")
281
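+ # Heuristic batch size: ~10% of nsamples, scaled down for finer (smaller) group sizes, which need more VRAM per layer.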
+ group_size_factor = int(128 / int(group_size.value))
282
+ model.quantize(calibration_data, auto_gc=False,
283
+ batch_size=max(1, int(int((nsamples * 0.1) / group_size_factor) *
284
+ int(size_multiplier_len))))
285
+ # Retrieve Hugging Face user info for README generation.
286
+ package_versions = get_pinned_package_versions()
287
+ username = get_my_user(hf_token)
288
+
289
+ script_content = self_read_script()
290
+
291
+ typer.echo(f"Saving quantized model to {quantized_model_dir} using Transformers safe serialization...")
292
+ try:
293
+ model.save_pretrained(quantized_model_dir)
294
+ tokenizer_obj.save_pretrained(quantized_model_dir)
295
+ except Exception as ex:
296
+ typer.echo(f"Error during saving with safe_serialization: {ex}. Aborting.")
297
+ raise
298
+ typer.echo(f"Quantized model saved locally to: {quantized_model_dir}")
299
+ else:
300
+ tokenizer_obj = AutoTokenizer.from_pretrained(source_model, use_fast=True)
301
+ package_versions = get_pinned_package_versions()
302
+ username = get_my_user(hf_token)
303
+ script_content = self_read_script()
304
+
305
+
306
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
307
+ model = GPTQModel.load(quantized_model_dir, device=device)
308
+ avg_ppl = calculate_avg_ppl(model, tokenizer_obj)
309
+ typer.echo(f"Average perplexity (PPL) on wikitext v2 dataset: {avg_ppl}")
310
+ deps = Path("./pyproject.toml")
311
+ shutil.copy(deps, quantized_model_dir)
312
+ generate_readme(calibration_dataset, nsamples, quantized_model_dir,
313
+ quantized_model_name, script_content, seq_len, source_model, username, avg_ppl, int(group_size.value))
314
+ GPTQModel.push_to_hub(quantized_path=quantized_model_dir, private=False, repo_id=quantized_model_name,
315
+ token=HF_TOKEN)
316
+ typer.echo(f"Model uploaded to Hugging Face repo: {quantized_model_name}")
317
+ demo_input = tokenizer_obj("test is", return_tensors="pt").to(device)
318
+ generated_ids = model.generate(**demo_input)
319
+ output_text = tokenizer_obj.decode(generated_ids[0])
320
+ typer.echo(f"Inference demo output: {output_text}")
321
+ typer.echo(f"Average perplexity (PPL) on wikitext-2-raw-v1: {avg_ppl}")
322
+
323
+
324
+ def self_read_script():
325
+ try:
326
+ script_path = os.path.abspath(__file__)
327
+ with open(script_path, "r") as f:
328
+ script_content = f.read()
329
+ except Exception as e:
330
+ script_content = "Error reading script content: " + str(e)
331
+ return script_content
332
+
333
+
334
+ def get_my_user(hf_token):
335
+ api = HfApi(token=hf_token)
336
+ user_info = api.whoami()
337
+ try:
338
+ username = user_info.get("name") or user_info.get("username")
339
+ except Exception as e:
340
+ typer.echo(f"Error retrieving username from Hugging Face API: {e}. Using default username.")
341
+ username = None
342
+ if not username:
343
+ typer.echo("Could not determine your Hugging Face username from the token, defaulting to hard coded username.",
344
+ err=True)
345
+ username = "JustJaro"
346
+ return username
347
+
348
+
349
+ def generate_readme(calibration_dataset, nsamples, quantized_model_dir,
350
+ quantized_model_name, script_content, seq_len, source_model, username, avg_ppl, group_size):
351
+ readme_content = f"""---
352
+ tags:
353
+ - gptq
354
+ - quantization
355
+ - 4bit
356
+ - confidentialmind
357
+ - text-generation
358
+ - apache2.0
359
+ - {source_model.split('/')[-1].lower()}
360
+ ---
361
+ # 🔥 Quantized Model: {quantized_model_name} 🔥
362
+
363
+ This is a 4-bit quantized version of the [{source_model}](https://huggingface.co/{source_model}) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
364
+ It leverages the open-source GPTQModel quantization library to achieve 4-bit precision with a group size of {group_size},
365
+ resulting in a smaller,
366
+ faster model with minimal performance degradation.
367
+
368
+ Quantization was run on a single NVIDIA A100 GPU with 80 GB of VRAM.
369
+
370
+ *Note:* `batch_size` is set quite high because the model is small; you may need to lower it to fit your GPU VRAM.
371
+
372
+ ## Model Details
373
+ - **Original Model:** [{source_model}](https://huggingface.co/{source_model})
374
+ - **Quantized Model:** {quantized_model_name} (this repository)
375
+ - **Quantization Method:** GPTQ (4-bit, group size {group_size})
376
+ - **Quantization Library:** [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
377
+ - **Calibration Dataset:** {calibration_dataset} (using {nsamples} samples with seq len {seq_len})
378
+ - **Quantized by:** [ConfidentialMind.com](https://www.confidentialmind.com)
379
+
380
+ ## Usage
381
+
382
+ ```python
383
+ from gptqmodel import GPTQModel
384
+ from transformers import AutoTokenizer
385
+
386
+ # Use the local directory or {username}/{quantized_model_name} after upload
387
+ quantized_model_id = "{quantized_model_dir}" # or "{username}/{quantized_model_name}"
388
+ tokenizer = AutoTokenizer.from_pretrained(quantized_model_id)
389
+ model = GPTQModel.load(quantized_model_id, device="cuda:0") # or "cpu"
390
+
391
+ input_text = "This is a test prompt"
392
+ inputs = tokenizer(input_text, return_tensors="pt").to("cuda:0")
393
+ outputs = model.generate(**inputs)
394
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
395
+ ```
396
+
397
+ ## Package Versions and Installation Instructions
398
+
399
+ See pyproject.toml for the exact UV project file, and see the [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
400
+ repo for more details on how to install the package.
401
+
402
+ Use the provided pyproject.toml:
403
+
404
+ ```bash
405
+ uv venv
406
+ source .venv/bin/activate
407
+ uv sync
408
+ ```
409
+
410
+ ### Environment Variables
411
+
412
+ ```bash
413
+ HF_TOKEN=<YOUR_HF_TOKEN>
414
+ TOKENIZERS_PARALLELISM="true"
415
+ PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
416
+ ```
417
+
418
+ ## Quantization Script
419
+ Below is the exact quantize.py script used to generate this model (the exact dependency versions are pinned in pyproject.toml):
420
+
421
+ ```python
422
+ {script_content}
423
+ ```
424
+
425
+ ## Quantization Performance
426
+
427
+ Average perplexity (PPL) on wikitext v2 dataset: {avg_ppl}
428
+
429
+ ## Disclaimer
430
+ This model is for research purposes only. It may inherit limitations and biases from the original model and the quantization process. Please use responsibly and refer to the original model card for more details.
431
+
432
+ ## Contact
433
+ For any questions or support, please visit [ConfidentialMind.com](https://www.confidentialmind.com) or contact us directly.
434
+
435
+ ## License
436
+ This model inherits the license from the original model. Please refer to the original model card for more details.
437
+ Original model card: `{source_model}`
438
+
439
+ ## Author
440
+ This model was quantized by [Jaro](https://www.linkedin.com/in/jaroai/)
441
+
442
+ ## Acknowledgements
443
+ Quantization performed using the GPTQModel pipeline.
444
+
445
+ TODO: Add `gptqmodel.utils.eval` integration and auto-generation of eval table.
446
+
447
+ ---
448
+ *Generated and quantized using GPTQModel.*
449
+ """
450
+ readme_path = os.path.join(quantized_model_dir, "README.md")
451
+ with open(readme_path, "w") as f:
452
+ f.write(readme_content)
453
+ typer.echo("README.md created with detailed information.")
454
+
455
+
456
+ if __name__ == "__main__":
457
+ app()
458
+ ```
459
+
460
+ ## Quantization Performance
461
+
462
+ Average perplexity (PPL) on wikitext v2 dataset: 108.12590932665465
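+
+ To reproduce this measurement, here is a minimal sketch that uses GPTQModel's `Perplexity` utility with the same settings as `calculate_avg_ppl()` in the script above:
+
+ ```python
+ from gptqmodel import GPTQModel
+ from gptqmodel.utils import Perplexity
+ from transformers import AutoTokenizer
+
+ model_id = "JustJaro/Rombos-LLM-V2.6-Qwen-14b_gptq_g32_4bit"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = GPTQModel.load(model_id, device="cuda:0")
+
+ # Same dataset and window settings as calculate_avg_ppl() above.
+ ppl = Perplexity(model=model, tokenizer=tokenizer, dataset_path="wikitext",
+                  dataset_name="wikitext-2-raw-v1", split="train", text_column="text")
+ ppl_values = ppl.calculate(n_ctx=512, n_batch=512)
+ print(sum(ppl_values) / len(ppl_values))
+ ```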
463
+
464
+ ## Disclaimer
465
+ This model is for research purposes only. It may inherit limitations and biases from the original model and the quantization process. Please use responsibly and refer to the original model card for more details.
466
+
467
+ ## Contact
468
+ For any questions or support, please visit [ConfidentialMind.com](https://www.confidentialmind.com) or contact us directly.
469
+
470
+ ## License
471
+ This model inherits the license from the original model. Please refer to the original model card for more details.
472
+ Original model card: `rombodawg/Rombos-LLM-V2.6-Qwen-14b`
473
+
474
+ ## Author
475
+ This model was quantized by [Jaro](https://www.linkedin.com/in/jaroai/)
476
+
477
+ ## Acknowledgements
478
+ Quantization performed using the GPTQModel pipeline.
479
+
480
+ TODO: Add `gptqmodel.utils.eval` integration and auto-generation of eval table.
481
+
482
+ ---
483
+ *Generated and quantized using GPTQModel.*
added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
config.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "_attn_implementation_autoset": true,
3
+ "_name_or_path": "/home/jaro/.cache/huggingface/hub/models--rombodawg--Rombos-LLM-V2.6-Qwen-14b/snapshots/8cd17daafd35682b445c0ef8ac6722223c85ffd3",
4
+ "architectures": [
5
+ "Qwen2ForCausalLM"
6
+ ],
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 151643,
9
+ "eos_token_id": 151643,
10
+ "hidden_act": "silu",
11
+ "hidden_size": 5120,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 13824,
14
+ "max_position_embeddings": 131072,
15
+ "max_window_layers": 48,
16
+ "model_type": "qwen2",
17
+ "num_attention_heads": 40,
18
+ "num_hidden_layers": 48,
19
+ "num_key_value_heads": 8,
20
+ "quantization_config": {
21
+ "bits": 4,
22
+ "checkpoint_format": "gptq",
23
+ "desc_act": true,
24
+ "group_size": 32,
25
+ "lm_head": false,
26
+ "meta": {
27
+ "damp_auto_increment": 0.0025,
28
+ "damp_percent": 0.015,
29
+ "mse": 0.01,
30
+ "quantizer": [
31
+ "gptqmodel:1.9.0"
32
+ ],
33
+ "static_groups": false,
34
+ "true_sequential": true,
35
+ "uri": "https://github.com/modelcloud/gptqmodel"
36
+ },
37
+ "pack_dtype": "int32",
38
+ "quant_method": "gptq",
39
+ "sym": true
40
+ },
41
+ "rms_norm_eps": 1e-05,
42
+ "rope_scaling": null,
43
+ "rope_theta": 1000000.0,
44
+ "sliding_window": null,
45
+ "tie_word_embeddings": false,
46
+ "torch_dtype": "bfloat16",
47
+ "transformers_version": "4.48.3",
48
+ "use_cache": true,
49
+ "use_sliding_window": false,
50
+ "vocab_size": 152064
51
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f40da77e49c1c99863d8f8a94c5482489aca68fdc29e29efc7de8e835258ee17
3
+ size 3983800728
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:58de98ba220dde5a2d27a70420fbca5406d5a46414ca092e561fd0c71ff11aa7
3
+ size 3983658040
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:04a6fe421acf0a845627230e95c3b8bc63fb3485bf91cd075834ee6560cfe2aa
3
+ size 2795445280
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
pyproject.toml ADDED
@@ -0,0 +1,29 @@
1
+ [build-system]
2
+ requires = ["uv", "setuptools>=61.0", "wheel"] # uv for uv-aware builds, setuptools for packaging
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "cquantize"
7
+ version = "0.1.0"
8
+ description = "Quantization script module for confidentialmind-graph project for 4bit GPTQ quantizations (so far)"
9
+ readme = "README.md"
10
+ requires-python = ">=3.11,<=3.13.10" # 3.13.8 is used in the main project
11
+
12
+ dependencies = [
13
+ "python-dotenv>=1.0.1",
14
+ "gptqmodel>=1.9.0",
15
+ "threadpoolctl>=3.5.0",
16
+ "tokenicer>=0.0.2",
17
+ "device-smi>=0.3.3",
18
+ "pillow>=11.1.0",
19
+ "torch>=2.6.0",
20
+ "accelerate>=1.3.0",
21
+ "safetensors>=0.5.2",
22
+ "transformers>=4.48.3",
23
+ "datasets>=3.3.0",
24
+ "huggingface-hub>=0.28.1",
25
+ "typer>=0.15.1",
26
+ ]
27
+
28
+ [tool.setuptools.package-data]
29
+ quantize = ["README.md", "*.py"] # Include README and Python files if packaged
quant_log.csv ADDED
@@ -0,0 +1,337 @@
1
+ layer,module,loss,damp,time
2
+ 0,self_attn.k_proj,0.37561,0.01500,10.055
3
+ 0,self_attn.v_proj,0.06572,0.01500,9.529
4
+ 0,self_attn.q_proj,0.93163,0.01500,10.391
5
+ 0,self_attn.o_proj,0.49555,0.01500,10.511
6
+ 0,mlp.up_proj,0.52921,0.01500,10.883
7
+ 0,mlp.gate_proj,0.61072,0.01500,10.689
8
+ 0,mlp.down_proj,0.29186,0.01500,28.518
9
+ 1,self_attn.k_proj,0.02512,0.01500,9.566
10
+ 1,self_attn.v_proj,0.00690,0.01500,9.339
11
+ 1,self_attn.q_proj,0.13439,0.01500,10.163
12
+ 1,self_attn.o_proj,0.02494,0.01500,10.232
13
+ 1,mlp.up_proj,4.55129,0.01500,10.685
14
+ 1,mlp.gate_proj,32.37974,0.01500,10.493
15
+ 1,mlp.down_proj,0.03303,0.01500,28.148
16
+ 2,self_attn.k_proj,0.13044,0.01500,9.570
17
+ 2,self_attn.v_proj,0.02280,0.01500,9.314
18
+ 2,self_attn.q_proj,0.38131,0.01500,10.085
19
+ 2,self_attn.o_proj,0.06986,0.01500,10.242
20
+ 2,mlp.up_proj,10.17106,0.01500,10.615
21
+ 2,mlp.gate_proj,26.00120,0.01500,10.511
22
+ 2,mlp.down_proj,0.16400,0.01500,28.158
23
+ 3,self_attn.k_proj,0.62347,0.01500,9.584
24
+ 3,self_attn.v_proj,0.10256,0.01500,9.318
25
+ 3,self_attn.q_proj,1.56732,0.01500,10.138
26
+ 3,self_attn.o_proj,0.14301,0.01500,10.234
27
+ 3,mlp.up_proj,9.13295,0.01500,10.642
28
+ 3,mlp.gate_proj,30.38428,0.01500,10.435
29
+ 3,mlp.down_proj,0.29694,0.01500,28.121
30
+ 4,self_attn.k_proj,0.54692,0.01500,9.468
31
+ 4,self_attn.v_proj,0.13745,0.01500,9.323
32
+ 4,self_attn.q_proj,1.52606,0.01500,10.098
33
+ 4,self_attn.o_proj,0.27112,0.01500,10.144
34
+ 4,mlp.up_proj,15.79222,0.01500,10.602
35
+ 4,mlp.gate_proj,40.42462,0.01500,10.467
36
+ 4,mlp.down_proj,13389.03255,0.01500,28.034
37
+ 5,self_attn.k_proj,1.32183,0.01500,9.466
38
+ 5,self_attn.v_proj,0.48145,0.01500,9.252
39
+ 5,self_attn.q_proj,4.20482,0.01500,10.025
40
+ 5,self_attn.o_proj,0.11614,0.01500,10.148
41
+ 5,mlp.up_proj,61.69207,0.01500,10.529
42
+ 5,mlp.gate_proj,192.63190,0.01500,10.384
43
+ 5,mlp.down_proj,58.10828,0.01500,28.054
44
+ 6,self_attn.k_proj,1.33723,0.01500,9.496
45
+ 6,self_attn.v_proj,0.60071,0.01500,9.204
46
+ 6,self_attn.q_proj,4.79351,0.01500,10.014
47
+ 6,self_attn.o_proj,0.23736,0.01500,10.086
48
+ 6,mlp.up_proj,35.89475,0.01500,10.523
49
+ 6,mlp.gate_proj,175.79995,0.01500,10.375
50
+ 6,mlp.down_proj,25.07587,0.01500,27.974
51
+ 7,self_attn.k_proj,1.63740,0.01500,9.437
52
+ 7,self_attn.v_proj,0.81178,0.01500,9.216
53
+ 7,self_attn.q_proj,5.62039,0.01500,10.016
54
+ 7,self_attn.o_proj,0.18038,0.01500,10.109
55
+ 7,mlp.up_proj,31.86679,0.01500,10.531
56
+ 7,mlp.gate_proj,188.46973,0.01500,10.442
57
+ 7,mlp.down_proj,9.96931,0.01800,28.042
58
+ 8,self_attn.k_proj,1.87012,0.01500,9.468
59
+ 8,self_attn.v_proj,0.68184,0.01500,9.237
60
+ 8,self_attn.q_proj,5.96090,0.01500,9.993
61
+ 8,self_attn.o_proj,0.20523,0.01500,10.161
62
+ 8,mlp.up_proj,16.87071,0.01500,10.573
63
+ 8,mlp.gate_proj,87.21503,0.01500,10.401
64
+ 8,mlp.down_proj,3.89086,0.01950,28.255
65
+ 9,self_attn.k_proj,1.56542,0.01500,9.416
66
+ 9,self_attn.v_proj,0.78797,0.01500,9.199
67
+ 9,self_attn.q_proj,5.33111,0.01500,10.047
68
+ 9,self_attn.o_proj,0.38198,0.01500,10.118
69
+ 9,mlp.up_proj,9.64384,0.01500,10.596
70
+ 9,mlp.gate_proj,11.37062,0.01500,10.455
71
+ 9,mlp.down_proj,1.56816,0.01500,27.977
72
+ 10,self_attn.k_proj,2.11638,0.01500,9.457
73
+ 10,self_attn.v_proj,1.19567,0.01500,9.244
74
+ 10,self_attn.q_proj,7.52943,0.01500,10.013
75
+ 10,self_attn.o_proj,0.55364,0.01500,10.109
76
+ 10,mlp.up_proj,12.08244,0.01500,10.532
77
+ 10,mlp.gate_proj,14.55569,0.01500,10.456
78
+ 10,mlp.down_proj,2.35729,0.01500,27.986
79
+ 11,self_attn.k_proj,1.74899,0.01500,9.410
80
+ 11,self_attn.v_proj,0.86311,0.01500,9.176
81
+ 11,self_attn.q_proj,6.40680,0.01500,10.019
82
+ 11,self_attn.o_proj,0.59713,0.01500,10.131
83
+ 11,mlp.up_proj,14.62552,0.01500,10.560
84
+ 11,mlp.gate_proj,22.76668,0.01500,10.527
85
+ 11,mlp.down_proj,1.65943,0.01500,27.924
86
+ 12,self_attn.k_proj,2.17744,0.01500,9.529
87
+ 12,self_attn.v_proj,0.87480,0.01500,9.214
88
+ 12,self_attn.q_proj,8.09695,0.01500,10.039
89
+ 12,self_attn.o_proj,0.82904,0.01500,10.101
90
+ 12,mlp.up_proj,15.07694,0.01500,10.631
91
+ 12,mlp.gate_proj,16.26723,0.01500,10.431
92
+ 12,mlp.down_proj,2.22720,0.01500,27.856
93
+ 13,self_attn.k_proj,2.84977,0.01500,9.448
94
+ 13,self_attn.v_proj,1.14955,0.01500,9.250
95
+ 13,self_attn.q_proj,9.01724,0.01500,10.029
96
+ 13,self_attn.o_proj,1.14204,0.01500,10.103
97
+ 13,mlp.up_proj,18.93312,0.01500,10.562
98
+ 13,mlp.gate_proj,20.30654,0.01500,10.429
99
+ 13,mlp.down_proj,2.93490,0.01500,28.034
100
+ 14,self_attn.k_proj,3.38938,0.01500,9.517
101
+ 14,self_attn.v_proj,1.34478,0.01500,9.200
102
+ 14,self_attn.q_proj,11.40658,0.01500,10.022
103
+ 14,self_attn.o_proj,1.18970,0.01500,10.177
104
+ 14,mlp.up_proj,21.18795,0.01500,10.599
105
+ 14,mlp.gate_proj,22.57610,0.01500,10.458
106
+ 14,mlp.down_proj,3.79451,0.01500,27.976
107
+ 15,self_attn.k_proj,3.06189,0.01500,9.480
108
+ 15,self_attn.v_proj,1.77053,0.01500,9.241
109
+ 15,self_attn.q_proj,10.71523,0.01500,10.050
110
+ 15,self_attn.o_proj,1.63805,0.01500,10.206
111
+ 15,mlp.up_proj,23.73802,0.01500,10.587
112
+ 15,mlp.gate_proj,26.06912,0.01500,10.449
113
+ 15,mlp.down_proj,4.61325,0.01500,28.004
114
+ 16,self_attn.k_proj,3.56913,0.01500,9.484
115
+ 16,self_attn.v_proj,1.22035,0.01500,9.233
116
+ 16,self_attn.q_proj,10.46971,0.01500,10.016
117
+ 16,self_attn.o_proj,1.36543,0.01500,10.173
118
+ 16,mlp.up_proj,23.54587,0.01500,10.565
119
+ 16,mlp.gate_proj,23.75440,0.01500,10.453
120
+ 16,mlp.down_proj,4.76220,0.01500,27.984
121
+ 17,self_attn.k_proj,3.75751,0.01500,9.548
122
+ 17,self_attn.v_proj,1.59607,0.01500,9.272
123
+ 17,self_attn.q_proj,13.03399,0.01500,10.008
124
+ 17,self_attn.o_proj,1.73905,0.01500,10.100
125
+ 17,mlp.up_proj,23.84480,0.01500,10.608
126
+ 17,mlp.gate_proj,23.19424,0.01500,10.392
127
+ 17,mlp.down_proj,4.19675,0.01500,27.942
128
+ 18,self_attn.k_proj,3.76024,0.01500,9.455
129
+ 18,self_attn.v_proj,1.87119,0.01500,9.277
130
+ 18,self_attn.q_proj,13.60384,0.01500,10.037
131
+ 18,self_attn.o_proj,2.08934,0.01500,10.112
132
+ 18,mlp.up_proj,24.72187,0.01500,10.559
133
+ 18,mlp.gate_proj,23.33043,0.01500,10.471
134
+ 18,mlp.down_proj,4.26824,0.01500,27.897
135
+ 19,self_attn.k_proj,4.82251,0.01500,9.515
136
+ 19,self_attn.v_proj,1.91533,0.01500,9.249
137
+ 19,self_attn.q_proj,18.00388,0.01500,10.027
138
+ 19,self_attn.o_proj,1.83565,0.01500,10.153
139
+ 19,mlp.up_proj,26.69174,0.01500,10.527
140
+ 19,mlp.gate_proj,25.04108,0.01500,10.374
141
+ 19,mlp.down_proj,4.68541,0.01500,27.929
142
+ 20,self_attn.k_proj,5.64050,0.01500,9.495
143
+ 20,self_attn.v_proj,1.98617,0.01500,9.218
144
+ 20,self_attn.q_proj,17.77429,0.01500,10.072
145
+ 20,self_attn.o_proj,2.22124,0.01500,10.089
146
+ 20,mlp.up_proj,26.35580,0.01500,10.662
147
+ 20,mlp.gate_proj,23.41076,0.01500,10.369
148
+ 20,mlp.down_proj,5.01084,0.01500,28.021
149
+ 21,self_attn.k_proj,4.42920,0.01500,9.460
150
+ 21,self_attn.v_proj,1.96772,0.01500,9.309
151
+ 21,self_attn.q_proj,15.91179,0.01500,10.049
152
+ 21,self_attn.o_proj,1.92028,0.01500,10.131
153
+ 21,mlp.up_proj,27.48438,0.01500,10.561
154
+ 21,mlp.gate_proj,24.16916,0.01500,10.381
155
+ 21,mlp.down_proj,4.87991,0.01500,27.984
156
+ 22,self_attn.k_proj,5.23604,0.01500,9.490
157
+ 22,self_attn.v_proj,3.18003,0.01500,9.349
158
+ 22,self_attn.q_proj,17.82441,0.01500,10.051
159
+ 22,self_attn.o_proj,2.45201,0.01500,10.161
160
+ 22,mlp.up_proj,29.04674,0.01500,10.519
161
+ 22,mlp.gate_proj,25.53871,0.01500,10.432
162
+ 22,mlp.down_proj,10.64561,0.01500,27.918
163
+ 23,self_attn.k_proj,4.86997,0.01500,9.573
164
+ 23,self_attn.v_proj,3.13206,0.01500,9.245
165
+ 23,self_attn.q_proj,18.26726,0.01500,10.057
166
+ 23,self_attn.o_proj,3.95079,0.01500,10.151
167
+ 23,mlp.up_proj,29.33524,0.01500,10.610
168
+ 23,mlp.gate_proj,27.46653,0.01500,10.377
169
+ 23,mlp.down_proj,6.40909,0.01500,28.004
170
+ 24,self_attn.k_proj,8.47446,0.01500,9.650
171
+ 24,self_attn.v_proj,2.74370,0.01500,9.220
172
+ 24,self_attn.q_proj,23.74725,0.01500,10.062
173
+ 24,self_attn.o_proj,3.00568,0.01500,10.133
174
+ 24,mlp.up_proj,29.52289,0.01500,10.569
175
+ 24,mlp.gate_proj,27.06511,0.01500,10.455
176
+ 24,mlp.down_proj,5.78663,0.01500,28.001
177
+ 25,self_attn.k_proj,7.84489,0.01500,9.534
178
+ 25,self_attn.v_proj,4.06540,0.01500,9.433
179
+ 25,self_attn.q_proj,26.22566,0.01500,10.073
180
+ 25,self_attn.o_proj,3.40679,0.01500,10.087
181
+ 25,mlp.up_proj,29.64645,0.01500,10.596
182
+ 25,mlp.gate_proj,26.42579,0.01500,10.448
183
+ 25,mlp.down_proj,6.07409,0.01500,27.913
184
+ 26,self_attn.k_proj,6.22492,0.01500,9.468
185
+ 26,self_attn.v_proj,2.45609,0.01500,9.306
186
+ 26,self_attn.q_proj,21.51888,0.01500,10.064
187
+ 26,self_attn.o_proj,3.04100,0.01500,10.105
188
+ 26,mlp.up_proj,33.65908,0.01500,10.500
189
+ 26,mlp.gate_proj,28.68914,0.01500,10.335
190
+ 26,mlp.down_proj,7.12110,0.01500,28.002
191
+ 27,self_attn.k_proj,6.39761,0.01500,9.584
192
+ 27,self_attn.v_proj,3.20165,0.01500,9.236
193
+ 27,self_attn.q_proj,21.86742,0.01500,10.034
194
+ 27,self_attn.o_proj,4.35438,0.01500,10.147
195
+ 27,mlp.up_proj,37.42823,0.01500,10.698
196
+ 27,mlp.gate_proj,32.41779,0.01500,10.418
197
+ 27,mlp.down_proj,10.76270,0.01500,27.921
198
+ 28,self_attn.k_proj,5.69762,0.01500,9.491
199
+ 28,self_attn.v_proj,4.75157,0.01500,9.352
200
+ 28,self_attn.q_proj,22.72801,0.01500,10.006
201
+ 28,self_attn.o_proj,6.18256,0.01500,10.141
202
+ 28,mlp.up_proj,41.35907,0.01500,10.629
203
+ 28,mlp.gate_proj,36.23487,0.01500,10.401
204
+ 28,mlp.down_proj,10.81490,0.01500,27.913
205
+ 29,self_attn.k_proj,9.58126,0.01500,9.591
206
+ 29,self_attn.v_proj,5.00633,0.01500,9.274
207
+ 29,self_attn.q_proj,29.74718,0.01500,10.005
208
+ 29,self_attn.o_proj,5.97123,0.01500,10.111
209
+ 29,mlp.up_proj,43.76979,0.01500,10.600
210
+ 29,mlp.gate_proj,38.21955,0.01500,10.398
211
+ 29,mlp.down_proj,12.63174,0.01500,27.944
212
+ 30,self_attn.k_proj,7.37823,0.01500,9.519
213
+ 30,self_attn.v_proj,5.80818,0.01500,9.395
214
+ 30,self_attn.q_proj,26.16036,0.01500,10.030
215
+ 30,self_attn.o_proj,5.36946,0.01500,10.121
216
+ 30,mlp.up_proj,46.72618,0.01500,10.560
217
+ 30,mlp.gate_proj,41.69646,0.01500,10.454
218
+ 30,mlp.down_proj,12.70784,0.01500,27.953
219
+ 31,self_attn.k_proj,7.19344,0.01500,9.522
220
+ 31,self_attn.v_proj,4.95343,0.01500,9.382
221
+ 31,self_attn.q_proj,26.54338,0.01500,10.010
222
+ 31,self_attn.o_proj,4.34376,0.01500,10.124
223
+ 31,mlp.up_proj,51.25200,0.01500,10.577
224
+ 31,mlp.gate_proj,47.24580,0.01500,10.414
225
+ 31,mlp.down_proj,13.59183,0.01500,28.060
226
+ 32,self_attn.k_proj,7.64592,0.01500,9.508
227
+ 32,self_attn.v_proj,7.82178,0.01500,9.284
228
+ 32,self_attn.q_proj,27.26521,0.01500,10.057
229
+ 32,self_attn.o_proj,5.85959,0.01500,10.126
230
+ 32,mlp.up_proj,51.56344,0.01500,10.612
231
+ 32,mlp.gate_proj,48.08814,0.01500,10.431
232
+ 32,mlp.down_proj,14.84597,0.01500,27.986
233
+ 33,self_attn.k_proj,7.55068,0.01500,9.512
234
+ 33,self_attn.v_proj,6.42767,0.01500,9.312
235
+ 33,self_attn.q_proj,29.11816,0.01500,10.054
236
+ 33,self_attn.o_proj,5.41699,0.01500,10.184
237
+ 33,mlp.up_proj,60.78445,0.01500,10.578
238
+ 33,mlp.gate_proj,58.29738,0.01500,10.428
239
+ 33,mlp.down_proj,20.53683,0.01500,27.919
240
+ 34,self_attn.k_proj,7.47154,0.01500,9.419
241
+ 34,self_attn.v_proj,8.02526,0.01500,9.316
242
+ 34,self_attn.q_proj,30.63552,0.01500,10.068
243
+ 34,self_attn.o_proj,5.79722,0.01500,10.128
244
+ 34,mlp.up_proj,65.71398,0.01500,10.540
245
+ 34,mlp.gate_proj,64.65276,0.01500,10.420
246
+ 34,mlp.down_proj,21.38183,0.01500,27.912
247
+ 35,self_attn.k_proj,6.60566,0.01500,9.423
248
+ 35,self_attn.v_proj,6.35169,0.01500,9.255
249
+ 35,self_attn.q_proj,25.45971,0.01500,10.039
250
+ 35,self_attn.o_proj,6.54150,0.01500,10.122
251
+ 35,mlp.up_proj,67.49508,0.01500,10.551
252
+ 35,mlp.gate_proj,67.68895,0.01500,10.415
253
+ 35,mlp.down_proj,22.98903,0.01500,27.915
254
+ 36,self_attn.k_proj,6.94054,0.01500,9.428
255
+ 36,self_attn.v_proj,10.90587,0.01500,9.268
256
+ 36,self_attn.q_proj,29.40213,0.01500,10.054
257
+ 36,self_attn.o_proj,8.15820,0.01500,10.126
258
+ 36,mlp.up_proj,68.62708,0.01500,10.628
259
+ 36,mlp.gate_proj,67.88058,0.01500,10.426
260
+ 36,mlp.down_proj,27.17902,0.01500,28.025
261
+ 37,self_attn.k_proj,6.98475,0.01500,9.465
262
+ 37,self_attn.v_proj,9.50582,0.01500,9.246
263
+ 37,self_attn.q_proj,27.53993,0.01500,10.034
264
+ 37,self_attn.o_proj,10.20470,0.01500,10.206
265
+ 37,mlp.up_proj,70.65576,0.01500,10.583
266
+ 37,mlp.gate_proj,69.35595,0.01500,10.428
267
+ 37,mlp.down_proj,27.99811,0.01500,27.976
268
+ 38,self_attn.k_proj,6.39289,0.01500,9.463
269
+ 38,self_attn.v_proj,8.81844,0.01500,9.231
270
+ 38,self_attn.q_proj,25.09918,0.01500,10.116
271
+ 38,self_attn.o_proj,5.82686,0.01500,10.099
272
+ 38,mlp.up_proj,73.17823,0.01500,10.581
273
+ 38,mlp.gate_proj,70.98656,0.01500,10.441
274
+ 38,mlp.down_proj,26.28690,0.01500,27.941
275
+ 39,self_attn.k_proj,5.73835,0.01500,9.433
276
+ 39,self_attn.v_proj,8.94787,0.01500,9.225
277
+ 39,self_attn.q_proj,24.77987,0.01500,10.072
278
+ 39,self_attn.o_proj,6.94215,0.01500,10.199
279
+ 39,mlp.up_proj,76.39479,0.01500,10.590
280
+ 39,mlp.gate_proj,73.27057,0.01500,10.390
281
+ 39,mlp.down_proj,29.26816,0.01500,28.044
282
+ 40,self_attn.k_proj,6.27198,0.01500,9.481
283
+ 40,self_attn.v_proj,11.92071,0.01500,9.268
284
+ 40,self_attn.q_proj,26.84346,0.01500,10.032
285
+ 40,self_attn.o_proj,5.82510,0.01500,10.229
286
+ 40,mlp.up_proj,79.00155,0.01500,10.640
287
+ 40,mlp.gate_proj,75.15903,0.01500,10.437
288
+ 40,mlp.down_proj,29.47409,0.01500,27.911
289
+ 41,self_attn.k_proj,5.81315,0.01500,9.546
290
+ 41,self_attn.v_proj,11.77621,0.01500,9.205
291
+ 41,self_attn.q_proj,25.33446,0.01500,10.058
292
+ 41,self_attn.o_proj,7.60281,0.01500,10.141
293
+ 41,mlp.up_proj,82.86085,0.01500,10.593
294
+ 41,mlp.gate_proj,77.44541,0.01500,10.450
295
+ 41,mlp.down_proj,33.04249,0.01500,27.885
296
+ 42,self_attn.k_proj,5.96884,0.01500,9.406
297
+ 42,self_attn.v_proj,12.55932,0.01500,9.247
298
+ 42,self_attn.q_proj,25.97449,0.01500,10.029
299
+ 42,self_attn.o_proj,6.20780,0.01500,10.106
300
+ 42,mlp.up_proj,86.35454,0.01500,10.531
301
+ 42,mlp.gate_proj,79.40722,0.01500,10.387
302
+ 42,mlp.down_proj,43.79813,0.01500,27.904
303
+ 43,self_attn.k_proj,6.03354,0.01500,9.569
304
+ 43,self_attn.v_proj,18.59169,0.01500,9.242
305
+ 43,self_attn.q_proj,29.10031,0.01500,10.051
306
+ 43,self_attn.o_proj,15.70972,0.01500,10.124
307
+ 43,mlp.up_proj,93.61553,0.01500,10.609
308
+ 43,mlp.gate_proj,84.81170,0.01500,10.511
309
+ 43,mlp.down_proj,54.95075,0.01500,27.918
310
+ 44,self_attn.k_proj,5.72830,0.01500,9.511
311
+ 44,self_attn.v_proj,21.47696,0.01500,9.296
312
+ 44,self_attn.q_proj,28.71090,0.01500,10.138
313
+ 44,self_attn.o_proj,23.35439,0.01500,10.131
314
+ 44,mlp.up_proj,100.27838,0.01500,10.639
315
+ 44,mlp.gate_proj,89.86852,0.01500,10.388
316
+ 44,mlp.down_proj,316.48352,0.01500,27.949
317
+ 45,self_attn.k_proj,6.52884,0.01500,9.572
318
+ 45,self_attn.v_proj,29.51034,0.01500,9.257
319
+ 45,self_attn.q_proj,31.15594,0.01500,10.115
320
+ 45,self_attn.o_proj,30.49210,0.01500,10.086
321
+ 45,mlp.up_proj,115.91935,0.01500,10.601
322
+ 45,mlp.gate_proj,110.35923,0.01500,10.480
323
+ 45,mlp.down_proj,219.74860,0.01500,28.049
324
+ 46,self_attn.k_proj,6.69098,0.01500,9.467
325
+ 46,self_attn.v_proj,35.10717,0.01500,9.272
326
+ 46,self_attn.q_proj,33.14360,0.01500,10.058
327
+ 46,self_attn.o_proj,56.20092,0.01500,10.154
328
+ 46,mlp.up_proj,232.21847,0.01500,10.598
329
+ 46,mlp.gate_proj,151.91497,0.01500,10.450
330
+ 46,mlp.down_proj,339.50818,0.01500,28.008
331
+ 47,self_attn.k_proj,5.51548,0.01500,9.446
332
+ 47,self_attn.v_proj,23.11919,0.01500,9.358
333
+ 47,self_attn.q_proj,24.49030,0.01500,10.044
334
+ 47,self_attn.o_proj,31.32616,0.01500,10.153
335
+ 47,mlp.up_proj,130.35323,0.01500,10.593
336
+ 47,mlp.gate_proj,125.38177,0.01500,10.468
337
+ 47,mlp.down_proj,936.09912,0.01500,28.032
quantize_config.json ADDED
@@ -0,0 +1,21 @@
1
+ {
2
+ "bits": 4,
3
+ "group_size": 32,
4
+ "desc_act": true,
5
+ "sym": true,
6
+ "lm_head": false,
7
+ "quant_method": "gptq",
8
+ "checkpoint_format": "gptq",
9
+ "pack_dtype": "int32",
10
+ "meta": {
11
+ "quantizer": [
12
+ "gptqmodel:1.9.0"
13
+ ],
14
+ "uri": "https://github.com/modelcloud/gptqmodel",
15
+ "damp_percent": 0.015,
16
+ "damp_auto_increment": 0.0025,
17
+ "static_groups": false,
18
+ "true_sequential": true,
19
+ "mse": 0.01
20
+ }
21
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9ba22d8c644259937034835df5faed912276e25142efaeb91215dee17d22952b
3
+ size 11421995
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
199
+ "clean_up_tokenization_spaces": false,
200
+ "eos_token": "<|im_end|>",
201
+ "errors": "replace",
202
+ "extra_special_tokens": {},
203
+ "model_max_length": 131072,
204
+ "pad_token": "<|endoftext|>",
205
+ "split_special_tokens": false,
206
+ "tokenizer_class": "Qwen2Tokenizer",
207
+ "unk_token": null
208
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff