JustJaro committed
Commit c9a730b · verified · 1 Parent(s): 591664b

Add files using upload-large-folder tool
README.md ADDED
@@ -0,0 +1,621 @@
---
tags:
- gptq
- quantization
- 4bit
- confidentialmind
- text-generation
- apache2.0
- mistral-small-24b
---
# 🔥 Quantized Model: SmolLM-135M_gptq_g32_4bit 🔥

This is a 4-bit quantized version of the [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
It uses the open-source GPTQModel library to achieve 4-bit precision with a group size of 32, resulting in a smaller, faster model with minimal performance degradation.

Quantization was run on a single NVIDIA A100 GPU with 80 GB of VRAM.

*Note:* `batch_size` is set quite high because the model is small; you may need to lower it to fit your GPU VRAM.
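For reference, the calibration `batch_size` is not a command-line option; `quantize.py` (reproduced further below) derives it from `nsamples` and the chosen group size. A minimal sketch of that arithmetic, assuming the 512-sample, group-size-32 settings used for this repository:

```python
# How quantize.py (below) derives the calibration batch size.
# Worked numbers assume this repo's settings: nsamples=512, group_size=32.
nsamples = 512
group_size = 32

group_size_factor = int(128 / group_size)                # 4
batch_size = int((nsamples * 0.1) / group_size_factor)   # int(51.2 / 4) = 12

print(batch_size)  # 12; lower nsamples (or edit this expression) if you hit CUDA OOM
```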

## Model Details
- **Original Model:** [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M)
- **Quantized Model:** SmolLM-135M_gptq_g32_4bit (this repository)
- **Quantization Method:** GPTQ (4-bit, group size 32)
- **Quantization Library:** [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
- **Calibration Dataset:** wikitext/wikitext-2-raw-v1 (512 samples, sequence length 4096)
- **Quantized by:** [ConfidentialMind.com](https://www.confidentialmind.com)

## Usage

```python
from gptqmodel import GPTQModel
from transformers import AutoTokenizer

# Use the local directory or JustJaro/SmolLM-135M_gptq_g32_4bit after upload
quantized_model_id = "/home/jarouljanov/models/quantized/SmolLM-135M_gptq_g32_4bit"  # or "JustJaro/SmolLM-135M_gptq_g32_4bit"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_id)
model = GPTQModel.load(quantized_model_id, device="cuda:0")  # or "cpu"

input_text = "This is a test prompt"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
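As an alternative to `GPTQModel.load`, the checkpoint can likely also be loaded through the plain `transformers` API, since `config.json` carries a GPTQ `quantization_config`. This sketch is untested for this repository and assumes `optimum`, `accelerate`, and a GPTQ kernel backend (for example `gptqmodel`) are installed:

```python
# Hypothetical alternative: load the GPTQ checkpoint with plain transformers.
# Assumes optimum, accelerate and a GPTQ backend (e.g. gptqmodel) are installed; untested here.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "JustJaro/SmolLM-135M_gptq_g32_4bit"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

inputs = tokenizer("This is a test prompt", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```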

## Package Versions and Installation Instructions

See pyproject.toml for the exact uv project file, and the [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main) repository for details on how to install the package.

Use the provided pyproject.toml:

```bash
uv venv
source venv/bin/activate
uv sync
```

### Environment Variables

```bash
HF_TOKEN=<YOUR_HF_TOKEN>
TOKENIZERS_PARALLELISM="true"
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
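The quantization script below picks these variables up through `python-dotenv`, so they can also live in a local `.env` file rather than the shell environment. A minimal sketch of that pattern, mirroring the `load_dotenv(find_dotenv())` call in `quantize.py`:

```python
# Minimal sketch of how quantize.py reads the variables above via python-dotenv.
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())           # picks up a .env file if one exists
hf_token = os.getenv("HF_TOKEN")     # used for creating/updating the Hub repo

if not hf_token:
    raise SystemExit("HF_TOKEN is not set; add it to .env or export it in your shell.")
```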

## Quantization Script
Below is the exact quantize.py script used to generate this model (with the exact versions of the dependencies):

71
+ ```python
72
+ #!/usr/bin/env python3
73
+ """
74
+ This script loads a source Hugging Face model and a calibration dataset,
75
+ quantizes the model using GPTQModel (with 4-bit precision and group size 128),
76
+ saves the quantized model using the Transformers API with safetensors (safe serialization)
77
+ under ~/models/quantized/, and then creates/updates a Hugging Face repository (with the
78
+ _gptq_g128_4bit suffix) by uploading the model, tokenizer, and an auto-generated README.md.
79
+
80
+ Usage example:
81
+ python quantize.py --source-model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
82
+ --calibration-dataset wikitext/wikitext-2-raw-v1 \
83
+ --seq-len 1024 --nsamples 256 --hf-token <YOUR_HF_TOKEN>
84
+ """
85
+
86
+ import os
87
+ import shutil
88
+ import subprocess
89
+ import math
90
+ from enum import Enum
91
+ from pathlib import Path
92
+ from typing import List, Union
93
+
94
+ import torch
95
+ import typer
96
+ from datasets import load_dataset
97
+ from dotenv import load_dotenv, find_dotenv
98
+ from gptqmodel import GPTQModel, QuantizeConfig
99
+ from huggingface_hub import HfApi
100
+ from transformers import AutoTokenizer, PreTrainedTokenizerBase
101
+
102
+ load_dotenv(find_dotenv())
103
+ HF_TOKEN = os.getenv("HF_TOKEN")
104
+
105
+ app = typer.Typer()
106
+
107
+
108
+ class GroupSize(str, Enum):
109
+ accurate: int = 32
110
+ balanced: int = 64
111
+ fast: int = 128
112
+
113
+
114
+ def get_text_from_example(example: dict) -> str:
115
+ """
116
+ Returns text from a dataset example.
117
+ If the example contains a "text" field, and it is nonempty, that text is used.
118
+ Otherwise, if it has a "messages" field (a list of dicts with a "content" key),
119
+ the function returns the concatenation of all non-empty message contents.
120
+ """
121
+ if "text" in example and example["text"]:
122
+ return example["text"]
123
+ elif "messages" in example:
124
+ contents = [msg.get("content", "").strip() for msg in example["messages"]]
125
+ return " ".join([s for s in contents if s])
126
+ else:
127
+ return ""
128
+
129
+
130
+ def get_calibration_dataset(
131
+ tokenizer: PreTrainedTokenizerBase,
132
+ nsamples: int,
133
+ seqlen: int,
134
+ calibration_dataset: str
135
+ ) -> List[dict]:
136
+ """
137
+ Loads a calibration dataset from the Hugging Face Hub (or from a local file).
138
+ It accepts datasets with a single "text" field (like wikitext)
139
+ or with a "messages" field (as in the Neural Magic LLM Compression Calibration dataset).
140
+ Only examples whose extracted text length is at least 'seqlen' are kept.
141
+ Each chosen example is tokenized (with truncation up to 'seqlen') and returned as a dict.
142
+ """
143
+ ds = None
144
+ try:
145
+ # Attempt to load from HF Hub.
146
+ try:
147
+ if "/" in calibration_dataset:
148
+ parts = calibration_dataset.split("/", 1)
149
+ ds = load_dataset(parts[0], parts[1], split="train")
150
+ else:
151
+ ds = load_dataset(calibration_dataset, split="train")
152
+ except Exception as e:
153
+ print(f"Error loading dataset '{calibration_dataset}' via load_dataset: {e}")
154
+ ds = load_dataset(calibration_dataset, split="train")
155
+ print(f"Loaded calibration dataset from full remote path {calibration_dataset}.")
156
+
157
+ except Exception as e:
158
+ print(f"Error loading dataset '{calibration_dataset}' via load_dataset: {e}")
159
+ # Fallback: if the supplied calibration_dataset is a local path, try to load it as JSON-lines.
160
+ if os.path.exists(calibration_dataset):
161
+ try:
162
+ ds = load_dataset("json", data_files=calibration_dataset, split="train")
163
+ print(f"Loaded calibration dataset from local file {calibration_dataset}.")
164
+ except Exception as e2:
165
+ print(f"Error loading local json dataset from '{calibration_dataset}': {e2}")
166
+ return []
167
+ else:
168
+ return []
169
+
170
+ print(f"Dataset features: {ds.features}")
171
+
172
+ # Filter examples that have at least 80% 'seqlen' of extracted text (wikitext-2-raw-v1 dataset has short examples).
173
+ ds = ds.filter(lambda x: len(get_text_from_example(x)) <= int(seqlen * 0.8))
174
+ sample_range = min(nsamples, len(ds))
175
+ calibration_data = []
176
+ for i in range(sample_range):
177
+ example = ds[i]
178
+ text = get_text_from_example(example)
179
+ tokenized = tokenizer(text, truncation=True, max_length=seqlen, return_tensors="pt")
180
+ tokenized = {k: v.squeeze(0) for k, v in tokenized.items()}
181
+ calibration_data.append(tokenized)
182
+ return calibration_data
183
+
184
+
185
+ def calculate_perplexity_manual(model, tokenizer, dataset_name="wikitext", dataset_config="wikitext-2-raw-v1",
186
+ split="test", max_samples=100, max_length=512) -> Union[float, str]:
187
+ """
188
+ Calculate perplexity manually using a dataset.
189
+ Based on the research from GPTQModel documentation.
190
+ """
191
+ try:
192
+ # Load test dataset
193
+ if "/" in dataset_name:
194
+ dataset = load_dataset(dataset_name, split=split)
195
+ else:
196
+ dataset = load_dataset(dataset_name, dataset_config, split=split)
197
+
198
+ # Filter out empty texts
199
+ texts = [text for text in dataset["text"] if text.strip()]
200
+
201
+ # Limit samples for efficiency
202
+ texts = texts[:max_samples]
203
+
204
+ typer.echo(f"Calculating perplexity on {len(texts)} samples from {dataset_name}...")
205
+
206
+ model.model.eval()
207
+ total_loss = 0.0
208
+ total_tokens = 0
209
+
210
+ with torch.no_grad():
211
+ for i, text in enumerate(texts):
212
+ if i % 20 == 0:
213
+ typer.echo(f"Processing sample {i+1}/{len(texts)}")
214
+
215
+ # Tokenize the text
216
+ inputs = tokenizer(
217
+ text,
218
+ return_tensors="pt",
219
+ truncation=True,
220
+ max_length=max_length,
221
+ padding=False
222
+ )
223
+
224
+ input_ids = inputs.input_ids.to(model.model.device)
225
+
226
+ # Skip if too short
227
+ if input_ids.size(1) < 2:
228
+ continue
229
+
230
+ # Calculate loss
231
+ outputs = model.model(input_ids, labels=input_ids)
232
+ loss = outputs.loss
233
+
234
+ # Accumulate loss and token count
235
+ total_loss += loss.item() * input_ids.size(1)
236
+ total_tokens += input_ids.size(1)
237
+
238
+ if total_tokens == 0:
239
+ return "N/A (No valid tokens processed)"
240
+
241
+ # Calculate perplexity
242
+ avg_loss = total_loss / total_tokens
243
+ perplexity = math.exp(avg_loss)
244
+
245
+ return perplexity
246
+
247
+ except Exception as e:
248
+ typer.echo(f"Error calculating perplexity manually: {e}")
249
+ return f"N/A (Error: {str(e)})"
250
+
251
+
252
+ def calculate_perplexity_lm_eval(model, tokenizer) -> Union[float, str]:
253
+ """
254
+ Calculate perplexity using lm-eval framework if available.
255
+ Based on GPTQModel documentation research.
256
+ """
257
+ try:
258
+ from gptqmodel.utils.eval import EVAL
259
+
260
+ # Try to use GPTQModel's built-in evaluation
261
+ typer.echo("Attempting to calculate perplexity using GPTQModel.eval...")
262
+
263
+ # Create a temporary directory to save the model for evaluation
264
+ temp_model_path = "/tmp/temp_gptq_model"
265
+ os.makedirs(temp_model_path, exist_ok=True)
266
+
267
+ model.save_pretrained(temp_model_path)
268
+ tokenizer.save_pretrained(temp_model_path)
269
+
270
+ # Use GPTQModel.eval with lm-eval framework
271
+ results = GPTQModel.eval(
272
+ temp_model_path,
273
+ framework=EVAL.LM_EVAL,
274
+ tasks=["wikitext"],
275
+ output_file=None
276
+ )
277
+
278
+ # Clean up temporary directory
279
+ shutil.rmtree(temp_model_path, ignore_errors=True)
280
+
281
+ # Extract perplexity from results
282
+ if "wikitext" in results.get("results", {}):
283
+ wikitext_results = results["results"]["wikitext"]
284
+ if "perplexity" in wikitext_results:
285
+ return wikitext_results["perplexity"]
286
+
287
+ return "N/A (Perplexity not found in lm-eval results)"
288
+
289
+ except ImportError:
290
+ typer.echo("lm-eval framework not available, falling back to manual calculation")
291
+ return None
292
+ except Exception as e:
293
+ typer.echo(f"Error using lm-eval: {e}, falling back to manual calculation")
294
+ return None
295
+
296
+
297
+ def calculate_avg_ppl(model, tokenizer):
298
+ """
299
+ Computes the average perplexity using multiple methods.
300
+ First tries lm-eval framework, then falls back to manual calculation.
301
+ """
302
+ typer.echo("Starting perplexity calculation...")
303
+
304
+ # Method 1: Try lm-eval framework
305
+ ppl_result = calculate_perplexity_lm_eval(model, tokenizer)
306
+ if ppl_result is not None and not isinstance(ppl_result, str):
307
+ typer.echo(f"✓ Perplexity calculated using lm-eval: {ppl_result:.4f}")
308
+ return ppl_result
309
+
310
+ # Method 2: Manual calculation
311
+ typer.echo("Using manual perplexity calculation...")
312
+ ppl_result = calculate_perplexity_manual(model, tokenizer)
313
+
314
+ if isinstance(ppl_result, float):
315
+ typer.echo(f"✓ Perplexity calculated manually: {ppl_result:.4f}")
316
+ return ppl_result
317
+ else:
318
+ typer.echo(f"⚠ Perplexity calculation failed: {ppl_result}")
319
+ return ppl_result
320
+
321
+
322
+ def get_pinned_package_versions():
323
+ """
324
+ Retrieves pinned package versions using 'uv pip freeze'.
325
+ Returns a dictionary mapping lowercased package names to their versions.
326
+ """
327
+ try:
328
+ result = subprocess.run(["uv", "pip", "freeze"], capture_output=True, text=True, check=True)
329
+ packages_output = result.stdout.strip()
330
+ versions = {}
331
+ for line in packages_output.splitlines():
332
+ if "==" in line:
333
+ package_name, package_version = line.split("==", 1)
334
+ versions[package_name.lower()] = package_version
335
+ return versions
336
+ except subprocess.CalledProcessError as e:
337
+ typer.echo(f"Error running 'uv pip freeze': {e}", err=True)
338
+ return {}
339
+ except FileNotFoundError:
340
+ typer.echo("uv command not found. Make sure uv is installed and in your PATH.", err=True)
341
+ return {}
342
+
343
+
344
+ def self_read_script():
345
+ """
346
+ Reads the current script file content for inclusion in README.
347
+ """
348
+ try:
349
+ script_path = os.path.abspath(__file__)
350
+ with open(script_path, "r") as f:
351
+ script_content = f.read()
352
+ except Exception as e:
353
+ script_content = "Error reading script content: " + str(e)
354
+ return script_content
355
+
356
+
357
+ def get_my_user(hf_token):
358
+ """
359
+ Gets the Hugging Face username from the provided token.
360
+ """
361
+ api = HfApi(token=hf_token)
362
+ user_info = api.whoami()
363
+ try:
364
+ username = user_info.get("name") or user_info.get("username")
365
+ except Exception as e:
366
+ typer.echo(f"Error retrieving username from Hugging Face API: {e}. Using default username.")
367
+ username = api.whoami()
368
+ if not username:
369
+ typer.echo("Could not determine your Hugging Face username from the token, defaulting to hard coded username.",
370
+ err=True)
371
+ username = "JustJaro"
372
+ return username
373
+
374
+
375
+ def generate_readme(calibration_dataset, nsamples, quantized_model_dir,
376
+ quantized_model_name, script_content, seq_len, source_model, username, avg_ppl):
377
+ """
378
+ Generates a comprehensive README.md file for the quantized model.
379
+ """
380
+ # Format perplexity value for display
381
+ if isinstance(avg_ppl, float):
382
+ ppl_display = f"{avg_ppl:.4f}"
383
+ else:
384
+ ppl_display = str(avg_ppl)
385
+
386
+ readme_content = f"""---
387
+ tags:
388
+ - gptq
389
+ - quantization
390
+ - 4bit
391
+ - confidentialmind
392
+ - text-generation
393
+ - apache2.0
394
+ - mistral-small-24b
395
+ ---
396
+ # 🔥 Quantized Model: {quantized_model_name} 🔥
397
+
398
+ This is a 4-bit quantized version of [{source_model}](https://huggingface.co/{source_model}) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
399
+ It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of 128 resulting in a
400
+ smaller,
401
+ faster model with minimal performance degradation.
402
+
403
+ Ran on a single NVIDIA A100 GPU with 80GB of VRAM.
404
+
405
+ *Note* `batch_size` is set quite high as the model is small, you may need to adjust this to your GPU VRAM.
406
+
407
+ ## Model Details
408
+ - **Original Model:** [{source_model}](https://huggingface.co/{source_model})
409
+ - **Quantized Model:** {quantized_model_name} (this repository)
410
+ - **Quantization Method:** GPTQ (4-bit, group size 128)
411
+ - **Quantization Library:** [GPTQModel](https://github.com/ModelCloud/GPTQModel/tree/main)
412
+ - **Calibration Dataset:** {calibration_dataset} (using {nsamples} samples with seq len {seq_len})
413
+ - **Quantized by:** [ConfidentialMind.com](https://www.confidentialmind.com)
414
+
415
+ ## Usage
416
+
417
+ ```python
418
+ from gptqmodel import GPTQModel
419
+ from transformers import AutoTokenizer
420
+
421
+ # Use the local directory or {username}/{quantized_model_name} after upload
422
+ quantized_model_id = "{quantized_model_dir}" # or "{username}/{quantized_model_name}"
423
+ tokenizer = AutoTokenizer.from_pretrained(quantized_model_id)
424
+ model = GPTQModel.load(quantized_model_id, device="cuda:0") # or "cpu"
425
+
426
+ input_text = "This is a test prompt"
427
+ inputs = tokenizer(input_text, return_tensors="pt").to("cuda:0")
428
+ outputs = model.generate(**inputs)
429
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
430
+ ```
431
+
432
+ ## Package Versions and Installation Instructions
433
+
434
+ See pyproject.toml for the exact UV project file. See the [GPTQModel](
435
+ https://github.com/ModelCloud/GPTQModel/tree/main) repo for more details. on how to install the package.
436
+
437
+ Use the provided pyproject.toml:
438
+
439
+ ```bash
440
+ uv venv
441
+ source venv/bin/activate
442
+ uv sync
443
+ ```
444
+
445
+ ### Environment Variables
446
+
447
+ ```bash
448
+ HF_TOKEN=<YOUR_HF_TOKEN>
449
+ TOKENIZERS_PARALLELISM="true"
450
+ PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
451
+ ```
452
+
453
+ ## Quantization Script
454
+ Below is the exact quantize.py script used to generate this model (with the exact versions of the dependencies):
455
+
456
+ ```python
457
+ {script_content}
458
+ ```
459
+
460
+ ## Quantization Performance
461
+
462
+ Average perplexity (PPL) on WikiText-2 test dataset: **{ppl_display}**
463
+
464
+ *Perplexity calculated using {'GPTQModel evaluation framework' if isinstance(avg_ppl, float) else 'manual calculation method'}*
465
+
466
+ ## Disclaimer
467
+ This model is for research purposes only. It may inherit limitations and biases from the original model and the quantization process. Please use responsibly and refer to the original model card for more details.
468
+
469
+ ## Contact
470
+ For any questions or support, please visit [ConfidentialMind.com](https://www.confidentialmind.com) or contact us directly.
471
+
472
+ ## License
473
+ This model inherits the license from the original model. Please refer to the original model card for more details.
474
+ Original model card: `{source_model}`
475
+
476
+ ## Author
477
+ This model was quantized by [Jaro](https://www.linkedin.com/in/jaroai/)
478
+
479
+ ## Acknowledgements
480
+ Quantization performed using the GPTQModel pipeline.
481
+
482
+ TODO: Add `gptqmodel.utils.eval` integration and auto-generation of eval table.
483
+
484
+ ---
485
+ *Generated and quantized using GPTQModel.*
486
+ """
487
+ readme_path = os.path.join(quantized_model_dir, "README.md")
488
+ with open(readme_path, "w") as f:
489
+ f.write(readme_content)
490
+ typer.echo("README.md created with detailed information.")
491
+
492
+
493
+ @app.command()
494
+ def main(
495
+ seq_len: int = typer.Option(4096, help="Sequence length for tokenization and calibration."),
496
+ nsamples: int = typer.Option(256, help="Number of samples to use for calibration."),
497
+ source_model: str = typer.Option("arcee-ai/Virtuoso-Medium-v2",
498
+ help="Source model HF repository identifier."),
499
+ calibration_dataset: str = typer.Option("wikitext/wikitext-2-raw-v1",
500
+ help="Calibration dataset identifier (in 'dataset/config' format) or local file path."),
501
+ hf_token: str = typer.Option(HF_TOKEN,
502
+ help="Hugging Face token for creating/updating your repo."),
503
+ upload_only: bool = typer.Option(False, help="Only upload the quantized model to the Hugging Face Hub."),
504
+ group_size: GroupSize = typer.Option(GroupSize.accurate, help="Group size for quantization accurate: 32, "
505
+ "balanced: 64, fast: 128. Default: accurate."),
506
+ mse: bool = typer.Option(True, help="Use mse instead of mae for the loss function."),
507
+ ):
508
+ # Prepare destination directory and model names.
509
+ model_name = source_model.split("/")[-1]
510
+ quantized_model_name = f"{model_name}_gptq_g{int(group_size.value)}_4bit"
511
+ quantized_model_dir = os.path.expanduser(os.path.join("~/models/quantized", quantized_model_name))
512
+
513
+ if not os.path.exists(quantized_model_dir) or not upload_only:
514
+ os.makedirs(quantized_model_dir, exist_ok=True)
515
+
516
+ typer.echo("Loading tokenizer from source model...")
517
+ tokenizer_obj = AutoTokenizer.from_pretrained(source_model, use_fast=True)
518
+
519
+ typer.echo("Loading calibration dataset...")
520
+ typer.echo(f"Calibration dataset: {calibration_dataset}")
521
+ calibration_data = get_calibration_dataset(tokenizer_obj, nsamples, seq_len, calibration_dataset)
522
+ if not calibration_data:
523
+ typer.echo("Calibration dataset is empty. Aborting.", err=True)
524
+ raise typer.Exit(code=1)
525
+
526
+ if mse:
527
+ # Fits mistral-small-24b particularly well, as well as the increased damp_percent
528
+ mse = 0.01
529
+ quantize_config = QuantizeConfig(bits=4, group_size=int(group_size.value), damp_percent=0.015, mse=mse)
530
+ else:
531
+ quantize_config = QuantizeConfig(bits=4, group_size=int(group_size.value), damp_percent=0.01)
532
+
533
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
534
+ typer.echo(f"Loading model in {device} mode...")
535
+ model = GPTQModel.load(source_model, quantize_config)
536
+
537
+ typer.echo("Quantizing model...")
538
+ group_size_factor = int(128 / int(group_size.value))
539
+ model.quantize(calibration_data, auto_gc=False, batch_size=int((nsamples * 0.1) / group_size_factor))
540
+
541
+ # Retrieve Hugging Face user info for README generation.
542
+ package_versions = get_pinned_package_versions()
543
+ username = get_my_user(hf_token)
544
+ script_content = self_read_script()
545
+
546
+ typer.echo(f"Saving quantized model to {quantized_model_dir} using Transformers safe serialization...")
547
+ try:
548
+ model.save_pretrained(quantized_model_dir)
549
+ tokenizer_obj.save_pretrained(quantized_model_dir)
550
+ except Exception as ex:
551
+ typer.echo(f"Error during saving with safe_serialization: {ex}. Aborting.")
552
+ raise
553
+
554
+ typer.echo(f"Model saved to: {quantized_model_dir}")
555
+ else:
556
+ tokenizer_obj = AutoTokenizer.from_pretrained(source_model, use_fast=True)
557
+ package_versions = get_pinned_package_versions()
558
+ username = get_my_user(hf_token)
559
+ script_content = self_read_script()
560
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
561
+
562
+ # Load the quantized model for perplexity calculation
563
+ typer.echo("Loading quantized model for evaluation...")
564
+ model = GPTQModel.load(quantized_model_dir, device=device)
565
+
566
+ # Calculate perplexity with improved method
567
+ avg_ppl = calculate_avg_ppl(model, tokenizer_obj)
568
+ typer.echo(f"Final perplexity result: {avg_ppl}")
569
+
570
+ deps = Path("./pyproject.toml")
571
+ if deps.exists():
572
+ shutil.copy(deps, quantized_model_dir)
573
+
574
+ generate_readme(calibration_dataset, nsamples, quantized_model_dir,
575
+ quantized_model_name, script_content, seq_len, source_model, username, avg_ppl)
576
+
577
+ typer.echo("Uploading to Hugging Face Hub...")
578
+ GPTQModel.push_to_hub(quantized_path=quantized_model_dir, private=False, repo_id=quantized_model_name,
579
+ token=HF_TOKEN)
580
+
581
+ typer.echo(f"Model uploaded to Hugging Face repo: {quantized_model_name}")
582
+
583
+ # Run a quick inference demo
584
+ demo_input = tokenizer_obj("test is", return_tensors="pt").to(device)
585
+ generated_ids = model.generate(**demo_input)
586
+ output_text = tokenizer_obj.decode(generated_ids[0])
587
+ typer.echo(f"Inference demo output: {output_text}")
588
+ typer.echo(f"Final perplexity on test dataset: {avg_ppl}")
589
+
590
+
591
+ if __name__ == "__main__":
592
+ app()
593
+
594
+ ```

## Quantization Performance

Average perplexity (PPL) on the WikiText-2 test set: **N/A**. The evaluation failed with a kernel configuration error: `Invalid thread config: max_m_blocks = 0, thread_k = -1, thread_n = -1, num_threads = -1 for MKN = [7, 576, 576] and num_bits = 4, group_size = 32, has_act_order = 1, is_k_full = 1, max_shared_mem = 166912`

*Perplexity was attempted with the manual calculation method described in the script above.*
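If you want to reproduce a perplexity number yourself (for example on CPU, to sidestep the kernel error above), the sketch below mirrors the manual method in `quantize.py`: it accumulates token-weighted cross-entropy over WikiText-2 test samples and exponentiates the mean. Sample count, sequence length, and device here are illustrative assumptions, not the exact evaluation settings:

```python
# Minimal perplexity sketch mirroring calculate_perplexity_manual from quantize.py.
# Illustrative assumptions: 100 test samples, max length 512, CPU execution.
import math
import torch
from datasets import load_dataset
from gptqmodel import GPTQModel
from transformers import AutoTokenizer

model_id = "JustJaro/SmolLM-135M_gptq_g32_4bit"
device = "cpu"  # use "cuda:0" if the GPTQ kernels work on your GPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GPTQModel.load(model_id, device=device)

texts = [t for t in load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"] if t.strip()][:100]

total_loss, total_tokens = 0.0, 0
with torch.no_grad():
    for text in texts:
        ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).input_ids.to(device)
        if ids.size(1) < 2:
            continue
        loss = model.model(ids, labels=ids).loss  # GPTQModel exposes the underlying HF model as .model
        total_loss += loss.item() * ids.size(1)
        total_tokens += ids.size(1)

print("perplexity:", math.exp(total_loss / total_tokens))
```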

## Disclaimer
This model is for research purposes only. It may inherit limitations and biases from the original model and from the quantization process. Please use it responsibly and refer to the original model card for more details.

## Contact
For any questions or support, please visit [ConfidentialMind.com](https://www.confidentialmind.com) or contact us directly.

## License
This model inherits the license of the original model. Please refer to the original model card for more details.
Original model card: `HuggingFaceTB/SmolLM-135M`

## Author
This model was quantized by [Jaro](https://www.linkedin.com/in/jaroai/)

## Acknowledgements
Quantization performed using the GPTQModel pipeline.

TODO: Add `gptqmodel.utils.eval` integration and auto-generation of eval table.

---
*Generated and quantized using GPTQModel.*
config.json ADDED
@@ -0,0 +1,50 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 576,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 9,
  "num_hidden_layers": 30,
  "num_key_value_heads": 3,
  "pretraining_tp": 1,
  "quantization_config": {
    "bits": 4,
    "checkpoint_format": "gptq",
    "desc_act": true,
    "group_size": 32,
    "lm_head": false,
    "meta": {
      "damp_auto_increment": 0.0025,
      "damp_percent": 0.015,
      "mse": 0.01,
      "quantizer": [
        "gptqmodel:2.2.0"
      ],
      "static_groups": false,
      "true_sequential": true,
      "uri": "https://github.com/modelcloud/gptqmodel"
    },
    "pack_dtype": "int32",
    "quant_method": "gptq",
    "sym": true
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.53.0",
  "use_cache": true,
  "vocab_size": 49152
}
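The `quantization_config` block above is what signals to loaders that this checkpoint is GPTQ-packed (4-bit, group size 32, symmetric). A minimal sketch for inspecting it programmatically, assuming only the standard library and a local copy of the file:

```python
# Print the GPTQ settings stored in config.json's quantization_config block.
import json

with open("config.json") as f:
    cfg = json.load(f)

q = cfg["quantization_config"]
print(f"{q['quant_method']} {q['bits']}-bit, group size {q['group_size']}")
print("desc_act:", q["desc_act"], "| sym:", q["sym"], "| mse:", q["meta"]["mse"])
```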
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.53.0"
}
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b6bc06e57afc22123c73f3ee13ac09257d654a9c1e7cfbcaf7030dfa6252657
size 118768376
pyproject.toml ADDED
@@ -0,0 +1,13 @@
[project]
name = "gtpq"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "gptqmodel[vllm,bitblas,auto_round] @ git+https://github.com/IntelLabs/gptqmodel.git@main",
    "flashinfer-python",  # binary wheels; resolve CPU/GPU at install time
    "hatchling",
]

[build-system]
requires = ["setuptools-rust", "Cython", "hatchling"]
build-backend = "setuptools.build_meta"
quant_log.csv ADDED
@@ -0,0 +1,211 @@
1
+ layer,module,loss,samples,damp,time
2
+ 0,self_attn.k_proj,0.30323729,0.01500,1.376
3
+ 0,self_attn.v_proj,0.00081456,0.01500,1.134
4
+ 0,self_attn.q_proj,0.50177538,0.01500,1.143
5
+ 0,self_attn.o_proj,0.00410974,0.01500,1.150
6
+ 0,mlp.up_proj,0.87140083,0.01500,1.140
7
+ 0,mlp.gate_proj,0.88596356,0.01500,1.168
8
+ 0,mlp.down_proj,3.59487438,0.01500,2.880
9
+ 1,self_attn.k_proj,1.79021871,0.01500,1.075
10
+ 1,self_attn.v_proj,0.08866195,0.01500,1.081
11
+ 1,self_attn.q_proj,2.48720980,0.01500,1.146
12
+ 1,self_attn.o_proj,0.79290819,0.01500,1.133
13
+ 1,mlp.up_proj,0.70214444,0.01500,1.135
14
+ 1,mlp.gate_proj,0.79838932,0.01500,1.136
15
+ 1,mlp.down_proj,2.23402882,0.01500,2.897
16
+ 2,self_attn.k_proj,1.86278677,0.01500,1.109
17
+ 2,self_attn.v_proj,0.19091517,0.01500,1.116
18
+ 2,self_attn.q_proj,1.61090231,0.01500,1.166
19
+ 2,self_attn.o_proj,0.63032722,0.01500,1.121
20
+ 2,mlp.up_proj,0.84249359,0.01500,1.134
21
+ 2,mlp.gate_proj,0.95266348,0.01500,1.129
22
+ 2,mlp.down_proj,157.33862305,0.01500,2.853
23
+ 3,self_attn.k_proj,1.13371205,0.01500,1.070
24
+ 3,self_attn.v_proj,0.19112220,0.01500,1.130
25
+ 3,self_attn.q_proj,1.95455039,0.01500,1.110
26
+ 3,self_attn.o_proj,0.43849713,0.01500,1.116
27
+ 3,mlp.up_proj,0.91409326,0.01500,1.126
28
+ 3,mlp.gate_proj,1.00409079,0.01500,1.135
29
+ 3,mlp.down_proj,12.42405319,0.01500,2.883
30
+ 4,self_attn.k_proj,0.87567437,0.01500,1.108
31
+ 4,self_attn.v_proj,0.14826132,0.01500,1.070
32
+ 4,self_attn.q_proj,2.46379852,0.01500,1.138
33
+ 4,self_attn.o_proj,0.33942133,0.01500,1.126
34
+ 4,mlp.up_proj,1.05060482,0.01500,1.138
35
+ 4,mlp.gate_proj,1.17461491,0.01500,1.132
36
+ 4,mlp.down_proj,1.44320405,0.01500,2.879
37
+ 5,self_attn.k_proj,1.32893443,0.01500,1.075
38
+ 5,self_attn.v_proj,0.17556122,0.01500,1.062
39
+ 5,self_attn.q_proj,2.47826910,0.01500,1.112
40
+ 5,self_attn.o_proj,0.59731841,0.01500,1.113
41
+ 5,mlp.up_proj,1.13487399,0.01500,1.148
42
+ 5,mlp.gate_proj,1.18413568,0.01500,1.135
43
+ 5,mlp.down_proj,1.55033576,0.01500,2.881
44
+ 6,self_attn.k_proj,1.86866426,0.01500,1.072
45
+ 6,self_attn.v_proj,0.21436407,0.01500,1.075
46
+ 6,self_attn.q_proj,2.72684455,0.01500,1.111
47
+ 6,self_attn.o_proj,0.75209558,0.01500,1.127
48
+ 6,mlp.up_proj,1.31616020,0.01500,1.132
49
+ 6,mlp.gate_proj,1.32581091,0.01500,1.139
50
+ 6,mlp.down_proj,2.13556457,0.01500,2.880
51
+ 7,self_attn.k_proj,1.53838611,0.01500,1.087
52
+ 7,self_attn.v_proj,0.22950518,0.01500,1.071
53
+ 7,self_attn.q_proj,1.99251246,0.01500,1.109
54
+ 7,self_attn.o_proj,0.50463128,0.01500,1.118
55
+ 7,mlp.up_proj,1.58821356,0.01500,1.125
56
+ 7,mlp.gate_proj,1.58579147,0.01500,1.147
57
+ 7,mlp.down_proj,13.17560577,0.01500,2.849
58
+ 8,self_attn.k_proj,0.73372823,0.01500,1.068
59
+ 8,self_attn.v_proj,0.23304689,0.01500,1.088
60
+ 8,self_attn.q_proj,2.41669130,0.01500,1.118
61
+ 8,self_attn.o_proj,0.46610335,0.01500,1.116
62
+ 8,mlp.up_proj,1.65750051,0.01500,1.136
63
+ 8,mlp.gate_proj,1.56000757,0.01500,1.156
64
+ 8,mlp.down_proj,8.41586971,0.01500,2.874
65
+ 9,self_attn.k_proj,1.64468992,0.01500,1.094
66
+ 9,self_attn.v_proj,0.18850568,0.01500,1.094
67
+ 9,self_attn.q_proj,1.78751516,0.01500,1.112
68
+ 9,self_attn.o_proj,1.78545237,0.01500,1.117
69
+ 9,mlp.up_proj,1.77682400,0.01500,1.127
70
+ 9,mlp.gate_proj,1.57283258,0.01500,1.141
71
+ 9,mlp.down_proj,14.36819839,0.01500,2.873
72
+ 10,self_attn.k_proj,1.32247448,0.01500,1.061
73
+ 10,self_attn.v_proj,0.22332871,0.01500,1.074
74
+ 10,self_attn.q_proj,1.70967674,0.01500,1.109
75
+ 10,self_attn.o_proj,2.30593014,0.01500,1.119
76
+ 10,mlp.up_proj,2.46851778,0.01500,1.129
77
+ 10,mlp.gate_proj,2.33083105,0.01500,1.137
78
+ 10,mlp.down_proj,17.56273270,0.01500,2.857
79
+ 11,self_attn.k_proj,1.05880129,0.01500,1.068
80
+ 11,self_attn.v_proj,0.20641716,0.01500,1.074
81
+ 11,self_attn.q_proj,2.14066696,0.01500,1.111
82
+ 11,self_attn.o_proj,2.36638355,0.01500,1.138
83
+ 11,mlp.up_proj,2.62533450,0.01500,1.126
84
+ 11,mlp.gate_proj,2.36855841,0.01500,1.117
85
+ 11,mlp.down_proj,36624136.00000000,0.01500,2.888
86
+ 12,self_attn.k_proj,0.60637605,0.01500,1.081
87
+ 12,self_attn.v_proj,0.20556667,0.01500,1.108
88
+ 12,self_attn.q_proj,6.41779613,0.01500,1.111
89
+ 12,self_attn.o_proj,1.40109611,0.01500,1.119
90
+ 12,mlp.up_proj,10.91535950,0.01500,1.132
91
+ 12,mlp.gate_proj,4.99505901,0.01500,1.146
92
+ 12,mlp.down_proj,9760.42968750,0.01500,2.909
93
+ 13,self_attn.k_proj,2.24922800,0.01500,1.082
94
+ 13,self_attn.v_proj,0.27756080,0.01500,1.086
95
+ 13,self_attn.q_proj,7.30099392,0.01500,1.109
96
+ 13,self_attn.o_proj,2.96828890,0.01500,1.118
97
+ 13,mlp.up_proj,2.91019154,0.01500,1.123
98
+ 13,mlp.gate_proj,6.23261738,0.01500,1.141
99
+ 13,mlp.down_proj,6.46631765,0.01500,2.860
100
+ 14,self_attn.k_proj,0.69264430,0.01500,1.071
101
+ 14,self_attn.v_proj,0.51544046,0.01500,1.091
102
+ 14,self_attn.q_proj,3.18851161,0.01500,1.131
103
+ 14,self_attn.o_proj,1.25791931,0.01500,1.127
104
+ 14,mlp.up_proj,2.58898783,0.01500,1.112
105
+ 14,mlp.gate_proj,5.24106836,0.01500,1.133
106
+ 14,mlp.down_proj,9.01900768,0.01500,2.851
107
+ 15,self_attn.k_proj,0.67096651,0.01500,1.094
108
+ 15,self_attn.v_proj,0.36131397,0.01500,1.112
109
+ 15,self_attn.q_proj,3.81791139,0.01500,1.097
110
+ 15,self_attn.o_proj,2.25381660,0.01500,1.114
111
+ 15,mlp.up_proj,2.34244204,0.01500,1.116
112
+ 15,mlp.gate_proj,4.29145336,0.01500,1.132
113
+ 15,mlp.down_proj,7.26017952,0.01500,2.923
114
+ 16,self_attn.k_proj,0.64388847,0.01500,1.060
115
+ 16,self_attn.v_proj,0.34459960,0.01500,1.078
116
+ 16,self_attn.q_proj,4.61652613,0.01500,1.101
117
+ 16,self_attn.o_proj,1.88339543,0.01500,1.119
118
+ 16,mlp.up_proj,2.72254467,0.01500,1.121
119
+ 16,mlp.gate_proj,3.13850927,0.01500,1.119
120
+ 16,mlp.down_proj,6.84692860,0.01500,2.887
121
+ 17,self_attn.k_proj,0.55838877,0.01500,1.057
122
+ 17,self_attn.v_proj,0.51657593,0.01500,1.071
123
+ 17,self_attn.q_proj,4.33557653,0.01500,1.092
124
+ 17,self_attn.o_proj,2.56168294,0.01500,1.126
125
+ 17,mlp.up_proj,3.06461382,0.01500,1.110
126
+ 17,mlp.gate_proj,3.21341205,0.01500,1.114
127
+ 17,mlp.down_proj,7.29544258,0.01500,2.854
128
+ 18,self_attn.k_proj,0.68149984,0.01500,1.102
129
+ 18,self_attn.v_proj,1.32401478,0.01500,1.124
130
+ 18,self_attn.q_proj,7.70216036,0.01500,1.098
131
+ 18,self_attn.o_proj,2.28013134,0.01500,1.119
132
+ 18,mlp.up_proj,3.90698004,0.01500,1.116
133
+ 18,mlp.gate_proj,4.11436844,0.01500,1.152
134
+ 18,mlp.down_proj,11.87063789,0.01500,2.870
135
+ 19,self_attn.k_proj,0.65415150,0.01500,1.066
136
+ 19,self_attn.v_proj,0.53765559,0.01500,1.075
137
+ 19,self_attn.q_proj,6.70729065,0.01500,1.112
138
+ 19,self_attn.o_proj,3.07730746,0.01500,1.132
139
+ 19,mlp.up_proj,3.57759047,0.01500,1.119
140
+ 19,mlp.gate_proj,5.04411411,0.01500,1.120
141
+ 19,mlp.down_proj,13.49418736,0.01500,2.850
142
+ 20,self_attn.k_proj,0.45334548,0.01500,1.062
143
+ 20,self_attn.v_proj,1.12238181,0.01500,1.083
144
+ 20,self_attn.q_proj,5.45634222,0.01500,1.106
145
+ 20,self_attn.o_proj,2.11403751,0.01500,1.122
146
+ 20,mlp.up_proj,3.80852771,0.01500,1.123
147
+ 20,mlp.gate_proj,4.78546810,0.01500,1.129
148
+ 20,mlp.down_proj,15.71007252,0.01500,2.860
149
+ 21,self_attn.k_proj,0.67353952,0.01500,1.056
150
+ 21,self_attn.v_proj,0.74522316,0.01500,1.084
151
+ 21,self_attn.q_proj,4.39328623,0.01500,1.095
152
+ 21,self_attn.o_proj,2.93690538,0.01500,1.109
153
+ 21,mlp.up_proj,4.77232122,0.01500,1.108
154
+ 21,mlp.gate_proj,5.83200645,0.01500,1.138
155
+ 21,mlp.down_proj,19.96510315,0.01500,2.825
156
+ 22,self_attn.k_proj,0.63552809,0.01500,1.060
157
+ 22,self_attn.v_proj,1.77812839,0.01500,1.101
158
+ 22,self_attn.q_proj,6.14302778,0.01500,1.112
159
+ 22,self_attn.o_proj,3.85619378,0.01500,1.108
160
+ 22,mlp.up_proj,5.24628830,0.01500,1.106
161
+ 22,mlp.gate_proj,8.02893066,0.01500,1.106
162
+ 22,mlp.down_proj,24.18618774,0.01500,2.823
163
+ 23,self_attn.k_proj,0.82243836,0.01500,1.073
164
+ 23,self_attn.v_proj,2.90332031,0.01500,1.065
165
+ 23,self_attn.q_proj,9.50726032,0.01500,1.098
166
+ 23,self_attn.o_proj,8.13974857,0.01500,1.113
167
+ 23,mlp.up_proj,5.42801857,0.01500,1.099
168
+ 23,mlp.gate_proj,10.77536011,0.01500,1.128
169
+ 23,mlp.down_proj,29.22331047,0.01500,2.812
170
+ 24,self_attn.k_proj,0.60857439,0.01500,1.061
171
+ 24,self_attn.v_proj,1.17702281,0.01500,1.069
172
+ 24,self_attn.q_proj,6.26180887,0.01500,1.110
173
+ 24,self_attn.o_proj,4.30931950,0.01500,1.096
174
+ 24,mlp.up_proj,5.46759701,0.01500,1.095
175
+ 24,mlp.gate_proj,10.43561554,0.01500,1.102
176
+ 24,mlp.down_proj,31.56058502,0.01500,2.808
177
+ 25,self_attn.k_proj,0.78283930,0.01500,1.041
178
+ 25,self_attn.v_proj,1.24567127,0.01500,1.058
179
+ 25,self_attn.q_proj,4.97283077,0.01500,1.095
180
+ 25,self_attn.o_proj,5.16120386,0.01500,1.105
181
+ 25,mlp.up_proj,6.94152546,0.01500,1.102
182
+ 25,mlp.gate_proj,9.48820877,0.01500,1.099
183
+ 25,mlp.down_proj,37.16706085,0.01500,2.802
184
+ 26,self_attn.k_proj,0.95313138,0.01500,1.052
185
+ 26,self_attn.v_proj,1.37231970,0.01500,1.054
186
+ 26,self_attn.q_proj,6.39235163,0.01500,1.087
187
+ 26,self_attn.o_proj,5.90192604,0.01500,1.090
188
+ 26,mlp.up_proj,12.41309357,0.01500,1.094
189
+ 26,mlp.gate_proj,9.54691982,0.01500,1.093
190
+ 26,mlp.down_proj,37.73841095,0.01500,2.798
191
+ 27,self_attn.k_proj,0.49448842,0.01500,1.041
192
+ 27,self_attn.v_proj,0.81519926,0.01500,1.044
193
+ 27,self_attn.q_proj,5.96102524,0.01500,1.087
194
+ 27,self_attn.o_proj,5.79659319,0.01500,1.101
195
+ 27,mlp.up_proj,14.86005974,0.01500,1.099
196
+ 27,mlp.gate_proj,29.81634903,0.01500,1.100
197
+ 27,mlp.down_proj,702.13244629,0.01500,2.865
198
+ 28,self_attn.k_proj,0.60215271,0.01500,1.045
199
+ 28,self_attn.v_proj,1.07407022,0.01500,1.044
200
+ 28,self_attn.q_proj,7.19515991,0.01500,1.075
201
+ 28,self_attn.o_proj,6.81128788,0.01500,1.090
202
+ 28,mlp.up_proj,28.30822754,0.01500,1.100
203
+ 28,mlp.gate_proj,35.86479950,0.01500,1.100
204
+ 28,mlp.down_proj,33054578.00000000,0.01500,2.789
205
+ 29,self_attn.k_proj,0.62672096,0.01500,1.042
206
+ 29,self_attn.v_proj,0.87135106,0.01500,1.051
207
+ 29,self_attn.q_proj,12.68386078,0.01500,1.077
208
+ 29,self_attn.o_proj,41.61279678,0.01500,1.103
209
+ 29,mlp.up_proj,224.53115845,0.01500,1.092
210
+ 29,mlp.gate_proj,142.67680359,0.01500,1.107
211
+ 29,mlp.down_proj,10202.70410156,0.01500,2.805
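The log above records the per-module GPTQ reconstruction loss; a handful of `mlp.down_proj` rows (for example layers 11, 12, 28 and 29) show very large losses. A minimal sketch for pulling out the worst offenders, assuming `pandas` is available; note that the header declares six columns while each row carries five values, so only the first three fields are relied on:

```python
# List the modules with the highest GPTQ reconstruction loss in quant_log.csv.
# Assumes pandas; rows have five fields versus six header names, so trailing
# columns may parse as NaN -- only layer/module/loss are used here.
import pandas as pd

log = pd.read_csv("quant_log.csv")
worst = log.sort_values("loss", ascending=False).head(10)
print(worst[["layer", "module", "loss"]].to_string(index=False))
```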
quantize_config.json ADDED
@@ -0,0 +1,21 @@
{
  "bits": 4,
  "group_size": 32,
  "desc_act": true,
  "sym": true,
  "lm_head": false,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "pack_dtype": "int32",
  "meta": {
    "quantizer": [
      "gptqmodel:2.2.0"
    ],
    "uri": "https://github.com/modelcloud/gptqmodel",
    "damp_percent": 0.015,
    "damp_auto_increment": 0.0025,
    "static_groups": false,
    "true_sequential": true,
    "mse": 0.01
  }
}
special_tokens_map.json ADDED
@@ -0,0 +1,42 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|endoftext|>",
4
+ "<|im_start|>",
5
+ "<|im_end|>",
6
+ "<repo_name>",
7
+ "<reponame>",
8
+ "<file_sep>",
9
+ "<filename>",
10
+ "<gh_stars>",
11
+ "<issue_start>",
12
+ "<issue_comment>",
13
+ "<issue_closed>",
14
+ "<jupyter_start>",
15
+ "<jupyter_text>",
16
+ "<jupyter_code>",
17
+ "<jupyter_output>",
18
+ "<jupyter_script>",
19
+ "<empty_output>"
20
+ ],
21
+ "bos_token": {
22
+ "content": "<|endoftext|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false
27
+ },
28
+ "eos_token": {
29
+ "content": "<|endoftext|>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false
34
+ },
35
+ "unk_token": {
36
+ "content": "<|endoftext|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false
41
+ }
42
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,168 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<|im_end|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<repo_name>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<reponame>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "5": {
45
+ "content": "<file_sep>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "6": {
53
+ "content": "<filename>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "7": {
61
+ "content": "<gh_stars>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "8": {
69
+ "content": "<issue_start>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "9": {
77
+ "content": "<issue_comment>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "10": {
85
+ "content": "<issue_closed>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "11": {
93
+ "content": "<jupyter_start>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "12": {
101
+ "content": "<jupyter_text>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "13": {
109
+ "content": "<jupyter_code>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "14": {
117
+ "content": "<jupyter_output>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "15": {
125
+ "content": "<jupyter_script>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "16": {
133
+ "content": "<empty_output>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ }
140
+ },
141
+ "additional_special_tokens": [
142
+ "<|endoftext|>",
143
+ "<|im_start|>",
144
+ "<|im_end|>",
145
+ "<repo_name>",
146
+ "<reponame>",
147
+ "<file_sep>",
148
+ "<filename>",
149
+ "<gh_stars>",
150
+ "<issue_start>",
151
+ "<issue_comment>",
152
+ "<issue_closed>",
153
+ "<jupyter_start>",
154
+ "<jupyter_text>",
155
+ "<jupyter_code>",
156
+ "<jupyter_output>",
157
+ "<jupyter_script>",
158
+ "<empty_output>"
159
+ ],
160
+ "bos_token": "<|endoftext|>",
161
+ "clean_up_tokenization_spaces": false,
162
+ "eos_token": "<|endoftext|>",
163
+ "extra_special_tokens": {},
164
+ "model_max_length": 1000000000000000019884624838656,
165
+ "tokenizer_class": "GPT2Tokenizer",
166
+ "unk_token": "<|endoftext|>",
167
+ "vocab_size": 49152
168
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff