Quant-Cartel/experiment_1_8b-iMat-GGUF

InferenceIllusionist committed on May 21, 2024 (commit 5ea01a8, parent a8baef0): Create README.md

---
tags:
- GGUF
- iMat
- Llama3
- conversational
---
```
  e88 88e                               d8
 d888 888b  8888 8888  ,"Y88b 888 8e   d88
C8888 8888D 8888 8888 "8" 888 888 88b d88888
 Y888 888P  Y888 888P ,ee 888 888 888  888
  "88 88"    "88 88"  "88 888 888 888  888
      b
      8b,
  e88'Y88                  d8           888
 d888  'Y  ,"Y88b 888,8,  d88    ,e e,  888
C8888     "8" 888 888 "  d88888 d88 88b 888
 Y888  ,d ,ee 888 888     888   888   , 888
  "88,d88 "88 888 888     888    "YeeP" 888
PROUDLY PRESENTS
```
## experiment_1_8b-iMat-GGUF
Quantized from fp16.
* Weighted quantizations were created using the fp16 GGUF and [groups_merged-enhancedV2-TurboMini.txt](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-9432658) over 189 chunks with n_ctx=512.
* This method of calculating the importance matrix showed improvements in some areas for Mistral 7b and Llama3 8b models; see the linked discussion post for details.
* The enhancedV2-TurboMini file appends snippets from turboderp's calibration data to the standard groups_merged.txt file.
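
As a rough illustration of what the importance matrix contributes, the sketch below assumes a simplified model: for each weight matrix, the imatrix records the mean squared activation of every input column over the calibration text, and quantization then picks the scale that minimizes importance-weighted squared error rather than plain squared error. It is a toy 4-bit round-to-nearest grid search with hypothetical helper names (`collect_importance`, `best_scale`, and so on), not llama.cpp's actual quantization code; only the 189 chunks and n_ctx=512 come from this card.

```python
# Toy sketch of importance-matrix-weighted quantization (simplified and
# hypothetical; llama.cpp's real quantizers are considerably more elaborate).
import random

# Calibration budget stated in this card: 189 chunks of n_ctx = 512 tokens.
N_CHUNKS = 189
N_CTX = 512
CALIB_TOKENS = N_CHUNKS * N_CTX  # 96,768 tokens


def collect_importance(activations):
    """Per-input-column mean of x**2 over all calibration token rows."""
    n_cols = len(activations[0])
    sums = [0.0] * n_cols
    for row in activations:
        for j, x in enumerate(row):
            sums[j] += x * x
    return [s / len(activations) for s in sums]


def quantize(weights, scale, qmax=7):
    """Symmetric round-to-nearest onto integers in [-qmax - 1, qmax] (4-bit)."""
    return [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]


def weighted_error(weights, scale, importance, qmax=7):
    """Sum over columns of importance_j * (w_j - scale * q_j)**2."""
    q = quantize(weights, scale, qmax)
    return sum(imp * (w - scale * qi) ** 2
               for w, qi, imp in zip(weights, q, importance))


def best_scale(weights, importance, qmax=7, n_candidates=64):
    """Grid-search a scale near max|w| / qmax that minimizes weighted_error."""
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 1.0  # all-zero row: any scale reproduces it exactly
    base = amax / qmax
    candidates = [base * (0.7 + 0.6 * k / (n_candidates - 1))
                  for k in range(n_candidates)]
    return min(candidates,
               key=lambda s: weighted_error(weights, s, importance, qmax))


# Demo: input column 0 sees much larger activations than the other columns.
random.seed(0)
acts = [[random.gauss(0.0, 5.0 if j == 0 else 1.0) for j in range(8)]
        for _ in range(256)]
importance = collect_importance(acts)
w = [random.gauss(0.0, 1.0) for _ in range(8)]

s_plain = best_scale(w, [1.0] * 8)  # unweighted search, as without an imatrix
s_imat = best_scale(w, importance)  # importance-weighted search
print(weighted_error(w, s_plain, importance),
      weighted_error(w, s_imat, importance))
```

Because both searches draw from the same candidate grid, the importance-weighted scale can never score worse on the weighted error; the calibration text determines which columns that weighting favors.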

For a brief rundown of iMatrix quant performance, see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747).

<b>All quants are verified working prior to uploading to the repo, for your safety and convenience.</b>

Original model card [here](https://huggingface.co/jukofyork/Dusk-Miqu-70B/) and below:

---
# **UNTESTED, probably unfit for human consumption**
One epoch of grimulkan/LimaRP-augmented on LLaMA3-8b, trained with Unsloth on Colab using the llama-chat template. Context length is probably 16k.
```python
# Unsloth should be imported before transformers/trl so its patches apply.
# `model`, `tokenizer`, `dataset` and `max_seq_length` are assumed to be
# defined earlier in the notebook (e.g. via FastLanguageModel.from_pretrained).
from unsloth import FastLanguageModel
import torch
from transformers import TrainingArguments
from trl import SFTTrainer

# Attach LoRA adapters to every attention and MLP projection.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, # LoRA rank; any number > 0 works. Suggested: 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Any value works, but 0 is optimized
    bias = "none",    # Any value works, but "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM and fits 2x larger batch sizes
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # Unsloth also supports rank-stabilized LoRA
    loftq_config = None, # and LoftQ
)

# Supervised fine-tuning on the dataset's "text" field for one epoch.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # packing = True can make training 5x faster for short sequences
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8, # 1 x 8 = 8 sequences per optimizer step on one GPU
        warmup_steps = 5,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(), # fall back to fp16 without bf16 support
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer.train()  # run the single training epoch
```
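To make the numbers in this config concrete, here is a small sketch of what `r`, `lora_alpha`, the batch settings and the linear schedule imply. It assumes PEFT-style LoRA scaling (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when `use_rslora=True`) and a Hugging Face-style linear warmup/decay schedule; the helper names are hypothetical, and `total_steps=100` is a made-up run length because the card does not give the dataset size.

```python
# Hypothetical helpers illustrating the training config above (not Unsloth code).
import math

# Values copied from the training config.
R = 64
LORA_ALPHA = 16
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
WARMUP_STEPS = 5
BASE_LR = 2e-4


def lora_scaling(r, alpha, use_rslora=False):
    """Multiplier on the low-rank update B @ A (PEFT-style convention)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def lora_forward(W, A, B, x, r, alpha, use_rslora=False):
    """y = W x + scaling * B (A x); A is r x d_in, B is d_out x r."""
    def matvec(m, v):
        return [sum(a * b for a, b in zip(row, v)) for row in m]
    s = lora_scaling(r, alpha, use_rslora)
    return [y + s * d for y, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]


def effective_batch(per_device, grad_accum, n_devices=1):
    """Sequences contributing to each optimizer step."""
    return per_device * grad_accum * n_devices


def linear_lr(step, total_steps, warmup=WARMUP_STEPS, base_lr=BASE_LR):
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


print(lora_scaling(R, LORA_ALPHA))                    # 0.25
print(lora_scaling(R, LORA_ALPHA, use_rslora=True))   # 2.0
print(effective_batch(PER_DEVICE_BATCH, GRAD_ACCUM))  # 8
for step in (0, 5, 50, 100):  # hypothetical 100-step run
    print(step, linear_lr(step, total_steps=100))
```

With `r = 64` and `lora_alpha = 16` the adapter update is scaled by 0.25; enabling `use_rslora` would raise that to 2.0 at the same rank. Because LoRA conventionally initializes `B` to zeros, the adapted layer starts out identical to the base layer.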