InferenceIllusionist committed · verified
Commit 5ea01a8 · 1 Parent(s): a8baef0

Create README.md
Files changed (1): README.md (+87, -0)
README.md ADDED

---
tags:
- GGUF
- iMat
- Llama3
- conversational
---

```
e88 88e d8
d888 888b 8888 8888 ,"Y88b 888 8e d88
C8888 8888D 8888 8888 "8" 888 888 88b d88888
Y888 888P Y888 888P ,ee 888 888 888 888
"88 88" "88 88" "88 888 888 888 888
b
8b,

e88'Y88 d8 888
d888 'Y ,"Y88b 888,8, d88 ,e e, 888
C8888 "8" 888 888 " d88888 d88 88b 888
Y888 ,d ,ee 888 888 888 888 , 888
"88,d88 "88 888 888 888 "YeeP" 888

PROUDLY PRESENTS
```

## experiment_1_8b-iMat-GGUF

Quantized from fp16.
* Weighted quantizations were created using the fp16 GGUF and [groups_merged-enhancedV2-TurboMini.txt](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-9432658) in 189 chunks with n_ctx=512 (a rough command sketch follows below)
* This method of calculating the importance matrix showed improvements in some areas for Mistral 7b and Llama3 8b models; see the post linked above for details
* The enhancedV2-TurboMini file appends snippets from turboderp's calibration data to the standard groups_merged.txt file

For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)

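As a rough illustration of the workflow described above, the sketch below drives llama.cpp's imatrix and quantize tools from Python. The binary names, file paths, and the example quant type are assumptions (they differ between llama.cpp builds), not the exact commands used to produce this repo.

```
# Hypothetical sketch of the imatrix workflow described above; paths, binary
# names, and the chosen quant type are assumptions, not this repo's exact commands.
import subprocess

FP16_GGUF = "experiment_1_8b-fp16.gguf"                # assumed local filename
CALIB_FILE = "groups_merged-enhancedV2-TurboMini.txt"  # calibration data from the linked post
IMATRIX_OUT = "experiment_1_8b.imatrix"

# 1) Compute the importance matrix over the calibration text at n_ctx=512.
#    (Older llama.cpp builds name this binary "imatrix" instead of "llama-imatrix";
#    per the card, this file yields 189 chunks at this context length.)
subprocess.run([
    "./llama-imatrix",
    "-m", FP16_GGUF,
    "-f", CALIB_FILE,
    "-o", IMATRIX_OUT,
    "-c", "512",
], check=True)

# 2) Quantize the fp16 GGUF using the importance matrix.
#    (IQ4_XS is only an example quant type; older builds name the binary "quantize".)
subprocess.run([
    "./llama-quantize",
    "--imatrix", IMATRIX_OUT,
    FP16_GGUF,
    "experiment_1_8b-IQ4_XS.gguf",
    "IQ4_XS",
], check=True)
```
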
<b>All quants are verified working prior to uploading to the repo, for your safety and convenience.</b>

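As a minimal usage sketch, one of these quants can be run locally with llama-cpp-python. The filename below is a placeholder (substitute whichever GGUF from this repo you download), and the settings are only examples.

```
# Hypothetical usage sketch; the quant filename and settings are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="experiment_1_8b-IQ4_XS.gguf",  # assumed local filename
    n_ctx=8192,                                # context window to allocate
    n_gpu_layers=-1,                           # offload all layers if a GPU is available
)

# Recent llama-cpp-python builds pick up the chat template from the GGUF
# metadata, so the high-level chat API can be used directly.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```
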
Original model card [here](https://huggingface.co/jukofyork/Dusk-Miqu-70B/) and below

---

# **UNTESTED, probably unfit for human consumption**

1 epoch of grimulkan/LimaRP-augmented on LLaMA3-8b via unsloth on Colab, using the llama-chat template. 16k context, probably.
```
# Imports needed by this snippet (standard unsloth notebook setup).
import torch
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# model, tokenizer, dataset and max_seq_length are defined earlier in the
# notebook; a sketch of that setup follows this block.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_steps = 5,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
```
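For context, the snippet above assumes that `model`, `tokenizer`, `dataset` and `max_seq_length` were created earlier in the notebook. Below is a hedged sketch of that setup following the usual unsloth Colab pattern; the base checkpoint name, dataset split, column names and prompt formatting are guesses rather than the author's actual code.

```
# Assumed setup preceding the training snippet above; the model name, dataset
# fields, and prompt formatting are guesses, not the author's exact code.
from unsloth import FastLanguageModel
from datasets import load_dataset

max_seq_length = 16384  # per the card's "16k context, probably"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",  # assumed base checkpoint
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

dataset = load_dataset("grimulkan/LimaRP-augmented", split = "train")  # split is an assumption

# Flatten each conversation into a single "text" field for SFTTrainer.
# The "conversations" column and ShareGPT-style role keys are assumptions
# about the dataset's schema.
role_map = {"human": "user", "gpt": "assistant", "system": "system"}

def to_text(example):
    msgs = [{"role": role_map.get(m["from"], m["from"]), "content": m["value"]}
            for m in example["conversations"]]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize = False)}

dataset = dataset.map(to_text)
```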