munish0838 committed on
Commit
a825216
1 Parent(s): 8d09cec

Create README.md

Files changed (1): README.md (+140, -0)
README.md ADDED
@@ -0,0 +1,140 @@
---
license: apache-2.0
base_model: OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1
pipeline_tag: text-generation
---

# QuantFactory/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1-GGUF
This is a quantized version of [OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1](https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1) created using llama.cpp.

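For example, a downloaded GGUF file can be loaded with the `llama-cpp-python` bindings. This is a minimal sketch, assuming a recent llama-cpp-python build that includes the built-in `llama-3` chat format; the filename is a placeholder for whichever quantization you download.

```python
# Minimal sketch using llama-cpp-python; the GGUF filename is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1.Q4_K_M.gguf",  # placeholder
    n_ctx=8192,             # the base model supports an 8192 context
    n_gpu_layers=-1,        # offload all layers to GPU if one is available
    chat_format="llama-3",  # use the Llama 3 instruct template shown below
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```
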
# Model Description
Based on Meta-Llama-3-8B-Instruct, and governed by the Meta Llama 3 license agreement:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

We have not benchmarked this model yet, so we don't know exactly how well it scores, but we think real prompts and usage are more telling anyway.

From our testing, this model:

- Refuses less
- Is less censored
- Follows requests better
- Replies in requested formats better, without adding unnecessary information

We are happy for anyone to try it out and give feedback.

Training:
- 2048 sequence length, while the base model has an 8192 sequence length. From testing, it still handles the full 8192 context just fine.
- Trained on a modified and improved version of Cognitive Computations' (Eric Hartford's) Dolphin dataset: https://huggingface.co/datasets/cognitivecomputations/dolphin
- Training took around 2 days on 2x RTX 3090 on our own machine, using 4-bit loading and QLoRA with rank 64 and alpha 128, resulting in ~2% trainable weights (see the sketch after this list).
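
To illustrate where the ~2% trainable-weights figure comes from, here is a minimal sketch of the same 4-bit loading plus rank-64 / alpha-128 LoRA setup using `transformers` and `peft` directly. The training itself was done with Axolotl (config below); the model id and the `all-linear` target modules here are assumptions.

```python
# Sketch only: 4-bit loading plus a rank-64 / alpha-128 LoRA adapter, to show
# where the ~2% trainable-weights figure comes from. The actual training used
# Axolotl (config below); the model id and target modules are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed base model id (gated repo)
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",  # assumption, mirroring lora_target_linear: true
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # reports the trainable fraction (~2% here)
```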

The goal for this model is to be less censored and great at general tasks, like the previous Dolphin-based models by Eric Hartford.
We started training this BEFORE they launched their own full-weight-trained Llama-3-8B-Dolphin-2.9 with their own curated datasets and the newer "Dolphin 2.9" dataset, but we think this model is still a unique take on Llama 3 8B Instruct and the Dolphin dataset:
https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b

The difference from their Dolphin 2.9 model is that we trained this using Meta's new Llama 3 Instruct format and not the regular ChatML format that Dolphin models are usually trained on.
This is because we think the model performs better with the format it was originally trained on.

Instruct format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
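
This is the same layout the stock Llama 3 Instruct chat template produces. A minimal sketch of building a prompt in this format with `transformers` follows; the tokenizer id is an assumption (and the Meta repo is gated).

```python
# Sketch: build a prompt in the Llama 3 Instruct format above using the
# tokenizer's bundled chat template. The tokenizer id is an assumption.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # end with the assistant header, ready to generate
)
print(prompt)  # matches the <|start_header_id|>...<|eot_id|> layout above
```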

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

Axolotl Config:
```yaml
base_model: Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

train_on_inputs: false
group_by_length: false
load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 2048
bf16: true
fp16: false
tf32: false
flash_attention: true

# Data
datasets:
  - path: flan1m-universal-uncensored-system-2048.jsonl
    type:
      system_prompt: ""
      system_format: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
      field_system: system
      field_instruction: input
      field_output: output
      format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
      no_input_format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

warmup_steps: 10
dataset_prepared_path: ./last_run_prepared

# Iterations
num_epochs: 1
saves_per_epoch: 4

# Evaluation
val_set_size: 0.01
eval_table_size:
eval_table_max_new_tokens:
eval_sample_packing: false
evals_per_epoch: 4

# LoRA
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
save_safetensors: true

# Sampling
sample_packing: true
pad_to_sequence_len: true

# Batching
gradient_accumulation_steps: 32
micro_batch_size: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true

# Optimizer
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

# Misc
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
debug:
deepspeed: zero3_bf16.json
weight_decay: 0.1
special_tokens:
  pad_token: <|end_of_text|>
```
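
For reference, a back-of-the-envelope calculation of the effective batch size implied by this config, assuming plain data parallelism across the 2x RTX 3090 mentioned above:

```python
# Back-of-the-envelope effective batch size for the config above, assuming
# plain data parallelism across the 2x RTX 3090 mentioned earlier.
micro_batch_size = 4
gradient_accumulation_steps = 32
num_gpus = 2
sequence_len = 2048

sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus
tokens_per_step = sequences_per_step * sequence_len
print(sequences_per_step)  # 256 sequences per optimizer step
print(tokens_per_step)     # 524288 tokens per step (upper bound, with sample packing)
```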