---
license: apache-2.0
---

Catastrophic forgetting test results:

Initial evaluation loss on a 1k subset of the HuggingFaceTB/cosmopedia-100k dataset was 1.102. 100 steps of LISA training reduced this to 1.049.

Comparison to control: cosmo-1b started out with a loss of 1.003 on (a different subset of) the same dataset, increasing to 1.024 after 100 steps.
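
A minimal sketch of how an evaluation loss like this could be reproduced with `transformers` and `datasets`. This is illustrative, not the exact harness used above: the specific 1k subset is an assumption, the dataset's `text` column is assumed, and the result is an unweighted per-example mean rather than a token-weighted loss.

```
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/cosmo-1b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Take the first 1k rows; the subset actually used above is not specified.
subset = load_dataset("HuggingFaceTB/cosmopedia-100k", split="train").select(range(1000))

total, n = 0.0, 0
with torch.no_grad():
    for row in subset:
        enc = tokenizer(row["text"], return_tensors="pt",
                        truncation=True, max_length=2048).to(model.device)
        # Causal LM loss with the inputs as labels (shift handled internally).
        total += model(**enc, labels=enc["input_ids"]).loss.item()
        n += 1

print(f"mean eval loss over {n} samples: {total / n:.3f}")
```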

Axolotl config:

```
base_model: HuggingFaceTB/cosmo-1b
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Vezora/Tested-22k-Python-Alpaca
    type: alpaca
dataset_prepared_path: prepared-qlora
val_set_size: 0.05
output_dir: ./lisa-out

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

lisa_n_layers: 4
lisa_step_interval: 10
lisa_layers_attribute: model.layers

wandb_project: cosmo-python-lisa
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0005

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
```
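
A config like this is launched the usual Axolotl way, e.g. `accelerate launch -m axolotl.cli.train lisa.yml` (the filename is illustrative).

The three `lisa_*` keys enable LISA (Layerwise Importance Sampled AdamW) in place of a LoRA adapter: every `lisa_step_interval` optimizer steps, the decoder layers found at `lisa_layers_attribute` are frozen and `lisa_n_layers` of them are randomly re-enabled, so only a small slice of the network is trained at any moment. A minimal PyTorch sketch of that selection logic follows; it illustrates the idea from the LISA paper and is not Axolotl's implementation (function names here are hypothetical, and details such as keeping embeddings and the LM head trainable are omitted).

```
import random

def resolve_layers(model, layers_attribute: str = "model.layers"):
    """Follow the dotted attribute path to the list of decoder layers."""
    obj = model
    for attr in layers_attribute.split("."):
        obj = getattr(obj, attr)
    return obj

def switch_active_layers(model, n_layers: int = 4,
                         layers_attribute: str = "model.layers"):
    """Freeze all decoder layers, then unfreeze a random subset of n_layers."""
    layers = resolve_layers(model, layers_attribute)
    for layer in layers:
        for p in layer.parameters():
            p.requires_grad = False
    for idx in random.sample(range(len(layers)), n_layers):
        for p in layers[idx].parameters():
            p.requires_grad = True

# In the training loop, the active subset is resampled on a fixed cadence:
# if step % 10 == 0:          # lisa_step_interval
#     switch_active_layers(model)
```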