Safetensors
mistral
SPPO
alignment-handbook
Generated from Trainer
christopherthompson81 committed · Commit 08bb349 · verified · 1 Parent(s): dd7e5f1

Upload 12 files

README.md CHANGED
@@ -1,3 +1,61 @@
- ---
- license: apache-2.0
- ---
+ ---
+ base_model: OuteAI/Lite-Mistral-150M-v2-Instruct
+ tags:
+ - SPPO
+ - alignment-handbook
+ - generated_from_trainer
+ datasets:
+ - UCLA-AGI/data-mistral-7b-instruct-sppo-iter1
+ - christopherthompson81/sppo-synthetic-dataset-lite-mistral-150m-v2
+ model-index:
+ - name: Lite-Mistral-150M-v2-Instruct-SPPO-Iter3
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # Lite-Mistral-150M-v2-Instruct-SPPO-Iter3
+
+ This model is iteration 3 of an SPPO process applied to [OuteAI/Lite-Mistral-150M-v2-Instruct](https://huggingface.co/OuteAI/Lite-Mistral-150M-v2-Instruct) by generating synthetic datasets from the prompts of [UCLA-AGI/data-mistral-7b-instruct-sppo-iter1](https://huggingface.co/UCLA-AGI/data-mistral-7b-instruct-sppo-iter1).
+
+ One of the notable lessons I learned was that the prompts used for an SPPO process should be only slightly beyond the capabilities of the base model. If they are too difficult, none of the synthetic outputs will be meaningfully "better" for the autoranker to prefer over the others. Additionally, the autoranker only needs to be good enough to evaluate the target prompts, but its rankings should be spot-checked against those prompts to confirm that it is satisfactory.
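
As a rough illustration of that generate-and-rank step, one iteration's synthetic data can be produced along the lines below. This is a sketch only, not the released `sppo_generator` code: the sampling settings are placeholders, and `score_fn` stands in for whatever autoranker is used.

```python
# Sketch of candidate generation and ranking for one SPPO iteration.
# NOT the released sppo_generator code; sampling settings and score_fn are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Previous-iteration checkpoint (path taken from _name_or_path in config.json below);
# substitute your own checkpoint or hub repo id.
MODEL_ID = "checkpoints/Lite-Mistral-150M-v2-Instruct-SPPO-Iter2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

def generate_candidates(prompt: str, k: int = 5, max_new_tokens: int = 256) -> list[str]:
    """Sample k candidate responses for one prompt from the current iteration."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(
        inputs,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        num_return_sequences=k,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the generated continuations, dropping the prompt tokens.
    return tokenizer.batch_decode(outputs[:, inputs.shape[-1]:], skip_special_tokens=True)

def rank_candidates(prompt: str, candidates: list[str], score_fn) -> list[tuple[float, str]]:
    """Score each candidate with the autoranker and sort best-first.

    score_fn(prompt, response) -> float is a stand-in for the preference model;
    this is the piece that should be spot-checked against the target prompts.
    """
    scored = [(score_fn(prompt, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In an SPPO-style pipeline, the top- and bottom-ranked responses for each prompt are then typically used as the preferred and rejected completions for the next training run.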
+
+ ## Model description
+
+ I made this model to practice with the SPPO method. I chose this base model because it is small, and therefore fast to train and sample from, while still producing coherent output.
+
+ ## Intended uses & limitations
+
+ 150M-parameter models are generally meant for a data scientist or technology enthusiast to familiarize themselves with a topic. This model is likely best used to understand what SPPO can deliver, by contrasting its outputs with those of the base model and the prior SPPO iterations. Viable output is just a bonus.
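
A minimal side-by-side comparison might look like the sketch below. Only the base-model repo id is taken from the metadata above; the Iter3 repo id is an assumption based on this card's name.

```python
# Sketch of contrasting the base model with this SPPO iteration.
from transformers import pipeline

prompt = [{"role": "user", "content": "Explain what a tokenizer does in two sentences."}]

for repo_id in [
    "OuteAI/Lite-Mistral-150M-v2-Instruct",                             # base model
    "christopherthompson81/Lite-Mistral-150M-v2-Instruct-SPPO-Iter3",   # this model (assumed repo id)
]:
    generator = pipeline("text-generation", model=repo_id)
    out = generator(prompt, max_new_tokens=128, do_sample=False)
    print(f"=== {repo_id} ===")
    print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```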
+
+ ## Training procedure
+
+ I'm still working on putting my code together in a releasable form. The code will eventually be accessible here:
+ * [GitHub - SPPO Generator](https://github.com/christopherthompson81/sppo_generator)
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
+ - learning_rate: 5e-07
+ - train_batch_size: 1
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 2
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1.0
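
As a rough mapping, the settings above correspond to a `transformers.TrainingArguments` configuration like the following. The SPPO-specific trainer and loss are omitted, since that code is not released yet; the output directory name is just a placeholder.

```python
# Hedged mapping of the listed hyperparameters onto transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Lite-Mistral-150M-v2-Instruct-SPPO-Iter3",  # placeholder
    learning_rate=5e-07,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    seed=42,
    gradient_accumulation_steps=2,   # total_train_batch_size: 2 (1 device x 1 x 2)
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
    bf16=True,                       # config.json below reports torch_dtype bfloat16
)
# The listed Adam betas (0.9, 0.999) and epsilon 1e-08 are the AdamW defaults,
# so no optimizer arguments need to be overridden.
```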
+
+ ### Training results
+
+ Still to come; evals have not been run yet.
+
+ ### Framework versions
+
+ - Transformers 4.44.0
+ - Pytorch 2.4.0+cu121
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 116322.4441848534,
+     "train_runtime": 4176.8003,
+     "train_samples": 19766,
+     "train_samples_per_second": 4.732,
+     "train_steps_per_second": 2.366
+ }
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "checkpoints/Lite-Mistral-150M-v2-Instruct-SPPO-Iter2",
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "head_dim": 48,
+   "hidden_act": "silu",
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 2048,
+   "model_type": "mistral",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 12,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 10000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.44.0",
+   "use_cache": true,
+   "vocab_size": 32768
+ }
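
A quick sanity check on the config above: the sizes imply roughly 156M parameters, which at two bytes per bfloat16 weight lines up with the ~313 MB `model.safetensors` file below (bias-free Mistral blocks assumed).

```python
# Rough parameter count implied by config.json above.
hidden, inter, layers, vocab = 768, 3072, 12, 32768
heads, kv_heads, head_dim = 16, 8, 48

embed = vocab * hidden                        # input embeddings
lm_head = vocab * hidden                      # untied output head (tie_word_embeddings: false)
attn = (hidden * heads * head_dim             # q projection
        + 2 * hidden * kv_heads * head_dim    # k and v projections (grouped-query attention)
        + heads * head_dim * hidden)          # o projection
mlp = 3 * hidden * inter                      # gate, up, down projections
norms = 2 * hidden                            # two RMSNorms per layer
per_layer = attn + mlp + norms

total = embed + lm_head + layers * per_layer + hidden  # + final norm
print(total, total * 2)  # ~156.5M params, ~313 MB in bfloat16
```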
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.44.0"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df99b621e32c26f67a568c4f18f6baf76942e3f244adddbc86fa1baf2d958270
+ size 313050768
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "chat_template": "{% for message in messages %}{{bos_token + message['role'] + '\n' + message['content'] + eos_token + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ bos_token + 'assistant\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 2048,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
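
The `chat_template` entry above fully determines the prompt format: each message renders as `<s>{role}\n{content}</s>\n`, and the generation prompt appends `<s>assistant\n`. A minimal rendering sketch (the expected output is derived by hand from the template string; the model path is a placeholder for wherever this upload is stored):

```python
# Render a conversation with the chat_template defined in tokenizer_config.json above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Lite-Mistral-150M-v2-Instruct-SPPO-Iter3")  # placeholder path

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is SPPO?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)
# Expected, per the template string above:
# <s>system
# You are a helpful assistant.</s>
# <s>user
# What is SPPO?</s>
# <s>assistant
```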
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 116322.4441848534,
+     "train_runtime": 4176.8003,
+     "train_samples": 19766,
+     "train_samples_per_second": 4.732,
+     "train_steps_per_second": 2.366
+ }
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:555140185aeb816de2e06e385717375fdf388869d1a3a2e8678f087daaf0a1b3
+ size 6584