---
license: other
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: NousResearch/Meta-Llama-3-8B
model-index:
- name: llama3-conciser
  results: []
pipeline_tag: text2text-generation
datasets:
- chrislee973/llama3-conciser-dataset
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`

```yaml
###
# Model Configuration: LLaMA-3 8B
###

# Copied from most recent modal llm-finetuning repo
base_model: NousResearch/Meta-Llama-3-8B
sequence_len: 4096

# base model weight quantization
load_in_8bit: true

# attention implementation
flash_attention: true

# finetuned adapter config
adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head
# for details, see https://github.com/huggingface/peft/issues/334#issuecomment-1561727994

###
# Dataset Configuration: sqlqa
###

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: conciser_dataset_50.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
      # This gets mapped to instruction, input, output axolotl tags.
      field_instruction: instruction
      field_input: text
      field_output: cleaned_text
      # Format is used by axolotl to generate the prompt.
      format: |-
        [INST] {instruction}
        {input}
        [/INST]

# dataset formatting config
tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - " [/INST]"
  - "[RES]"
  - " [/RES]"
special_tokens:
  pad_token: <|end_of_text|>

val_set_size: 0.05

###
# Training Configuration
###

# random seed for better reproducibility
seed: 117

# optimizer config
optimizer: adamw_bnb_8bit
# optimizer: adamw_torch
learning_rate: 0.0001
lr_scheduler: cosine
num_epochs: 4
micro_batch_size: 2
gradient_accumulation_steps: 1
warmup_steps: 10

# axolotl saving config
dataset_prepared_path: last_run_prepared
output_dir: ./lora-out

# logging and eval config
logging_steps: 1
eval_steps: 0.05

# training performance optimization config
bf16: auto
tf32: false
gradient_checkpointing: true

###
# Miscellaneous Configuration
###

# when true, prevents over-writing the config from the CLI
strict: false

# "Don't mess with this, it's here for accelerate and torchrun" -- axolotl docs
local_rank:

# wandb logging config
wandb_project: llama3-conciser
wandb_name: llama3-4epochs-2batchsize-pushtohub

hub_model_id: chrislee973/llama3-conciser
```

</details>

# llama3-conciser

This model is a fine-tuned version of [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) on my [conciser dataset](https://huggingface.co/datasets/chrislee973/llama3-conciser-dataset).

## Uses

### Text revision task

Given a paragraph of transcript text as input, the model lightly touches up the sentences and phrases, improving the flow and readability of the text while preserving the speaker's original intent.

For example, given the following input text:

```
I think I sort of deep down believed in what we were doing, and I did some analysis. I was like, okay, well, what would I go do if I wasn't doing this? It's like, well, I really like building things, and I like helping people communicate, and I like understanding what's going on with people and the dynamics between people. So I think if I sold this company, I'd just go build another company like this. And I kind of like the one I have.
```

the revised output text is:

```
I believed deep down in what we were doing. I did some analysis. What would I go do if I wasn’t doing this? I really like building things, helping people communicate, understanding what’s going on with people and the dynamics between them. If I sold this company, I’d just go build another one like this. I kind of like the one I have.
```

The model still has some rough edges because the dataset is so tiny (just 50 examples). I hope to smooth out these imperfections and close the quality gap by adding many more examples to the dataset.

## Usage

TODO: add sample inference code. In the meantime, see the hedged sketch at the bottom of this card.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 117
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 4
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8738        | 0.0833 | 1    | 0.7897          |
| 1.2209        | 0.25   | 3    | 0.7878          |
| 0.8204        | 0.5    | 6    | 0.6336          |
| 0.6652        | 0.75   | 9    | 0.5303          |
| 0.4086        | 1.0    | 12   | 0.4836          |
| 0.3365        | 1.25   | 15   | 0.4733          |
| 0.3445        | 1.5    | 18   | 0.5132          |
| 0.3641        | 1.75   | 21   | 0.5146          |
| 0.1941        | 2.0    | 24   | 0.4939          |
| 0.1814        | 2.25   | 27   | 0.4863          |
| 0.1342        | 2.5    | 30   | 0.4969          |
| 0.1978        | 2.75   | 33   | 0.5141          |
| 0.1589        | 3.0    | 36   | 0.5222          |
| 0.1184        | 3.25   | 39   | 0.5258          |
| 0.1513        | 3.5    | 42   | 0.5182          |
| 0.1172        | 3.75   | 45   | 0.5155          |
| 0.0607        | 4.0    | 48   | 0.5174          |

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
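
## Sample inference (sketch)

Until the Usage section above is filled in, here is a minimal, untested sketch of how inference could look. It assumes the adapter repo also hosts the resized tokenizer (axolotl saves it alongside the LoRA weights when new tokens plus `embed_tokens`/`lm_head` are added), and the `instruction` string below is a hypothetical placeholder, not the exact instruction used in the training data.

```python
# Minimal inference sketch (untested). Loads the base model plus this LoRA adapter
# and builds a prompt in the [INST] ... [/INST] format from the axolotl config above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "NousResearch/Meta-Llama-3-8B"
ADAPTER = "chrislee973/llama3-conciser"

# Assumption: the resized tokenizer (with the added [INST]/[RES] control tokens)
# lives in the adapter repo, so load it from there rather than from the base model.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# The embedding matrix was extended for the new control tokens during training,
# so resize it to match the tokenizer before loading the adapter weights.
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

instruction = "Lightly edit this transcript excerpt for flow and readability."  # hypothetical placeholder
passage = "I think I sort of deep down believed in what we were doing, ..."

# Mirrors the `format` template in the axolotl config.
prompt = f"[INST] {instruction}\n{passage}\n[/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# The completion may be wrapped in [RES] ... [/RES] control tokens; strip them if present.
completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(completion.replace("[RES]", "").replace("[/RES]", "").strip())
```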