Model Training Request
Could somebody give me training code, or train a Llama 3/4 model on the following dataset? Thank you in advance. The dataset is Sabresooth/Sabresooth_Train.
You can use axolotl and the following configuration on a system with 2x RTX 4090 to train this:
base_model: mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: Sabresooth/Sabresooth_Train
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00001

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD

special_tokens:
  pad_token: <|end_of_text|>
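Training is usually kicked off by pointing axolotl's CLI at this file (for example `accelerate launch -m axolotl.cli.train config.yml`). Before doing that, it can be worth sanity-checking that the dataset actually exposes the columns referenced by `field_system`, `field_instruction` and `field_output`. Here is a minimal Python sketch, assuming the dataset has a standard `train` split:

```python
# Quick sanity check (sketch): confirm the dataset exposes the "system", "input" and
# "output" columns referenced by the config above. Assumes a standard "train" split.
from datasets import load_dataset

ds = load_dataset("Sabresooth/Sabresooth_Train", split="train")
print(ds.column_names)  # expect something like ['system', 'input', 'output']
print(ds[0])            # inspect a single example before training
```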
I don't have an RTX 4090 or any GPUs - can I use something else?
Yes, sure. What GPUs do you have? 2x RTX 3090 would work with the above configuration as well. For an RTX 5090, or any other NVIDIA GPU with at least 32 GiB of GPU memory such as an A100, just delete the fsdp/fsdp_config section, since in that case only a single GPU is needed.
I have the free tier of Google Colab - 8 hours per week of TPU
I have never tried running axolotl on Google Colab using a TPU, and I don't think TPUs are supported by axolotl. Google Colab also offers GPU runtimes as far as I'm aware. You could also rent some GPUs on RunPod; they even have an axolotl template. But no worries, I can also train this for you tomorrow. I just can't keep doing this for you forever, as it is quite time and resource intensive, so at some point you will need to find a way to fine-tune on your own.
Could you try to train this for me tomorrow? Last model.
Yes, sure, no problem, I will do it. I'm currently using the GPUs to train Medra 27B, but the next checkpoint should be reached in around 5 hours, which should hopefully align with when I wake up. After that I will generate some importance matrices to empty the mradermacher matrix queue, and then I will train your model, which should take around 2 hours. So expect it early in the afternoon if everything goes according to plan (which it rarely does).
This model looks cool - how can I see how far along the training is?
It has in fact already been done for 12 hours; I just didn't have the time to upload it yet, or, to be more precise, to write the model card, which I always do before uploading. I will do so now.
@Enderchef and @Sabresooth Your model is now ready!
- SafeTensors: https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth
- Lora: https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth-Lora
I also queued it so we will soon have GGUFs of it:
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth-GGUF for quants to appear.
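If you want to try the SafeTensors release directly, here is a rough transformers sketch (assuming a GPU with enough memory for the full-precision weights; the GGUF quants will be the lighter option once they appear):

```python
# Sketch: load the released SafeTensors model and run a single chat turn.
# Generation settings and hardware assumptions are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```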
Question: how do i, on my own device, increase the context? Is it in config.json?
The model has a 128k context window, the same as every Meta-Llama-3.1-8B-Instruct based model, so there is no need to increase it. Depending on the inference engine you use, there is usually a UI option or command line argument to specify the context length, because almost nobody has enough GPU memory to run a model at the full 128k context. If you run the GGUF with llama.cpp, you can specify the context length with something like --ctx-size 17000.
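If you run the GGUF from Python instead of the llama.cpp command line, llama-cpp-python exposes the same knob. A small sketch (the quant filename below is just a placeholder for whichever file you download):

```python
# Sketch: setting the context window with llama-cpp-python instead of --ctx-size.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth.Q4_K_M.gguf",  # placeholder quant file
    n_ctx=17000,      # context length; the model supports up to 128k, memory permitting
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)
print(llm.n_ctx())    # confirm the context length actually in use
```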
Your inference software should have a max context length setting somewhere.