Model Training Request

#920
by Enderchef - opened

Could somebody give me training code, or train a Llama 3/4 model on the following dataset? Thank you in advance. The dataset is Sabresooth/Sabresooth_Train.

You can use axolotl with the following configuration on a system with 2x RTX 4090 to train this.

base_model: mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: Sabresooth/Sabresooth_Train
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00001

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: <|end_of_text|>
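
A minimal sketch of how this would typically be launched, assuming the configuration above is saved as sabresooth.yml (the filename is just a placeholder) and axolotl is installed as described in its README:

# optional: tokenize and cache the dataset ahead of training
python -m axolotl.cli.preprocess sabresooth.yml

# launch the LoRA finetune across both GPUs via accelerate
accelerate launch -m axolotl.cli.train sabresooth.yml

The trained LoRA adapter will end up in ./outputs/lora-out, as set by output_dir above.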

I don't have an RTX 4090 or other GPUs - can I use something else?

Yes, sure. What GPUs do you have? 2x RTX 3090 would work with the above configuration as well. For an RTX 5090, or any other NVIDIA GPU with at least 32 GiB of GPU memory such as an A100 or larger, just delete the fsdp/fsdp_config section, since a single such GPU is enough.
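
If you are unsure how much memory your card has, a quick check on any machine with the NVIDIA driver installed:

# prints each GPU's name and total memory, e.g. 24576 MiB for an RTX 3090
nvidia-smi --query-gpu=name,memory.total --format=csv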

I have the free tier of Google Colab - 8 hours per week of TPU

I've never tried running axolotl on Google Colab with a TPU; I don't think TPUs are supported by axolotl. There are Google Colab instances with GPUs as far as I'm aware. You could also rent some GPUs on RunPod - they even have an axolotl template. But no worries, I can also train this for you tomorrow. I can't keep doing this for you forever, though, as it is quite time- and resource-intensive, so at some point you will need to find a way to fine-tune on your own.

Could you try to train this for me tomorrow? Last model.

Yes, sure, no problem, I will do that. I'm currently using the GPUs to train Medra 27B, but the next checkpoint should be reached in around 5 hours, which should hopefully align with when I wake up. After that I will generate some importance matrices to empty the mradermacher importance matrix queue, and then I will train your model, which should take around 2 hours. So expect it early in the afternoon if everything goes according to plan (which it rarely does).

This model looks cool - how can I see how far through the training is?

It actually finished about 12 hours ago; I just haven't had the time to upload it yet, or more precisely, to write the model card, which I always do before uploading. I will do so now.

@Enderchef and @Sabresooth, your model is now ready!

I also queued it, so we will soon have GGUFs of it:
You can check the progress at http://hf.tst.eu/status.html or regularly check the model summary page at https://hf.tst.eu/model#Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth-GGUF for quants to appear.

Question: How do I, on my own device, increase the context? Is it in config.json?

The model has a 128k context, the same as every Meta-Llama-3.1-8B-Instruct-based model, so there is no need to increase it. Depending on the inference engine you use, there is usually a UI option or command-line argument to specify it, because almost nobody has enough GPU memory to run a model at the full 128k context. If you use the GGUF with llama.cpp, you can specify the context length with something like --ctx-size 17000.
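
For example, assuming you have downloaded one of the GGUF quants (the filename below is only a placeholder), you could do something like:

# load the quant with a 17k-token context window
./llama-cli -m Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth.Q4_K_M.gguf --ctx-size 17000

# or serve it over HTTP with the same context size (-c is the short form of --ctx-size)
./llama-server -m Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth.Q4_K_M.gguf -c 17000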

Your inference software should have a max context length setting somewhere.
