Uploaded model
Developed by: KasparZ
License: apache-2.0
Finetuned from model: unsloth/mistral-7b-bnb-4bit
max_seq_length = 4096
new_tokens = ["<|s|>", "<|e|>"]
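The base model was loaded with Unsloth at this sequence length and the two tokens above were added to the vocabulary. A minimal sketch, not the exact training script: the token-adding step below uses the generic transformers calls; the original run may have used an Unsloth helper instead, but the effect is the same.

```python
from unsloth import FastLanguageModel

max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,          # auto-detect: bf16 if supported, else fp16
    load_in_4bit = True,
)

# Register the two extra special tokens and resize the embedding matrix
# so "<|s|>" and "<|e|>" map to single token ids.
tokenizer.add_tokens(["<|s|>", "<|e|>"], special_tokens = True)
model.resize_token_embeddings(len(tokenizer))
```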
LoRA:
r = 128,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
lora_alpha = 32,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
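A sketch of the adapter setup, assuming Unsloth's standard FastLanguageModel.get_peft_model call with the hyperparameters listed above (continuing from the `model` loaded in the previous snippet):

```python
# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
```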
Training:
per_device_train_batch_size = 1,
gradient_accumulation_steps = 8,
warmup_ratio = 0.1,
num_train_epochs = 1, # run twice back-to-back on Google Colab (2 epochs total)
learning_rate = 5e-5,
embedding_learning_rate = 5e-6,
fp16 = not is_bfloat16_supported(),
bf16 = is_bfloat16_supported(),
logging_steps = 1,
optim = "adamw_8bit",
weight_decay = 0.00,
lr_scheduler_type = "cosine",
seed = 3407,
output_dir = "outputs",
report_to = "none",
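A sketch of how these arguments fit together, assuming Unsloth's UnslothTrainer / UnslothTrainingArguments (which accept embedding_learning_rate); `dataset` and the "text" column name are placeholders for the actual training data:

```python
from unsloth import UnslothTrainer, UnslothTrainingArguments, is_bfloat16_supported

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,        # placeholder: your datasets.Dataset
    dataset_text_field = "text",    # placeholder column name
    max_seq_length = max_seq_length,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 1,
        learning_rate = 5e-5,
        embedding_learning_rate = 5e-6,   # smaller LR for the embedding/lm_head
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)
trainer.train()
```

With per_device_train_batch_size = 1 and gradient_accumulation_steps = 8, the effective batch size is 8 sequences per optimizer step.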
The training dataset includes an EOS token at the end of each chunk, which may cause unexpected behaviour at inference time.
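A sketch of the assumed chunk formatting (the actual preprocessing script is not included in this card; `dataset` and the "text" column are placeholders):

```python
EOS_TOKEN = tokenizer.eos_token  # "</s>" for Mistral

def formatting_func(examples):
    # Append EOS to every text chunk so the model learns where chunks end.
    return {"text": [text + EOS_TOKEN for text in examples["text"]]}

dataset = dataset.map(formatting_func, batched = True)
```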
This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.