Uploaded model

  • Developed by: KasparZ

  • License: apache-2.0

  • Finetuned from model: unsloth/mistral-7b-bnb-4bit

  • max_seq_length = 4096

  • new_tokens = ["<|s|>", "<|e|>"] (two tokens added to the base vocabulary; see the loading sketch below)
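A minimal sketch of how this setup could be reproduced with Unsloth is shown below. The base model name, `max_seq_length`, and the two new tokens come from the list above; the exact token-registration call is not given in this card, so the generic Transformers `add_special_tokens` / `resize_token_embeddings` route is used here as an assumption.

```python
from unsloth import FastLanguageModel

max_seq_length = 4096

# Load the 4-bit base model this adapter was finetuned from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,        # auto-detect: bfloat16 where supported, else float16
    load_in_4bit = True,
)

# Register the two extra tokens and grow the embedding matrix to match
# (assumption: the card only lists the tokens, not the exact call used).
tokenizer.add_special_tokens({"additional_special_tokens": ["<|s|>", "<|e|>"]})
model.resize_token_embeddings(len(tokenizer))
```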

  • LoRA (see the sketch after this list):

      • r = 128
      • target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
      • lora_alpha = 32
      • lora_dropout = 0 (any value is supported, but 0 is optimized)
      • bias = "none" (any setting is supported, but "none" is optimized)
      • use_gradient_checkpointing = "unsloth" (True or "unsloth" for very long context)
      • random_state = 3407
      • use_rslora = False (rank-stabilized LoRA is supported but not used here)
      • loftq_config = None (LoftQ is supported but not used here)
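Assembled into Unsloth's `FastLanguageModel.get_peft_model` call, the LoRA settings above would look roughly like this; this is a sketch assuming the standard Unsloth fine-tuning flow, not the author's exact script.

```python
# Attach the LoRA adapter using the hyperparameters listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0,                        # 0 hits Unsloth's optimized path
    bias = "none",                           # "none" hits Unsloth's optimized path
    use_gradient_checkpointing = "unsloth",  # offloaded checkpointing for long context
    random_state = 3407,
    use_rslora = False,                      # rank-stabilized LoRA disabled
    loftq_config = None,                     # LoftQ disabled
)
```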

  • Training (see the training sketch after this list):

      • per_device_train_batch_size = 1
      • gradient_accumulation_steps = 8 (effective batch size of 8 on a single GPU)
      • warmup_ratio = 0.1
      • num_train_epochs = 1 (run twice sequentially on Google Colab, i.e. roughly 2 epochs in total)
      • learning_rate = 5e-5
      • embedding_learning_rate = 5e-6
      • fp16 = not is_bfloat16_supported()
      • bf16 = is_bfloat16_supported()
      • logging_steps = 1
      • optim = "adamw_8bit"
      • weight_decay = 0.00
      • lr_scheduler_type = "cosine"
      • seed = 3407
      • output_dir = "outputs"
      • report_to = "none"

  • Dataset: an EOS token is appended to the end of each chunk, which might result in strange generation behaviour.
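The presence of `embedding_learning_rate` suggests Unsloth's `UnslothTrainer` / `UnslothTrainingArguments` (built on TRL's `SFTTrainer`), so a training sketch under that assumption might look like the following. `raw_dataset` and its `"text"` column are hypothetical stand-ins for the unnamed training data; the argument values are copied from the list above, and the trainer keywords follow the Unsloth notebook style (older TRL versions).

```python
from unsloth import UnslothTrainer, UnslothTrainingArguments, is_bfloat16_supported

# Hypothetical dataset prep: the card notes that every chunk ends with an EOS token.
EOS_TOKEN = tokenizer.eos_token
dataset = raw_dataset.map(lambda ex: {"text": ex["text"] + EOS_TOKEN})

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",           # assumption: plain-text column
    max_seq_length = max_seq_length,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,   # effective batch size 8 on one GPU
        warmup_ratio = 0.1,
        num_train_epochs = 1,              # the card notes this was run twice on Colab
        learning_rate = 5e-5,
        embedding_learning_rate = 5e-6,    # smaller learning rate for embedding updates
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)
trainer.train()
```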

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.
