Built with Axolotl

Axolotl config (axolotl version 0.4.0):

base_model: mistralai/Mistral-7B-v0.1
base_model_config: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
 
load_in_8bit: true
load_in_4bit: false

bf16: true
fp16: false
tf32: false

bfloat16: true

datasets:
  - path: indiehackers/telugu_romanized_2048_mistral
    type: completion
    field: text

#dataset_prepared_path: ./dataset_tt23

hub_model_id: indiehackers/mistral-tenglish-april5_2
hf_use_auth_token: true 
val_set_size: 0.0

sequence_len: 2048
pad_to_sequence_len: true
sample_packing: true
# eval_sample_packing: false

adapter: lora
lora_r: 128
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true

wandb_project: mistral-tenglish
wandb_entity: team-nik
#wandb_log_model: end

output_dir: ./mistral-tenglish-out

# Training hyperparameters
gradient_accumulation_steps: 2
micro_batch_size: 7
warmup_steps: 50
learning_rate: 0.00002
logging_steps: 1
evals_per_epoch: 
save_strategy: steps
save_steps: 100
save_total_limit: 10
num_epochs: 1
#max_steps: 162945
# eval_table_size:
# eval_max_new_tokens: 128

train_on_inputs: false
group_by_length: false

gradient_checkpointing: true
early_stopping_patience:

lr_scheduler: linear

optimizer: adamw_bnb_8bit

weight_decay: 0.01

xformers_attention:
flash_attention: true
resume_from_checkpoint:
auto_resume_from_checkpoints: true

local_rank:

fsdp:
fsdp_config:

deepspeed:

debug:

strict: false

# load_best_model_at_end: True
max_grad_norm: 0.3
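
With this config saved to a YAML file, training under axolotl 0.4.x is typically launched with "accelerate launch -m axolotl.cli.train config.yml". For using the resulting adapter, a minimal inference sketch follows (an illustration, not an official example): it loads the base model in 8-bit to mirror load_in_8bit: true, attaches the adapter published at the hub_model_id above, and uses a placeholder prompt.

# Minimal inference sketch; assumes transformers, peft and bitsandbytes are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "indiehackers/mistral-tenglish-april5_2"  # hub_model_id from the config above

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # mirrors load_in_8bit: true
    torch_dtype=torch.bfloat16,                                 # mirrors bf16: true
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# The training data is completion-style (type: completion, field: text), so a plain text prompt applies.
prompt = "Example romanized Telugu prompt goes here"  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))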

mistral-tenglish-april5_2

This model is a LoRA fine-tune of mistralai/Mistral-7B-v0.1 on the indiehackers/telugu_romanized_2048_mistral dataset.

Model description

A LoRA adapter (r=128, alpha=256) trained on top of Mistral-7B-v0.1 for romanized Telugu (Tenglish) text, using completion-style language modeling on sequences of up to 2048 tokens.

Intended uses & limitations

More information needed

Training and evaluation data

The adapter was trained on indiehackers/telugu_romanized_2048_mistral, a completion-format dataset read from its text field. No validation split was used (val_set_size: 0.0), so no evaluation data is reported.
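
The raw data can be inspected directly with the datasets library; a quick sketch (the "train" split name is an assumption, and the repository may require authentication, per hf_use_auth_token: true in the config):

# Peek at the training data; the split name "train" is assumed.
from datasets import load_dataset

ds = load_dataset("indiehackers/telugu_romanized_2048_mistral", split="train")
print(ds[0]["text"][:200])  # completion-format records stored in a plain `text` field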

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 7
  • eval_batch_size: 7
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 56 (see the derivation after this list)
  • total_eval_batch_size: 28
  • optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 1
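
The total batch sizes reported above follow directly from the per-device settings; a quick check:

# Derivation of total_train_batch_size and total_eval_batch_size from the values above.
micro_batch_size = 7               # per-device train/eval batch size
gradient_accumulation_steps = 2
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no gradient accumulation at eval time

assert total_train_batch_size == 56
assert total_eval_batch_size == 28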

Training results

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0.dev0
  • Pytorch 2.2.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.0
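
To check that a local environment matches these versions, a small sketch:

# Compare installed package versions against the ones listed above.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.10.0",
    "transformers": "4.40.0.dev0",
    "torch": "2.2.0+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.0",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card lists {want}")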