[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

<details><summary>See axolotl config</summary>

axolotl version: `0.10.0.dev0`

```yaml
base_model: Dans-DiscountModels/Mistral-Nemo-Base-2407-DanChat
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code:

# wandb configuration
wandb_project: 12b-mn-dans-personality-engine
wandb_watch:

wandb_run_id: V1.3.0-1-4 # V{Version}-{Run Number}-{Attempt Number}
wandb_log_model:

# push checkpoints to hub
hub_model_id: Dans-DiscountModels/12b-mn-dans-personality-engine-v1.3.0-TestArticle-1
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true

# where to save the finished model to
output_dir: ./12b-mn-dans-personality-engine-v1.3.0

# dataset settings (local or huggingface repo)
datasets:
  - path: Dans-DiscountModels/pretokenization-test-5
    ds_type: parquet
    type:

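# throughput/memory optimizations: Liger's fused Triton kernels plus Cut Cross-Entropy,
# which avoids materializing the full logit matrix for the loss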
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true
cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

adapter:
lora_model_dir:

dataset_prepared_path: ./12b-mn-dans-personality-engine-data
val_set_size: 0.003

sequence_len: 32768

sample_packing: true
eval_sample_packing: true

pad_to_sequence_len: true

gradient_checkpointing: true

gradient_accumulation_steps: 2
micro_batch_size: 2

num_epochs: 2

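# AdEMAMix mixes a fast EMA (beta1) and a slow EMA (beta3) of the gradient;
# alpha weights the slow term's contribution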
optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"

lr_scheduler: rex
learning_rate: 0.00001
cosine_min_lr_ratio:

weight_decay:

max_grad_norm: 0.001

train_on_inputs: false
group_by_length: false

bf16: true
fp16: false
tf32: false

early_stopping_patience:

resume_from_checkpoint:
auto_resume_from_checkpoints: true

local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1

evals_per_epoch: 24
eval_table_size:
eval_max_new_tokens:

saves_per_epoch: 2
save_total_limit: 1

debug: false

deepspeed: deepspeed_configs/zero3_bf16.json

fsdp:
fsdp_config:

special_tokens:
```

</details><br>

# 12b-mn-dans-personality-engine-v1.3.0-TestArticle-1

This model is a fine-tuned version of [Dans-DiscountModels/Mistral-Nemo-Base-2407-DanChat](https://huggingface.co/Dans-DiscountModels/Mistral-Nemo-Base-2407-DanChat) on the [Dans-DiscountModels/pretokenization-test-5](https://huggingface.co/datasets/Dans-DiscountModels/pretokenization-test-5) dataset. It achieves the following results on the evaluation set:

- Loss: 1.4392

## Model description

More information needed

## Intended uses & limitations

More information needed
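Pending official usage guidance, here is a minimal loading sketch with `transformers`. The prompt and sampling settings are illustrative assumptions, not an endorsed recipe:

```python
# Minimal loading sketch, not official usage guidance.
# Assumes transformers >= 4.51 plus accelerate, and a CUDA device with
# enough memory for a 12B model in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dans-DiscountModels/12b-mn-dans-personality-engine-v1.3.0-TestArticle-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are stored in BF16
    device_map="auto",
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```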

## Training and evaluation data

More information needed

## Training procedure
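The run was driven by the axolotl config shown at the top of this card. As a rough reproduction sketch (not the authors' exact launch command; it assumes axolotl 0.10.0.dev0 with accelerate and deepspeed installed, and the hypothetical filename `dpe-v1.3.0.yml` for the config):

```python
# Hypothetical launch sketch for the YAML config above.
import subprocess

subprocess.run(
    [
        "accelerate", "launch",
        "-m", "axolotl.cli.train",  # axolotl's training entry point
        "dpe-v1.3.0.yml",           # assumed filename for the config above
        # multi-GPU sharding comes from the deepspeed zero3_bf16 config
        # referenced inside the YAML itself
    ],
    check=True,
)
```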

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: ademamix_8bit with args beta1=0.9, beta2=0.999, beta3=0.999, alpha=5
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 321
- num_epochs: 2.0
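The derived values above follow arithmetically from the config; a quick cross-check, with the total step count read off the results table below:

```python
# Cross-check of the derived hyperparameters reported above.
micro_batch_size = 2  # micro_batch_size in the config
grad_accum = 2        # gradient_accumulation_steps
num_devices = 8

total_train_batch_size = micro_batch_size * grad_accum * num_devices
assert total_train_batch_size == 32

# evaluation does not accumulate gradients
total_eval_batch_size = micro_batch_size * num_devices
assert total_eval_batch_size == 16

# warmup_ratio 0.1 over 3216 total optimizer steps (final step in the
# results table) yields the 321 warmup steps listed above
assert int(0.1 * 3216) == 321

# evals_per_epoch 24 over 1608 steps per epoch gives an eval every
# 67 steps, matching the step spacing in the results table
assert (3216 // 2) // 24 == 67
```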

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.8086        | 0.0006 | 1    | 1.7459          |
| 1.593         | 0.0417 | 67   | 1.5911          |
| 1.5578        | 0.0833 | 134  | 1.5565          |
| 1.5782        | 0.1250 | 201  | 1.5436          |
| 1.5702        | 0.1666 | 268  | 1.5377          |
| 1.5926        | 0.2083 | 335  | 1.5328          |
| 1.6364        | 0.2499 | 402  | 1.5291          |
| 1.5082        | 0.2916 | 469  | 1.5234          |
| 1.6002        | 0.3332 | 536  | 1.5197          |
| 1.5252        | 0.3749 | 603  | 1.5162          |
| 1.5915        | 0.4165 | 670  | 1.5121          |
| 1.5108        | 0.4582 | 737  | 1.5103          |
| 1.5663        | 0.4998 | 804  | 1.5063          |
| 1.5085        | 0.5415 | 871  | 1.5037          |
| 1.4273        | 0.5832 | 938  | 1.5024          |
| 1.5528        | 0.6248 | 1005 | 1.4994          |
| 1.6072        | 0.6665 | 1072 | 1.4975          |
| 1.6074        | 0.7081 | 1139 | 1.4920          |
| 1.5495        | 0.7498 | 1206 | 1.4904          |
| 1.6117        | 0.7914 | 1273 | 1.4883          |
| 1.4621        | 0.8331 | 1340 | 1.4850          |
| 1.6381        | 0.8747 | 1407 | 1.4838          |
| 1.4221        | 0.9164 | 1474 | 1.4813          |
| 1.5812        | 0.9580 | 1541 | 1.4789          |
| 1.4581        | 0.9997 | 1608 | 1.4750          |
| 1.4608        | 1.0417 | 1675 | 1.4800          |
| 1.5261        | 1.0833 | 1742 | 1.4798          |
| 1.3856        | 1.1250 | 1809 | 1.4796          |
| 1.4469        | 1.1666 | 1876 | 1.4766          |
| 1.4783        | 1.2083 | 1943 | 1.4741          |
| 1.5025        | 1.2499 | 2010 | 1.4733          |
| 1.4531        | 1.2916 | 2077 | 1.4726          |
| 1.4719        | 1.3332 | 2144 | 1.4712          |
| 1.4123        | 1.3749 | 2211 | 1.4700          |
| 1.4653        | 1.4165 | 2278 | 1.4673          |
| 1.4571        | 1.4582 | 2345 | 1.4660          |
| 1.4261        | 1.4998 | 2412 | 1.4660          |
| 1.3212        | 1.5415 | 2479 | 1.4620          |
| 1.3828        | 1.5832 | 2546 | 1.4617          |
| 1.3617        | 1.6248 | 2613 | 1.4597          |
| 1.4364        | 1.6665 | 2680 | 1.4567          |
| 1.4686        | 1.7081 | 2747 | 1.4549          |
| 1.3317        | 1.7498 | 2814 | 1.4530          |
| 1.3749        | 1.7914 | 2881 | 1.4506          |
| 1.4116        | 1.8331 | 2948 | 1.4468          |
| 1.3988        | 1.8747 | 3015 | 1.4456          |
| 1.2534        | 1.9164 | 3082 | 1.4448          |
| 1.3564        | 1.9580 | 3149 | 1.4412          |
| 1.3668        | 1.9997 | 3216 | 1.4392          |

### Framework versions

- Transformers 4.51.3
- Pytorch 2.4.1+cu121
- Datasets 3.5.1
- Tokenizers 0.21.1