llama-3.1-fineweb-edu-1m-Inst
Uses the default tokenizer.
After each training phase, embed_tokens and lm_head are merged back into the base weights.
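A minimal sketch of what this per-phase merge step could look like with the peft library; the base checkpoint, adapter path, and output path are illustrative assumptions, not the actual script.

```python
# Minimal sketch (not the original script) of folding a phase's LoRA adapter
# back into the base weights so embed_tokens / lm_head carry the update into
# the next phase. All paths below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B"            # assumption: base checkpoint
ADAPTER = "outputs/pre1/adapter"            # assumption: adapter from phase pre1
OUT = "models/llama31-base-pre1-merged"     # assumption: merged output path

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)

# merge_and_unload folds the LoRA deltas into plain weights and keeps the
# trained copies of any modules_to_save (e.g. embed_tokens, lm_head).
merged = model.merge_and_unload()
merged.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```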
Training phases
Hyperparameters per phase (a minimal training-config sketch follows this list):
- pre1
  hyperparameter test: rank=16, lr=1e-5, weight_decay=0.01, dropout=0.1
  dataset: richard-park/llama-recipe-pre1-fineweb-edu-1m-split-text
  Llama-3.1 base model, llama recipe 1 training
- pre2
  hyperparameter changes: lr 1e-5 -> 2e-6, epochs 3 -> 4, LoRA dropout 0.1 -> 0.3
  dataset: richard-park/llama-recipe-pre2-fineweb-edu-1m-split-text
  Llama-3.1 base model, llama recipe 2 training
  TensorBoard: nohup tensorboard --logdir=sapie-fineweb-edu/outputs/pre2/llama31-base-1m-fineweb-edu_20250128-01/runs --host 0.0.0.0 --port=5406 > tensorboard.log 2>&1 & disown
- pre3
  hyperparameters: rank=16/32, lr=1e-5, weight_decay=0.1, dropout=0.3
  dataset: richard-park/sapie-dataset-pre3-1m-gt50-le256-split
  Llama-3.1 base model, llama recipe 3 training
  TensorBoard: nohup tensorboard --logdir=sapie-fineweb-edu/outputs/pre3/llama31-base-1m-aihub-trans_20250130-01/runs --host 0.0.0.0 --port=5406 > tensorboard.log 2>&1 & disown
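A minimal sketch of how these per-phase hyperparameters could map onto a peft + transformers training setup. Only the rank, dropout, learning rate, weight decay, and epoch values come from the notes above; the target modules, LoRA alpha, and batch size are assumptions, not the original recipe.

```python
# Minimal sketch, assuming a PEFT + Trainer setup; values taken from the
# phase notes above are marked in comments, everything else is illustrative.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)

lora_cfg = LoraConfig(
    r=16,                       # pre1/pre2: rank=16 (pre3 also tried 32)
    lora_alpha=32,              # assumption: not stated in the notes
    lora_dropout=0.3,           # pre2/pre3 value (pre1 used 0.1)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumption
    modules_to_save=["embed_tokens", "lm_head"],  # trained and merged after each phase
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

args = TrainingArguments(
    output_dir="outputs/pre3",
    learning_rate=1e-5,         # pre3 (pre2 used 2e-6)
    weight_decay=0.1,           # pre3 (pre1 used 0.01)
    num_train_epochs=4,         # pre2 onward (pre1 used 3)
    per_device_train_batch_size=4,   # assumption
    bf16=True,
    logging_dir="outputs/pre3/runs",  # matches the TensorBoard --logdir pattern
    report_to=["tensorboard"],
)
```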
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the DARE TIES merge method, with ../models/Llama-3.1-8B-Instruct as the base.
Models Merged
The following models were included in the merge:
- ../models/llama31-base-pre3-finweb-edu-1m
Configuration
The following YAML configuration was used to produce this model:
models:
  - model: ../models/Llama-3.1-8B-Instruct
    parameters:
      density: [0.6, 0.8, 1]   # lower -> middle -> upper layers; preserve the upper layers most
      weight:
        - filter: mlp
          value: 0.8           # larger weight on MLP layers (bigger contribution to the output)
        - value: 0.5           # remaining layers
  - model: ../models/llama31-base-pre3-finweb-edu-1m   # base model + Korean pretraining
    parameters:
      density: [1, 0.6, 0.4]   # set so the lower layers get the larger influence
      weight:
        - filter: attention
          value: 0.7           # larger weight on attention layers (reinforces the Korean training)
        - value: 0.3           # remaining layers
merge_method: dare_ties
base_model: ../models/Llama-3.1-8B-Instruct
dtype: bfloat16
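For reference, a minimal sketch of applying a config like the one above via mergekit's Python API (the CLI equivalent is mergekit-yaml). The config filename, output path, and options are illustrative assumptions, not the exact command used for this model.

```python
# Minimal sketch, assuming mergekit's Python API; the config filename,
# output path, and options are illustrative, not the exact command used.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("dare_ties.yml", "r", encoding="utf-8") as fp:   # the YAML above
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="../models/llama-3.1-fineweb-edu-1m-Inst",    # assumed output dir
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```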
Evaluation results (answer_relevancy / faithfulness):
- {'answer_relevancy': 0.7495, 'faithfulness': 0.6831}
- {'answer_relevancy': 0.7406, 'faithfulness': 0.7337}
- {'answer_relevancy': 0.7356, 'faithfulness': 0.6814}
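answer_relevancy and faithfulness match the metric names used by the RAGAS library; assuming that library was used, a minimal sketch of how scores in this shape could be produced. The evaluation questions, answers, and retrieved contexts behind the numbers above are not given here, so the data below is purely a placeholder.

```python
# Minimal sketch, assuming the RAGAS library; the questions, answers, and
# contexts are placeholders, not the data behind the scores above.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

eval_data = Dataset.from_dict({
    "question": ["..."],     # placeholder questions
    "answer": ["..."],       # placeholder model outputs
    "contexts": [["..."]],   # placeholder retrieved passages
})

result = evaluate(eval_data, metrics=[answer_relevancy, faithfulness])
print(result)  # e.g. {'answer_relevancy': 0.74, 'faithfulness': 0.68}
```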