llama-3.1-fineweb-edu-1m-Inst

  • Uses the default tokenizer

  • ๊ฐ ๋‹จ๊ณ„๋ณ„ ํ›ˆ๋ จ ํ›„, embed_tokens์™€ lm_head๋„ ๋ณ‘ํ•ฉ

  • Phases

    Hyperparameters per phase (a LoRA setup sketch follows this list):
    • pre1
      hyperparameters (initial test): rank=16, lr=1e-5, weight_decay=0.01, dropout=0.1
      dataset: richard-park/llama-recipe-pre1-fineweb-edu-1m-split-text
      Trained the llama-3.1 base model with llama recipe 1
    
    • pre2
      hyperparameter ๋ณ€๊ฒฝ: lr= 1e-5->2e-6, epoch: 3->4, lora dropout: 0.1 -> 0.3  
      dataset: richard-park/llama-recipe-pre2-fineweb-edu-1m-split-text  
      llama-3.1 Base Model llama recipe 2 ํ›ˆ๋ จ  
      nohup tensorboard --logdir=sapie-fineweb-edu/outputs/pre2/llama31-base-1m-fineweb-edu_20250128-01/runs --host 0.0.0.0 --port=5406 > tensorboard.log 2>&1 & disown  
      
    • pre3
      hyperparameters: rank=16/32, lr=1e-5, weight_decay=0.1, dropout=0.3
      dataset: richard-park/sapie-dataset-pre3-1m-gt50-le256-split
      Trained the llama-3.1 base model with llama recipe 3
      nohup tensorboard --logdir=sapie-fineweb-edu/outputs/pre3/llama31-base-1m-aihub-trans_20250130-01/runs --host 0.0.0.0 --port=5406 > tensorboard.log 2>&1 & disown  
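
The training scripts themselves are not part of this card; the sketch below shows how the phase hyperparameters above and the per-phase embed_tokens / lm_head merge could map onto a PEFT + Transformers setup (pre2 values are used for illustration). The model paths, lora_alpha, target_modules, batch size, and the assumed "text" column are illustrative assumptions, not documented values.

# Hypothetical per-phase LoRA sketch; only rank / lr / epochs / dropout /
# weight_decay come from the phase notes above, everything else is assumed.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = AutoModelForCausalLM.from_pretrained("../models/Llama-3.1-8B",
                                            torch_dtype=torch.bfloat16)  # path assumed
tokenizer = AutoTokenizer.from_pretrained("../models/Llama-3.1-8B")
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

lora_cfg = LoraConfig(
    r=16,                                         # rank=16 (pre3 also tried 32)
    lora_alpha=32,                                # assumed
    lora_dropout=0.3,                             # pre2/pre3 value (pre1 used 0.1)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
    modules_to_save=["embed_tokens", "lm_head"],  # trained fully so they can be merged each phase
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# Assumes the dataset exposes a "text" column.
ds = load_dataset("richard-park/llama-recipe-pre2-fineweb-edu-1m-split-text", split="train")
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
            batched=True, remove_columns=ds.column_names)

args = TrainingArguments(
    output_dir="outputs/pre2/llama31-base-1m-fineweb-edu",
    learning_rate=2e-6,                           # pre2 value
    num_train_epochs=4,                           # pre2 value
    weight_decay=0.01,
    per_device_train_batch_size=4,                # assumed
    bf16=True,
    report_to="tensorboard",
)
Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()

# Merge the adapter (plus the saved embed_tokens / lm_head copies) back into
# the base weights before starting the next phase.
model.merge_and_unload().save_pretrained("../models/llama31-base-pre2-fineweb-edu-1m")
tokenizer.save_pretrained("../models/llama31-base-pre2-fineweb-edu-1m")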
      

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with ../models/Llama-3.1-8B-Instruct as the base.

Models Merged

The following models were included in the merge:

  • ../models/llama31-base-pre3-finweb-edu-1m

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: ../models/Llama-3.1-8B-Instruct
    parameters:
      density: [0.6, 0.8, 1]  # lower → middle → upper layers; preserve the upper layers
      weight:
        - filter: mlp
          value: 0.8         # larger weight on MLP layers (contribute to the output)
        - value: 0.5         # remaining layers
  - model: ../models/llama31-base-pre3-finweb-edu-1m  # base model + Korean pretraining
    parameters:
      density: [1, 0.6, 0.4]  # set so the lower layers get more influence
      weight:
        - filter: attention
          value: 0.7         # larger weight on attention layers (reinforces the Korean training)
        - value: 0.3         # remaining layers
merge_method: dare_ties
base_model: ../models/Llama-3.1-8B-Instruct
dtype: bfloat16
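
This config can be run with mergekit; the sketch below uses mergekit's documented Python entry points (MergeConfiguration, run_merge, MergeOptions). The config file name and output path are assumptions; the mergekit-yaml CLI accepts the same YAML.

# Minimal sketch for producing the merge from the YAML above with mergekit.
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("merge-config.yml", encoding="utf-8") as fp:   # the YAML shown above (assumed filename)
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    out_path="../models/llama-3.1-fineweb-edu-1m-Inst",  # assumed output path
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,   # keep the default tokenizer, as noted above
    ),
)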

Evaluation

answer_relevancy / faithfulness scores:

  • {'answer_relevancy': 0.7495, 'faithfulness': 0.6831}
  • {'answer_relevancy': 0.7406, 'faithfulness': 0.7337}
  • {'answer_relevancy': 0.7356, 'faithfulness': 0.6814}
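
The evaluation setup is not documented here; the metric names match the ragas library, so the sketch below shows how such scores could be produced with it. The dataset columns and their contents are placeholders, not the actual evaluation data.

# Hypothetical sketch of a ragas evaluation producing answer_relevancy /
# faithfulness scores like the ones above (assumes ragas and a configured LLM backend).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

eval_ds = Dataset.from_dict({
    "question": ["..."],    # evaluation questions (placeholder)
    "answer":   ["..."],    # model answers (placeholder)
    "contexts": [["..."]],  # retrieved contexts per question (placeholder)
})

scores = evaluate(eval_ds, metrics=[answer_relevancy, faithfulness])
print(scores)  # e.g. {'answer_relevancy': ..., 'faithfulness': ...}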