---
library_name: transformers
license: other
license_name: exaone
license_link: LICENSE
language:
- en
- ko
datasets:
- GAIR/LIMO
- junnei/ko-limo
- exp-models/GAIR-LIMO-KOREAN
base_model:
- LGAI-EXAONE/EXAONE-3.5-32B-Instruct
---
### Datasets
#### LIMO
- [GAIR/LIMO](https://huggingface.co/datasets/GAIR/LIMO) (English, original)
#### LIMO Korean Translations
- [exp-models/GAIR-LIMO-KOREAN](https://huggingface.co/datasets/exp-models/GAIR-LIMO-KOREAN) (Korean translation)
- [junnei/ko-limo](https://huggingface.co/datasets/junnei/ko-limo) (Korean translation)
### νŠΉμ΄μ‚¬ν•­
- μ›λž˜ [LIMO](https://github.com/GAIR-NLP/LIMO/blob/main/train/data/limo.json)μ—μ„œλŠ” 15 epoch ν•™μŠ΅μ„ μˆ˜ν–‰ν•¨
- μ˜μ–΄1+ν•œκ΅­μ–΄2 데이터 셋을 μ„žμ€ ν›„ 5 epoch ν•™μŠ΅μ‹œμΌœ μ›λž˜ ν•™μŠ΅ 방법과 μœ μ‚¬ν•œ 횟수만큼, κ·ΈλŸ¬λ‚˜ μ•½κ°„μ˜ λ³€ν˜•μ΄ μžˆλ„λ‘ ν•™μŠ΅μ‹œν‚€λ €κ³  함
- κ·ΈλŸ¬λ‚˜ μ •μ„± ν‰κ°€μ—μ„œ 4 epoch μ‹œμ μ˜ checkpointκ°€ κ°€μž₯ μ„±λŠ₯이 μ’‹μ•„ λ³΄μ˜€μŒ
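A minimal sketch of the mixing step above, assuming the three Hugging Face datasets are simply concatenated (the English original plus the two Korean translations) and shuffled; the exact preprocessing behind the dataset referenced in the config below may differ, and the column schemas may need aligning first.

```python
# Hypothetical sketch of the 1x English + 2x Korean mixture described above;
# not the exact preprocessing used for this model.
from datasets import load_dataset, concatenate_datasets

en = load_dataset("GAIR/LIMO", split="train")                      # English original
ko_a = load_dataset("exp-models/GAIR-LIMO-KOREAN", split="train")  # Korean translation
ko_b = load_dataset("junnei/ko-limo", split="train")               # Korean translation

# Assumes the three datasets share (or have been normalized to) the same columns.
mixed = concatenate_datasets([en, ko_a, ko_b]).shuffle(seed=42)
print(len(mixed))
```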
### Training Details
- 4Γ— H200 SXM, 13.5 hours
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/08RXo5k87nW-iWqwJharz.png)
<details><summary>Axolotl config</summary>
```yaml
base_model: beomi/EXAONE-3.5-32B-Instruct-Llamafied
model_type: AutoModelForCausalLM
tokenizer_config: beomi/EXAONE-3.5-32B-Instruct-Llamafied
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: werty1248/kk_oo_llliiimmmooo
field_messages: conversations
type: chat_template
chat_template: tokenizer_default
dataset_prepared_path: ./data_preparation
output_dir: /workspace/data
hf_use_auth_token: true
sequence_len: 32768
sample_packing: false
pad_to_sequence_len: true
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true
wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 5
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5.0e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.05
eval_table_size:
save_total_limit: 2
deepspeed: ./deepspeed_configs/zero3_bf16.json
special_tokens:
pad_token: "[|endofturn|]"
```
</details>
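
A minimal inference sketch using the standard transformers API, assuming the final checkpoint is published as a causal LM whose tokenizer carries a chat template (as the config above implies); the repo id below is a placeholder, not this model's actual id.

```python
# Hypothetical usage sketch; replace the placeholder repo id with this model's id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "werty1248/<this-model>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Find all real x such that x^2 - 5x + 6 = 0."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```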