---
library_name: transformers
license: other
license_name: exaone
license_link: LICENSE
language:
- en
- ko
datasets:
- GAIR/LIMO
- junnei/ko-limo
- exp-models/GAIR-LIMO-KOREAN
base_model:
- LGAI-EXAONE/EXAONE-3.5-32B-Instruct
---

### Datasets

#### LIMO

- [GAIR/LIMO](https://huggingface.co/datasets/GAIR/LIMO) (English, original)

#### LIMO Korean translations

- [exp-models/GAIR-LIMO-KOREAN](https://huggingface.co/datasets/exp-models/GAIR-LIMO-KOREAN) (Korean translation)
- [junnei/ko-limo](https://huggingface.co/datasets/junnei/ko-limo) (Korean translation)
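
For reference, all three sets can be pulled straight from the Hub with the `datasets` library. A minimal loading sketch; the `train` split name and default configurations are assumptions, not taken from this card:

```python
from datasets import load_dataset

limo_en = load_dataset("GAIR/LIMO", split="train")                      # English original
limo_ko_a = load_dataset("exp-models/GAIR-LIMO-KOREAN", split="train")  # Korean translation
limo_ko_b = load_dataset("junnei/ko-limo", split="train")               # Korean translation

# Inspect row counts and column names before building a training mixture.
for name, ds in [("en", limo_en), ("ko_a", limo_ko_a), ("ko_b", limo_ko_b)]:
    print(name, len(ds), ds.column_names)
```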

### Notes

- The original [LIMO](https://github.com/GAIR-NLP/LIMO/blob/main/train/data/limo.json) recipe trains for 15 epochs.
- Here, the combined dataset (English ×1 + Korean ×2) was trained for a full 5 epochs, aiming for a similar number of passes to the original recipe but with some variation (see the sketch below).
- However, in a qualitative evaluation, the checkpoint at epoch 4 appeared to perform best.
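
Continuing from the loading sketch above, the mixture could be assembled roughly as follows, reading "English ×1 + Korean ×2" as one copy of the English original plus the two Korean-translated sets. This assumes the three sets have first been mapped to a shared column schema, which is not shown here:

```python
from datasets import concatenate_datasets

# 1 copy of the English set + the 2 Korean-translated sets (schemas assumed
# to already match; otherwise each set needs a .map() to a common format).
mixed = concatenate_datasets([limo_en, limo_ko_a, limo_ko_b]).shuffle(seed=42)

# Each underlying problem now appears ~3 times per epoch, so 5 epochs give
# ~15 exposures per problem, comparable to the original 15-epoch recipe.
print(len(mixed), "rows;", 3 * 5, "effective passes per problem")
```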

### Training Details

- 4x H200 SXM, 13.5 hours
 |
|
|
|

<details><summary>Axolotl config</summary>

```
base_model: beomi/EXAONE-3.5-32B-Instruct-Llamafied
model_type: AutoModelForCausalLM
tokenizer_config: beomi/EXAONE-3.5-32B-Instruct-Llamafied
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: werty1248/kk_oo_llliiimmmooo
    field_messages: conversations
    type: chat_template
chat_template: tokenizer_default

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 32768
sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 5
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5.0e-6

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.05
eval_table_size:

save_total_limit: 2

deepspeed: ./deepspeed_configs/zero3_bf16.json

special_tokens:
  pad_token: "[|endofturn|]"
```

</details>
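
Because the base checkpoint is a Llamafied EXAONE model, the resulting weights should load with standard `transformers` tooling. A minimal inference sketch; the repository id below is a placeholder (substitute this repository's actual id), and the bf16 dtype follows the training config above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "werty1248/EXAONE-3.5-32B-LIMO-Ko"  # placeholder: replace with this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "Find all real x such that x^2 - 5x + 6 = 0."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# LIMO-style reasoning traces are long, so allow a generous generation budget.
output = model.generate(input_ids, max_new_tokens=2048, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```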