---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- AIR-hl/Qwen2.5-1.5B-ultrachat200k
pipeline_tag: text-generation
tags:
- trl
- qwen
- simpo
- alignment
- transformers
- custom
- chat
---

# Qwen2.5-1.5B-SimPO

## Model Details

- **Model type:** aligned model
- **License:** Apache License 2.0
- **Finetuned from model:** [AIR-hl/Qwen2.5-1.5B-ultrachat200k](https://huggingface.co/AIR-hl/Qwen2.5-1.5B-ultrachat200k)
- **Training data:** [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
- **Training framework:** [trl](https://github.com/huggingface/trl)

## Training Details

`devices`: 4 * NPU 910B-64GB \
`precision`: bf16 mixed-precision \
`global_batch_size`: 128

### Training Hyperparameters

The mapping of these settings onto TRL's `CPOConfig` is sketched after the training script below.

`beta`: 1 \
`gamma`: 0.1 \
`bf16`: True \
`learning_rate`: 1e-6 \
`lr_scheduler_type`: cosine \
`per_device_train_batch_size`: 16 \
`gradient_accumulation_steps`: 2 \
`torch_dtype`: bfloat16 \
`num_train_epochs`: 1 \
`max_prompt_length`: 512 \
`max_length`: 1024 \
`warmup_ratio`: 0.05

### Results

`init_train_loss`: 0.7551 \
`final_train_loss`: 0.6715 \
`accuracy`: 0.6375 \
`reward_margin`: 0.3633

### Training script

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import (
    CPOConfig,
    CPOTrainer,
    ModelConfig,
    ScriptArguments,
    TrlParser,
    get_kbit_device_map,
    get_peft_config,
    get_quantization_config,
)
from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE

if __name__ == "__main__":
    # Parse CLI arguments / config file into script, training, and model configs.
    parser = TrlParser((ScriptArguments, CPOConfig, ModelConfig))
    script_args, training_args, model_config = parser.parse_args_and_config()

    # Resolve the torch dtype ("auto" and None are passed through unchanged).
    torch_dtype = (
        model_config.torch_dtype
        if model_config.torch_dtype in ["auto", None]
        else getattr(torch, model_config.torch_dtype)
    )
    quantization_config = get_quantization_config(model_config)
    model_kwargs = dict(
        revision=model_config.model_revision,
        attn_implementation=model_config.attn_implementation,
        torch_dtype=torch_dtype,
        use_cache=False if training_args.gradient_checkpointing else True,
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_config.model_name_or_path,
        trust_remote_code=model_config.trust_remote_code,
        **model_kwargs,
    )
    peft_config = get_peft_config(model_config)
    tokenizer = AutoTokenizer.from_pretrained(
        model_config.model_name_or_path,
        trust_remote_code=model_config.trust_remote_code,
    )
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if tokenizer.chat_template is None:
        tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE

    if script_args.ignore_bias_buffers:
        # Avoid DDP errors on boolean buffers that cannot be all-reduced.
        model._ddp_params_and_buffers_to_ignore = [
            name for name, buffer in model.named_buffers() if buffer.dtype == torch.bool
        ]

    # Load the preference dataset and keep only the columns CPOTrainer expects.
    dataset = load_dataset(script_args.dataset_name, split=script_args.dataset_train_split)
    dataset = dataset.select_columns(["prompt", "chosen", "rejected"])

    trainer = CPOTrainer(
        model,
        args=training_args,
        train_dataset=dataset,
        processing_class=tokenizer,
        peft_config=peft_config,
    )

    trainer.train()
    trainer.save_model(training_args.output_dir)
```
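
In TRL, SimPO is selected through `CPOConfig` rather than a dedicated trainer: `loss_type="simpo"` enables the SimPO objective inside `CPOTrainer`, and the card's `gamma` corresponds to TRL's `simpo_gamma` field. The sketch below is a hedged reconstruction of how the hyperparameters listed above map onto that config; the `output_dir` is a placeholder, and `cpo_alpha=0.0` (pure SimPO loss, no NLL term) is an assumption not stated in this card.

```python
from trl import CPOConfig

# Sketch: reconstruct the training configuration from the hyperparameters above.
training_args = CPOConfig(
    output_dir="Qwen2.5-1.5B-SimPO",   # placeholder output path
    loss_type="simpo",                 # use the SimPO objective inside CPOTrainer
    beta=1.0,
    simpo_gamma=0.1,                   # target reward margin (the card's `gamma`)
    cpo_alpha=0.0,                     # assumption: pure SimPO, no NLL term; not stated in the card
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,     # 4 devices * 16 * 2 = global batch size 128
    bf16=True,
    max_prompt_length=512,
    max_length=1024,
)
```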
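
## Quick start

A minimal inference sketch with `transformers`. The repository id `AIR-hl/Qwen2.5-1.5B-SimPO` is assumed from the card title and the base model's namespace; adjust it if the model is hosted under a different name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, inferred from the card title.
model_id = "AIR-hl/Qwen2.5-1.5B-SimPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a short introduction to preference alignment."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```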