Upload folder using huggingface_hub

Files changed (17)

.gitignore +8 -0
README.md +63 -183
assets/img/alpaca_blog.png +0 -0
assets/img/mtbench_hf.png +0 -0
main.py +203 -0
outputs/alpacaeval/Mistral-ORPO-alpha.json +0 -0
outputs/alpacaeval/Mistral-ORPO-beta.json +0 -0
outputs/mtbench/Mistral-ORPO-alpha.jsonl +0 -0
outputs/mtbench/Mistral-ORPO-beta.jsonl +0 -0
requirements.txt +114 -0
scripts/run_mistral_orpo_beta.sh +20 -0
scripts/run_mistral_orpo_capybara.sh +22 -0
src/accelerate/ds2.yaml +21 -0
src/args.py +34 -0
src/orpo_trainer.py +83 -0
src/utils.py +20 -0
trl/test_orpo_trainer_demo.py +100 -0

.gitignore ADDED

	@@ -0,0 +1,8 @@

+wandb
+src/__pycache__
+scripts/run_orpo.sh
+src/accelerate/fsdp.yaml
+scripts/run_orpo.sh
+src/__pycache__/args.cpython-311.pyc
+src/__pycache__/utils.cpython-311.pyc
+src/accelerate/fsdp.yaml

README.md CHANGED

@@ -1,183 +1,63 @@
----
-tags:
-- merge
-- mergekit
-- lazymergekit
-- flemmingmiguel/NeuDist-Ro-7B
-- johannhartmann/Brezn3
-- ResplendentAI/Flora_DPO_7B
-base_model:
-- flemmingmiguel/NeuDist-Ro-7B
-- johannhartmann/Brezn3
-- ResplendentAI/Flora_DPO_7B
-language:
-- de
-- en
----
-# Spaetzle-v8-7b
-This model is supposed to show adequate performance in German and English on a number of tasks, while mostly behaving well, that is, without rambling on, intermixing tokens from different templates in training and adapting, etc.
-It is mostly a quick test, and considerably weaker in German grammar and orthography than DiscoLM e.g., but for use cases where this is not too important, but e.g. instruction following, reasoning, etc, it might actually be a little bit preferable.
-It is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
-* [flemmingmiguel/NeuDist-Ro-7B](https://huggingface.co/flemmingmiguel/NeuDist-Ro-7B)
-* [johannhartmann/Brezn3](https://huggingface.co/johannhartmann/Brezn3)
-* [ResplendentAI/Flora_DPO_7B](https://huggingface.co/ResplendentAI/Flora_DPO_7B)
-* on the basis of [mayflowergmbh/Wiedervereinigung-7b-dpo-laser](https://huggingface.co/mayflowergmbh/Wiedervereinigung-7b-dpo-laser)
-All credits are due to the creators of those original models and the training datasets involved.
-For a suitable quantized version, try [cstr/Spaetzle-v8-7b-GGUF](https://huggingface.co/cstr/Spaetzle-v8-7b-GGUF)
-## Evaluation
-[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cstr__Spaetzle-v8-7b)
-|             Metric              |Value|
-|---------------------------------|----:|
-|Avg.                             |72.27|
-|AI2 Reasoning Challenge (25-Shot)|68.69|
-|HellaSwag (10-Shot)              |86.68|
-|MMLU (5-Shot)                    |64.60|
-|TruthfulQA (0-shot)              |64.05|
-|Winogrande (5-shot)              |81.45|
-|GSM8k (5-shot)                   |68.16|
-EQ-Bench (v2_de): 61.04 / english (v2): 78.3
-|                           Model                            |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
-|------------------------------------------------------------|------:|------:|---------:|-------:|------:|
-|[Spaetzle-v8-7b](https://huggingface.co/cstr/Spaetzle-v8-7b)|  45.31|  75.69|     63.94|   45.57|  57.63|
-### AGIEval
-|             Task             |Version| Metric |Value|   |Stderr|
-|------------------------------|------:|--------|----:|---|-----:|
-|agieval_aqua_rat              |      0|acc     |25.59|±  |  2.74|
-|                              |       |acc_norm|24.80|±  |  2.72|
-|agieval_logiqa_en             |      0|acc     |39.63|±  |  1.92|
-|                              |       |acc_norm|39.78|±  |  1.92|
-|agieval_lsat_ar               |      0|acc     |23.48|±  |  2.80|
-|                              |       |acc_norm|24.35|±  |  2.84|
-|agieval_lsat_lr               |      0|acc     |50.98|±  |  2.22|
-|                              |       |acc_norm|51.96|±  |  2.21|
-|agieval_lsat_rc               |      0|acc     |62.08|±  |  2.96|
-|                              |       |acc_norm|62.83|±  |  2.95|
-|agieval_sat_en                |      0|acc     |78.64|±  |  2.86|
-|                              |       |acc_norm|79.13|±  |  2.84|
-|agieval_sat_en_without_passage|      0|acc     |44.66|±  |  3.47|
-|                              |       |acc_norm|44.66|±  |  3.47|
-|agieval_sat_math              |      0|acc     |37.27|±  |  3.27|
-|                              |       |acc_norm|35.00|±  |  3.22|
-Average: 45.31%
-### GPT4All
-|    Task     |Version| Metric |Value|   |Stderr|
-|-------------|------:|--------|----:|---|-----:|
-|arc_challenge|      0|acc     |63.14|±  |  1.41|
-|             |       |acc_norm|64.51|±  |  1.40|
-|arc_easy     |      0|acc     |85.98|±  |  0.71|
-|             |       |acc_norm|82.49|±  |  0.78|
-|boolq        |      1|acc     |88.10|±  |  0.57|
-|hellaswag    |      0|acc     |66.31|±  |  0.47|
-|             |       |acc_norm|85.17|±  |  0.35|
-|openbookqa   |      0|acc     |38.00|±  |  2.17|
-|             |       |acc_norm|47.20|±  |  2.23|
-|piqa         |      0|acc     |83.35|±  |  0.87|
-|             |       |acc_norm|84.17|±  |  0.85|
-|winogrande   |      0|acc     |78.22|±  |  1.16|
-Average: 75.69%
-### TruthfulQA
-|    Task     |Version|Metric|Value|   |Stderr|
-|-------------|------:|------|----:|---|-----:|
-|truthfulqa_mc|      1|mc1   |47.74|±  |  1.75|
-|             |       |mc2   |63.94|±  |  1.53|
-Average: 63.94%
-### Bigbench
-|                      Task                      |Version|       Metric        |Value|   |Stderr|
-|------------------------------------------------|------:|---------------------|----:|---|-----:|
-|bigbench_causal_judgement                       |      0|multiple_choice_grade|56.84|±  |  3.60|
-|bigbench_date_understanding                     |      0|multiple_choice_grade|66.12|±  |  2.47|
-|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|41.47|±  |  3.07|
-|bigbench_geometric_shapes                       |      0|multiple_choice_grade|22.01|±  |  2.19|
-|                                                |       |exact_str_match      | 0.00|±  |  0.00|
-|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|31.40|±  |  2.08|
-|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|23.14|±  |  1.60|
-|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|56.00|±  |  2.87|
-|bigbench_movie_recommendation                   |      0|multiple_choice_grade|45.00|±  |  2.23|
-|bigbench_navigate                               |      0|multiple_choice_grade|50.70|±  |  1.58|
-|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|70.05|±  |  1.02|
-|bigbench_ruin_names                             |      0|multiple_choice_grade|45.54|±  |  2.36|
-|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|26.05|±  |  1.39|
-|bigbench_snarks                                 |      0|multiple_choice_grade|71.82|±  |  3.35|
-|bigbench_sports_understanding                   |      0|multiple_choice_grade|72.92|±  |  1.42|
-|bigbench_temporal_sequences                     |      0|multiple_choice_grade|44.20|±  |  1.57|
-|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|22.80|±  |  1.19|
-|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|18.23|±  |  0.92|
-|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|56.00|±  |  2.87|
-Average: 45.57%
-Average score: 57.63%
-## 💻 Usage
-```python
-!pip install -qU transformers accelerate
-from transformers import AutoTokenizer
-import transformers
-import torch
-model = "cstr/Spaetzle-v8-7b"
-messages = [{"role": "user", "content": "What is a large language model?"}]
-tokenizer = AutoTokenizer.from_pretrained(model)
-prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model,
-    torch_dtype=torch.float16,
-    device_map="auto",
-)
-outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
-```
-## 🧩 Configuration
-The model uses ChatML and should work well with this (as it is merged from models which (mostly) saw ChatML templates in training).
-```yaml
-models:
-  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
-    # no parameters necessary for base model
-  - model: flemmingmiguel/NeuDist-Ro-7B
-    parameters:
-      density: 0.60
-      weight: 0.30
-  - model: johannhartmann/Brezn3
-    parameters:
-      density: 0.65
-      weight: 0.40
-  - model: ResplendentAI/Flora_DPO_7B
-    parameters:
-      density: 0.6
-      weight: 0.3
-merge_method: dare_ties
-base_model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
-parameters:
-  int8_mask: true
-dtype: bfloat16
-random_seed: 0
-tokenizer_source: base
-```

+# **ORPO**
+### **`Updates (24.03.25)`**
+- [X] Sample script for ORPOTrainer in 🤗<a class="link" href="https://github.com/huggingface/trl">TRL</a> is added to `trl/test_orpo_trainer_demo.py`
+- [X] New model, 🤗<a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k">kaist-ai/mistral-orpo-capybara-7k</a>, is added to 🤗<a class="link" href="https://huggingface.co/collections/kaist-ai/orpo-65efef87544ba100aef30013">ORPO Collection</a>
+- [X] Now you can try ORPO in 🤗<a class="link" href="https://github.com/huggingface/trl">TRL</a> and <a class="link" href="https://github.com/OpenAccess-AI-Collective/axolotl">Axolotl</a>🔥
+- [X] We are preparing a general guideline for training LLMs with ORPO. Stay tuned🔥
+- [X] **Mistral-ORPO-β** achieved a 14.7% length-controlled (LC) win rate on the <a class="link" href="https://tatsu-lab.github.io/alpaca_eval/">official AlpacaEval Leaderboard</a>🔥
+&nbsp;
+This is the official repository for <a class="link" href="https://arxiv.org/abs/2403.07691">**ORPO: Monolithic Preference Optimization without Reference Model**</a>. The detailed results in the paper can be found in:
+- [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta)
+- [AlpacaEval](#alpacaeval)
+- [MT-Bench](#mt-bench)
+- [IFEval](#ifeval)
+### **`Model Checkpoints`**
+Our models trained with ORPO can be found in:
+- [X] **Mistral-ORPO-Capybara-7k**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k">kaist-ai/mistral-orpo-capybara-7k</a>
+- [X] **Mistral-ORPO-⍺**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-alpha">kaist-ai/mistral-orpo-alpha</a>
+- [X] **Mistral-ORPO-β**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-beta">kaist-ai/mistral-orpo-beta</a>
+And the corresponding logs for the average log probabilities of chosen/rejected responses during training are reported in:
+- [X] **Mistral-ORPO-Capybara-7k**: TBU
+- [X] **Mistral-ORPO-⍺**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE1NzE0?accessToken=rms6o4mg5vo3feu1bvbpk632m4cspe19l0u1p4he3othx5bgean82chn9neiile6">Wandb Report for Mistral-ORPO-⍺</a>
+- [X] **Mistral-ORPO-β**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE3MzMy?accessToken=dij4qbp6dcrofsanzbgobjsne9el8a2zkly2u5z82rxisd4wiwv1rhp0s2dub11e">Wandb Report for Mistral-ORPO-β</a>
+&nbsp;
+### **`AlpacaEval`**
+<figure>
+  <img class="png" src="/assets/img/alpaca_blog.png" alt="AlpacaEval 2.0 scores for models trained with different alignment methods">
+  <figcaption><b>Figure 1.</b> AlpacaEval 2.0 scores for models trained with different alignment methods.</figcaption>
+</figure>
+&nbsp;
+### **`MT-Bench`**
+<figure>
+  <img class="png" src="/assets/img/mtbench_hf.png" alt="MT-Bench results by category">
+  <figcaption><b>Figure 2.</b> MT-Bench results by category.</figcaption>
+</figure>
+&nbsp;
+### **`IFEval`**
+IFEval scores are measured with <a class="link" href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI/lm-evaluation-harness</a> by applying the chat template. The scores for Llama-2-Chat (70B), Zephyr-β (7B), and Mixtral-8X7B-Instruct-v0.1 were originally reported in <a class="link" href="https://twitter.com/wiskojo/status/1739767758462877823">this tweet</a>.
+| **Model Type**     | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
+|--------------------|:-----------------:|:----------------:|:---------------:|:--------------:|
+| **Llama-2-Chat (70B)** |       0.4436      |      0.5342      |      0.5468     |     0.6319     |
+| **Zephyr-β (7B)** |       0.4233      |      0.4547      |      0.5492     |     0.5767     |
+| **Mixtral-8X7B-Instruct-v0.1** |       0.5213      |      **0.5712**      |      0.6343     |     **0.6823**     |
+| **Mistral-ORPO-⍺ (7B)** |       0.5009      |      0.5083      |      0.5995     |     0.6163     |
+| **Mistral-ORPO-β (7B)** |       **0.5287**      |      0.5564      |      **0.6355**     |     0.6619     |

assets/img/alpaca_blog.png ADDED

assets/img/mtbench_hf.png ADDED

main.py ADDED

	@@ -0,0 +1,203 @@

+import os
+import time
+import wandb
+import torch
+import argparse
+from datasets import load_dataset
+from typing import List, Dict, Union
+from transformers import (
+    AutoTokenizer,
+    AutoModelForCausalLM,
+    TrainingArguments,
+    DataCollatorForLanguageModeling
+)
+from src.args import default_args
+from src.orpo_trainer import ORPOTrainer
+from src.utils import preprocess_logits_for_metrics, dataset_split_selector
+class ORPO(object):
+    def __init__(self, args) -> None:
+        self.start = time.gmtime()
+        self.args = args
+        # Load Tokenizer
+        print(">>> 1. Loading Tokenizer")
+        self.tokenizer = AutoTokenizer.from_pretrained(self.args.model_name, cache_dir=self.args.cache_dir)
+        if self.tokenizer.chat_template is None:
+            self.tokenizer.chat_template = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
+            print("     1-1. Chat Template Applied (<|user|> <|assistant|>)")
+        else:
+            pass
+        self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
+        # Load Model
+        print(">>> 2. Loading Model")
+        if self.args.flash_attention_2:
+            self.model = AutoModelForCausalLM.from_pretrained(self.args.model_name,
+                                                              cache_dir=self.args.cache_dir,
+                                                              torch_dtype=torch.bfloat16,
+                                                              attn_implementation="flash_attention_2")
+        else:
+            self.model = AutoModelForCausalLM.from_pretrained(self.args.model_name,
+                                                              cache_dir=self.args.cache_dir,
+                                                              torch_dtype=torch.bfloat16)
+        # Load Dataset
+        print(">>> 3. Loading Dataset")
+        self.data = load_dataset(self.args.data_name, cache_dir=self.args.cache_dir)
+        # Preprocess Dataset
+        print(">>> 4. Filtering and Preprocessing Dataset")
+        data_split = dataset_split_selector(self.data)
+        if len(data_split) == 1:
+            self.is_test = False
+            train_split = data_split[0]
+            print(f"   >>> Test Set = {self.is_test}")
+        else:
+            self.is_test = True
+            train_split = data_split[0]
+            test_split = data_split[1]
+            test = self.data[test_split].filter(self.filter_dataset)
+            self.test = test.map(self.preprocess_dataset, batched=True, num_proc=self.args.num_proc, remove_columns=self.data[test_split].column_names)
+        train = self.data[train_split].filter(self.filter_dataset)
+        print(f"\n\n>>> {len(train)} / {len(self.data[train_split])} rows left after filtering by prompt length.")
+        self.train = train.map(self.preprocess_dataset, batched=True, num_proc=self.args.num_proc, remove_columns=self.data[train_split].column_names)
+        # Set WANDB & Logging Configurations
+        self.run_name = f"{self.args.model_name.split('/')[-1]}-{self.args.data_name.split('/')[-1]}-lambda{self.args.alpha}-ORPO-{self.start.tm_mday}-{self.start.tm_hour}-{self.start.tm_min}"
+        self.save_dir = os.path.join('./checkpoints/', f"{self.args.data_name.split('/')[-1]}/{self.run_name}")
+        self.log_dir = os.path.join('./checkpoints/', f"{self.args.data_name.split('/')[-1]}/{self.run_name}/logs")
+        os.makedirs(self.save_dir, exist_ok=True)
+        os.makedirs(self.log_dir, exist_ok=True)
+    def preprocess_dataset(self, examples: Union[List, Dict]):
+        if ('instruction' in examples.keys()) or ('question' in examples.keys()):
+            prompt_key = 'instruction' if 'instruction' in examples.keys() else 'question'
+            prompt = [self.tokenizer.apply_chat_template([{'role': 'user', 'content': item}], tokenize=False, add_generation_prompt=True) for item in examples[prompt_key]]
+            chosen = [self.tokenizer.apply_chat_template([{'role': 'user', 'content': item_prompt}, {'role': 'assistant', 'content': item_chosen}], tokenize=False) for item_prompt, item_chosen in zip(examples[prompt_key], examples['chosen'])]
+            rejected = [self.tokenizer.apply_chat_template([{'role': 'user', 'content': item_prompt}, {'role': 'assistant', 'content': item_rejected}], tokenize=False) for item_prompt, item_rejected in zip(examples[prompt_key], examples['rejected'])]
+        else:
+            prompt = [self.tokenizer.apply_chat_template([item[0]], tokenize=False, add_generation_prompt=True) for item in examples['chosen']]
+            chosen = [self.tokenizer.apply_chat_template(item, tokenize=False) for item in examples['chosen']]
+            rejected = [self.tokenizer.apply_chat_template(item, tokenize=False) for item in examples['rejected']]
+        model_inputs = self.tokenizer(prompt,
+                                      max_length=self.args.response_max_length,
+                                      padding='max_length',
+                                      truncation=True,
+                                      return_tensors='pt')
+        pos_labels = self.tokenizer(chosen,
+                                    max_length=self.args.response_max_length,
+                                    padding='max_length',
+                                    truncation=True,
+                                    return_tensors='pt')
+        neg_labels = self.tokenizer(rejected,
+                                    max_length=self.args.response_max_length,
+                                    padding='max_length',
+                                    truncation=True,
+                                    return_tensors='pt')
+        model_inputs['positive_input_ids'] = pos_labels['input_ids']
+        model_inputs['positive_attention_mask'] = pos_labels['attention_mask']
+        model_inputs['negative_input_ids'] = neg_labels['input_ids']
+        model_inputs['negative_attention_mask'] = neg_labels['attention_mask']
+        return model_inputs
+    def filter_dataset(self, examples: Union[List, Dict]):
+        if 'instruction' in examples.keys():
+            query = examples['instruction']
+            prompt_length = self.tokenizer.apply_chat_template([{'content': query, 'role': 'user'}], tokenize=True, add_generation_prompt=True, return_tensors='pt').size(-1)
+        elif 'question' in examples.keys():
+            query = examples['question']
+            prompt_length = self.tokenizer.apply_chat_template([{'content': query, 'role': 'user'}], tokenize=True, add_generation_prompt=True, return_tensors='pt').size(-1)
+        else:
+            prompt_length = self.tokenizer.apply_chat_template([examples['chosen'][0]], tokenize=True, add_generation_prompt=True, return_tensors='pt').size(-1)
+        if prompt_length < self.args.prompt_max_length:
+            return True
+        else:
+            return False
+    def prepare_trainer(self):
+        wandb.init(name=self.run_name)
+        arguments = TrainingArguments(
+            output_dir=self.save_dir,  # The output directory
+            logging_dir=self.log_dir,
+            logging_steps=50,
+            learning_rate=self.args.lr,
+            overwrite_output_dir=True,  # overwrite the content of the output directory
+            num_train_epochs=self.args.num_train_epochs,  # number of training epochs
+            per_device_train_batch_size=self.args.per_device_train_batch_size,  # batch size for training
+            per_device_eval_batch_size=self.args.per_device_eval_batch_size,  # batch size for evaluation
+            evaluation_strategy=self.args.evaluation_strategy if self.is_test else 'no',  # evaluate only when a test split exists
+            save_strategy=self.args.evaluation_strategy,
+            optim=self.args.optim,
+            warmup_steps=self.args.warmup_steps,
+            gradient_accumulation_steps=self.args.gradient_accumulation_steps,
+            gradient_checkpointing=True, #if ('llama' in self.args.model_name.lower()) or ('mistral' in self.args.model_name.lower()) else False,
+            gradient_checkpointing_kwargs={'use_reentrant':True},
+            load_best_model_at_end=self.is_test,
+            do_train=True,
+            do_eval=self.is_test,
+            lr_scheduler_type=self.args.lr_scheduler_type,
+            remove_unused_columns=False,
+            report_to='wandb',
+            run_name=self.run_name,
+            bf16=True
+        )
+        data_collator = DataCollatorForLanguageModeling(tokenizer=self.tokenizer, mlm=False)
+        self.trainer = ORPOTrainer(
+            model=self.model,
+            alpha=self.args.alpha,
+            pad=self.tokenizer.pad_token_id,
+            args=arguments,
+            train_dataset=self.train,
+            eval_dataset=self.test if self.is_test else None,
+            data_collator=data_collator,
+            preprocess_logits_for_metrics=preprocess_logits_for_metrics
+        )
+    def run(self):
+        print(">>> 5. Preparing ORPOTrainer")
+        self.prepare_trainer()
+        self.trainer.train()
+        # Saving code for FSDP
+        if self.trainer.is_fsdp_enabled:
+            self.trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")
+        self.trainer.save_model()
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser("ORPO")
+    args = default_args(parser)
+    # Set WANDB configurations
+    if args.wandb_entity is not None and args.wandb_project_name is not None:
+        os.environ["WANDB_ENTITY"] = args.wandb_entity
+        os.environ["WANDB_PROJECT"] = args.wandb_project_name
+    else:
+        pass
+    os.environ["TOKENIZERS_PARALLELISM"] = 'false'
+    print("================================================================================================\n")
+    print(f">>> Fine-tuning {args.model_name} with ORPO on {args.data_name}\n")
+    print("================================================================================================")
+    print("\n\n>>> Summary:")
+    print(f"    - Lambda              : {args.alpha}")
+    print(f"    - Training Epochs     : {args.num_train_epochs}")
+    print(f"    - Prompt Max Length   : {args.prompt_max_length}")
+    print(f"    - Response Max Length : {args.response_max_length}")
+    item = ORPO(args=args)
+    item.run()
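
The fallback chat template in `main.py` is applied only when the tokenizer ships without one. It wraps each turn in a role tag and terminates it with the EOS token. The layout can be sketched in plain Python as below. `render_fallback_chat` is a hypothetical helper, not part of the repository, `</s>` is an assumed EOS token, and the exact whitespace may differ from what the Jinja template renders inside `transformers`.

```python
from typing import Dict, List

# Role tags used by the fallback template in main.py.
ROLE_TAGS = {
    "user": "<|user|>",
    "system": "<|system|>",
    "assistant": "<|assistant|>",
}


def render_fallback_chat(messages: List[Dict[str, str]],
                         eos_token: str = "</s>",  # assumed EOS token
                         add_generation_prompt: bool = False) -> str:
    """Approximate the role-tagged layout of the fallback chat template."""
    parts = []
    for message in messages:
        tag = ROLE_TAGS.get(message["role"])
        if tag is None:
            continue  # the template emits nothing for unknown roles
        parts.append(f"{tag}\n{message['content']}{eos_token}\n")
    # The template only opens an assistant turn after the last message.
    if add_generation_prompt and messages:
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_fallback_chat(
    [{"role": "user", "content": "Hi"}],
    add_generation_prompt=True,
)
chosen = render_fallback_chat(
    [{"role": "user", "content": "Hi"},
     {"role": "assistant", "content": "Hello!"}],
)
print(repr(prompt))
print(repr(chosen))
```

In this sketch the prompt string ends with an open assistant tag, while the chosen and rejected strings contain the full assistant turn. This mirrors how `preprocess_dataset` builds its three inputs.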

outputs/alpacaeval/Mistral-ORPO-alpha.json ADDED

The diff for this file is too large to render. See raw diff

outputs/alpacaeval/Mistral-ORPO-beta.json ADDED

The diff for this file is too large to render. See raw diff

outputs/mtbench/Mistral-ORPO-alpha.jsonl ADDED

The diff for this file is too large to render. See raw diff

outputs/mtbench/Mistral-ORPO-beta.jsonl ADDED

The diff for this file is too large to render. See raw diff

requirements.txt ADDED

	@@ -0,0 +1,114 @@

+accelerate @ file:///home/conda/feedstock_root/build_artifacts/accelerate_1710334587919/work
+aiohttp @ file:///croot/aiohttp_1707342283163/work
+aiosignal @ file:///tmp/build/80754af9/aiosignal_1637843061372/work
+appdirs==1.4.4
+asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
+attrs @ file:///croot/attrs_1695717823297/work
+bitsandbytes==0.43.0
+Bottleneck @ file:///croot/bottleneck_1707864210935/work
+Brotli @ file:///work/ci_py311/brotli-split_1676830125088/work
+cachetools==5.3.3
+certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1707022139797/work/certifi
+cffi @ file:///croot/cffi_1700254295673/work
+charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
+click @ file:///croot/click_1698129812380/work
+comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
+datasets @ file:///home/conda/feedstock_root/build_artifacts/datasets_1709395865330/work
+debugpy @ file:///croot/debugpy_1690905042057/work
+decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
+dill @ file:///croot/dill_1692271232022/work
+docker-pycreds @ file:///Users/ktietz/demo/mc3/conda-bld/docker-pycreds_1630654474270/work
+einops==0.7.0
+exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1704921103267/work
+executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work
+filelock @ file:///croot/filelock_1700591183607/work
+flash-attn==2.5.6
+frozenlist @ file:///croot/frozenlist_1698702560391/work
+fsspec==2023.4.0
+gitdb @ file:///tmp/build/80754af9/gitdb_1617117951232/work
+GitPython @ file:///croot/gitpython_1696936983078/work
+gmpy2 @ file:///work/ci_py311/gmpy2_1676839849213/work
+huggingface-hub @ file:///croot/huggingface_hub_1708634519519/work
+idna @ file:///work/ci_py311/idna_1676822698822/work
+importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1709821103657/work
+ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1708996548741/work
+ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1709559745751/work
+jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
+Jinja2==3.1.2
+jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1710255804825/work
+jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1710257359434/work
+MarkupSafe @ file:///croot/markupsafe_1704205993651/work
+matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
+mkl-fft @ file:///croot/mkl_fft_1695058164594/work
+mkl-random @ file:///croot/mkl_random_1695059800811/work
+mkl-service==2.4.0
+mpmath @ file:///croot/mpmath_1690848262763/work
+multidict @ file:///croot/multidict_1701096859099/work
+multiprocess @ file:///croot/multiprocess_1692294385131/work
+nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
+networkx==3.2.1
+ninja==1.11.1.1
+numexpr @ file:///croot/numexpr_1696515281613/work
+numpy @ file:///croot/numpy_and_numpy_base_1708638617955/work/dist/numpy-1.26.4-cp311-cp311-linux_x86_64.whl#sha256=5f96f274d410a1682519282ae769c877d32fdbf171aa8badec7bf5e1d3a1748a
+nvidia-cublas-cu11==11.11.3.6
+nvidia-cuda-cupti-cu11==11.8.87
+nvidia-cuda-nvrtc-cu11==11.8.89
+nvidia-cuda-runtime-cu11==11.8.89
+nvidia-cudnn-cu11==8.7.0.84
+nvidia-cufft-cu11==10.9.0.58
+nvidia-curand-cu11==10.3.0.86
+nvidia-cusolver-cu11==11.4.1.48
+nvidia-cusparse-cu11==11.7.5.86
+nvidia-ml-py==12.535.133
+nvidia-nccl-cu11==2.19.3
+nvidia-nvtx-cu11==11.8.86
+nvitop==1.3.2
+packaging @ file:///croot/packaging_1693575174725/work
+pandas @ file:///croot/pandas_1709590491089/work/dist/pandas-2.2.1-cp311-cp311-linux_x86_64.whl#sha256=0a2793a31a0135a35735e1431d453a06186a3a7c607d9b441d9bd5f0fe4ded31
+parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
+pathtools @ file:///Users/ktietz/demo/mc3/conda-bld/pathtools_1629713893697/work
+pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
+pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
+pillow==10.2.0
+platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1706713388748/work
+prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1702399386289/work
+protobuf==3.20.3
+psutil @ file:///work/ci_py311_2/psutil_1679337388738/work
+ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
+pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
+pyarrow @ file:///croot/pyarrow_1707330824290/work/python
+pyarrow-hotfix @ file:///home/conda/feedstock_root/build_artifacts/pyarrow-hotfix_1700596371886/work
+pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
+Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1700607939962/work
+PySocks @ file:///work/ci_py311/pysocks_1676822712504/work
+python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
+pytz @ file:///croot/pytz_1695131579487/work
+PyYAML @ file:///croot/pyyaml_1698096049011/work
+pyzmq @ file:///croot/pyzmq_1705605076900/work
+regex @ file:///croot/regex_1696515298636/work
+requests @ file:///croot/requests_1707355572290/work
+safetensors @ file:///croot/safetensors_1708633833937/work
+sentry-sdk @ file:///work/ci_py311/sentry-sdk_1676862120883/work
+setproctitle @ file:///work/ci_py311/setproctitle_1676838789127/work
+six @ file:///tmp/build/80754af9/six_1644875935023/work
+smmap @ file:///tmp/build/80754af9/smmap_1611694433573/work
+stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
+sympy @ file:///croot/sympy_1701397643339/work
+termcolor==2.4.0
+tokenizers @ file:///croot/tokenizers_1708633814160/work
+torch==2.2.1+cu118
+torchaudio==2.2.1+cu118
+torchvision==0.17.1+cu118
+tornado @ file:///croot/tornado_1696936946304/work
+tqdm @ file:///croot/tqdm_1679561862951/work
+traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1710254411456/work
+transformers @ file:///home/conda/feedstock_root/build_artifacts/transformers_1709308155748/work
+triton==2.2.0
+typing_extensions==4.8.0
+tzdata @ file:///croot/python-tzdata_1690578112552/work
+urllib3 @ file:///croot/urllib3_1707770551213/work
+wandb @ file:///home/conda/feedstock_root/build_artifacts/wandb_1707246480133/work
+wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
+xxhash @ file:///work/ci_py311/python-xxhash_1676842384694/work
+yarl @ file:///croot/yarl_1701105127787/work
+zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1695255097490/work

scripts/run_mistral_orpo_beta.sh ADDED

	@@ -0,0 +1,20 @@

+#!/bin/bash
+# The Mistral-ORPO models are trained on 4 A100 GPUs
+accelerate launch --config_file ./src/accelerate/fsdp.yaml main.py \
+    --lr 5e-6 \
+    --lr_scheduler_type inverse_sqrt \
+    --alpha 0.1 \
+    --torch_compile False \
+    --warmup_steps 200 \
+    --model_name mistralai/Mistral-7B-v0.1 \
+    --data_name argilla/ultrafeedback-binarized-preferences-cleaned \
+    --num_train_epochs 5 \
+    --prompt_max_length 1792 \
+    --response_max_length 2048 \
+    --per_device_train_batch_size 8 \
+    --per_device_eval_batch_size 8 \
+    --gradient_accumulation_steps 1 \
+    --num_proc 8 \
+    --flash_attention_2

scripts/run_mistral_orpo_capybara.sh ADDED

	@@ -0,0 +1,22 @@

+#!/bin/bash
+# The Mistral-ORPO series was trained on 4 x A100 GPUs
+accelerate launch --config_file ./src/accelerate/fsdp.yaml main.py \
+    --lr 5e-6 \
+    --torch_compile False \
+    --alpha 0.05 \
+    --lr_scheduler_type inverse_sqrt \
+    --cache_dir /projects/hf_cache/ \
+    --warmup_steps 100 \
+    --model_name mistralai/Mistral-7B-v0.1 \
+    --data_name argilla/distilabel-capybara-dpo-7k-binarized \
+    --num_train_epochs 3 \
+    --optim adamw_bnb_8bit \
+    --gradient_accumulation_steps 1 \
+    --prompt_max_length 1792 \
+    --response_max_length 2048 \
+    --per_device_train_batch_size 8 \
+    --per_device_eval_batch_size 8 \
+    --num_proc 8 \
+    --flash_attention_2
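
Both launch scripts pass `--per_device_train_batch_size 8` and `--gradient_accumulation_steps 1`. As a rough sketch of what that implies per optimizer step, assuming one process per GPU on the 4 A100s mentioned in the script comments (the process count lives in `fsdp.yaml`, which is not part of this commit, so the 4 is an assumption), the effective batch size can be worked out as follows. `effective_batch_size` is a hypothetical helper, not a function from this repository.

```python
def effective_batch_size(num_devices: int, per_device_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    """Number of preference pairs contributing to one optimizer step."""
    return num_devices * per_device_batch_size * gradient_accumulation_steps


# Assumption: 4 data-parallel processes, one per A100.
batch = effective_batch_size(num_devices=4,
                             per_device_batch_size=8,
                             gradient_accumulation_steps=1)
print(batch)  # 32
```

Each item in the batch is a (chosen, rejected) pair, so the model runs two forward passes per item.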

src/accelerate/ds2.yaml ADDED

	@@ -0,0 +1,21 @@

+compute_environment: LOCAL_MACHINE
+debug: false
+deepspeed_config:
+  gradient_accumulation_steps: 1
+  offload_optimizer_device: none
+  offload_param_device: none
+  zero3_init_flag: false
+  zero_stage: 2
+distributed_type: DEEPSPEED
+downcast_bf16: 'no'
+machine_rank: 0
+main_training_function: main
+mixed_precision: bf16
+num_machines: 1
+num_processes: 2
+rdzv_backend: static
+same_network: true
+tpu_env: []
+tpu_use_cluster: false
+tpu_use_sudo: false
+use_cpu: false

src/args.py ADDED

	@@ -0,0 +1,34 @@

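+def str2bool(value):
+    # argparse's type=bool calls bool() on the raw string, so "--flag False" would parse as True.
+    if isinstance(value, bool):
+        return value
+    if value.lower() in ('true', '1', 'yes'):
+        return True
+    if value.lower() in ('false', '0', 'no'):
+        return False
+    raise ValueError(f"Expected a boolean value, got {value!r}")
+def default_args(parser):
+    parser.add_argument("--cache_dir", default=None, type=str)
+    parser.add_argument("--save_dir", default='./saved', type=str)
+    parser.add_argument("--data_name", default='HuggingfaceH4/UltraFeedback', type=str)
+    parser.add_argument("--model_name", default="gpt2", type=str)
+    # Training Arguments
+    parser.add_argument("--torch_compile", default=False, type=str2bool)
+    parser.add_argument("--flash_attention_2", action='store_true')
+    parser.add_argument("--lr_scheduler_type", default="cosine", type=str)
+    parser.add_argument("--optim", default="paged_adamw_32bit", type=str)
+    parser.add_argument("--overwrite_output_dir", default=True, type=str2bool)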
+    parser.add_argument("--lr", default=2e-5, type=float)
+    parser.add_argument("--num_proc", default=1, type=int)
+    parser.add_argument("--num_train_epochs", default=10, type=int)
+    parser.add_argument("--per_device_train_batch_size", default=2, type=int)
+    parser.add_argument("--per_device_eval_batch_size", default=2, type=int)
+    parser.add_argument("--warmup_steps", default=5000, type=int)
+    parser.add_argument("--evaluation_strategy", default='epoch', type=str)
+    parser.add_argument("--do_eval", action='store_true')
+    parser.add_argument("--gradient_accumulation_steps", default=1, type=int)
+    parser.add_argument("--save_strategy", default='epoch', type=str)
+    parser.add_argument("--prompt_max_length", default=256, type=int)
+    parser.add_argument("--response_max_length", default=1024, type=int)
+    parser.add_argument("--alpha", default=1.0, type=float, help="Hyperparameter for weighting L_OR")
+    # Wandb Configurations
+    parser.add_argument("--wandb_entity", default=None, type=str)
+    parser.add_argument("--wandb_project_name", default=None, type=str)
+    args = parser.parse_args()
+    return args
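
A note on boolean command-line options such as `--torch_compile False` in the launch scripts: argparse hands the raw string to the `type` callable, so `type=bool` turns any non-empty string, including `False`, into `True`. The sketch below reproduces this with a throwaway parser and contrasts it with an explicit converter. `parse_bool`, `--naive`, and `--explicit` are illustrative names only.

```python
import argparse


def parse_bool(value: str) -> bool:
    """Illustrative converter: map common spellings to a real boolean."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--naive", default=False, type=bool)           # bool("False") is True
parser.add_argument("--explicit", default=False, type=parse_bool)  # parses the text

args = parser.parse_args(["--naive", "False", "--explicit", "False"])
print(args.naive)     # True
print(args.explicit)  # False
```

Using `action='store_true'`, as `--flash_attention_2` already does, is another option, but it changes the command line: the flag then takes no value.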

src/orpo_trainer.py ADDED

	@@ -0,0 +1,83 @@

+import torch
+import wandb
+from transformers import Trainer
+class ORPOTrainer(Trainer):
+    def __init__(self, alpha, pad, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.pad = pad
+        self.alpha = alpha
+        self.loss_fct = torch.nn.CrossEntropyLoss(reduction='none')
+        print("Pad Token ID: ", self.pad)
+    def compute_custom_loss(self, logits, labels):
+        if labels is None:
+            raise ValueError("labels are required to compute the NLL loss")
+        logits = logits.contiguous()
+        # move labels to correct device to enable model parallelism
+        labels = labels.to(logits.device)
+        # Shift so that tokens < n predict n
+        shift_logits = logits[..., :-1, :].contiguous()
+        shift_labels = labels[..., 1:].contiguous()
+        # Per-token cross entropy on (batch, vocab, seq) logits, averaged over the sequence dimension
+        loss = self.loss_fct(shift_logits.transpose(2, 1), shift_labels).mean(dim=-1)
+        return loss
+    def compute_logps(self, prompt_attention_mask, chosen_inputs, chosen_attention_mask, logits):
+        mask = chosen_attention_mask[:, :-1] - prompt_attention_mask[:, 1:]
+        per_token_logps = torch.gather(logits[:, :-1, :].log_softmax(-1), dim=2,
+                                       index=(mask * chosen_inputs[:, 1:]).unsqueeze(2)).squeeze(2)
+        return torch.mul(per_token_logps, mask.to(dtype=torch.bfloat16)).sum(dim=1).to(dtype=torch.float64) / mask.sum(dim=1).to(dtype=torch.float64)
+    def compute_loss(self, model, inputs, return_outputs=False):
+        if self.label_smoother is not None and "labels" in inputs:
+            labels = inputs.pop("labels")
+        else:
+            labels = None
+        # Generate the hidden states for 'chosen' and 'reject'
+        neg_labels = inputs['negative_input_ids'].clone()
+        pos_labels = inputs['positive_input_ids'].clone()
+        neg_labels[neg_labels == self.pad] = -100
+        pos_labels[pos_labels == self.pad] = -100
+        outputs_neg = model(**{'input_ids': inputs['negative_input_ids'],
+                               'attention_mask': inputs['negative_attention_mask'],
+                               'labels': neg_labels,}, output_hidden_states=True)
+        outputs_pos = model(**{'input_ids': inputs['positive_input_ids'],
+                               'attention_mask': inputs['positive_attention_mask'],
+                               'labels': pos_labels,}, output_hidden_states=True)
+        # Calculate NLL loss
+        pos_loss = self.compute_custom_loss(logits=outputs_pos.logits, labels=inputs['positive_input_ids'])
+        # Calculate Log Probability
+        pos_prob = self.compute_logps(prompt_attention_mask=inputs['attention_mask'],
+                                      chosen_inputs=inputs['positive_input_ids'],
+                                      chosen_attention_mask=inputs['positive_attention_mask'],
+                                      logits=outputs_pos.logits)
+        neg_prob = self.compute_logps(prompt_attention_mask=inputs['attention_mask'],
+                                      chosen_inputs=inputs['negative_input_ids'],
+                                      chosen_attention_mask=inputs['negative_attention_mask'],
+                                      logits=outputs_neg.logits)
+        # Calculate log odds: log(p_pos / (1 - p_pos)) - log(p_neg / (1 - p_neg))
+        log_odds = (pos_prob - neg_prob) - (torch.log1p(-torch.exp(pos_prob)) - torch.log1p(-torch.exp(neg_prob)))
+        # log(sigmoid(log_odds)), computed in one step for numerical stability
+        ratio = torch.nn.functional.logsigmoid(log_odds)
+        # Calculate the Final Loss
+        loss = torch.mean(pos_loss - self.alpha * ratio).to(dtype=torch.bfloat16)
+        wandb.log({'Positive Geometric Mean': torch.mean(pos_prob).item(),
+                   'Negative Geometric Mean': torch.mean(neg_prob).item(),
+                   'Log Odds Ratio': torch.mean(ratio).item(),
+                   'Log Odds': torch.mean(log_odds).item()})
+        return (loss, outputs_pos) if return_outputs else loss
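
The loss assembled in `compute_loss` is the NLL on the chosen response minus `alpha` times the log-sigmoid of the difference in log odds between the chosen and rejected responses. A scalar, pure-Python sketch of that arithmetic may make it easier to follow. It assumes the inputs are mean per-token log-probabilities (strictly negative), as `compute_logps` returns. The function names are hypothetical and this is not the trainer's tensor code.

```python
import math


def log_odds(logp: float) -> float:
    """log(p / (1 - p)) for p = exp(logp), with logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_loss(nll: float, pos_logp: float, neg_logp: float, alpha: float) -> float:
    """nll - alpha * log sigmoid(log_odds(chosen) - log_odds(rejected))."""
    ratio = log_sigmoid(log_odds(pos_logp) - log_odds(neg_logp))
    return nll - alpha * ratio


# When chosen and rejected are equally likely, the penalty is alpha * log(2).
tied = orpo_loss(nll=1.0, pos_logp=-1.0, neg_logp=-1.0, alpha=0.1)
# When the chosen response is more likely, the penalty shrinks.
preferred = orpo_loss(nll=1.0, pos_logp=-0.5, neg_logp=-1.5, alpha=0.1)
print(tied, preferred)
```

The odds-ratio term is always non-positive, so subtracting `alpha * ratio` only ever adds to the NLL. The addition is smallest when the model already favours the chosen response.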

src/utils.py ADDED

	@@ -0,0 +1,20 @@

+from typing import List
+def preprocess_logits_for_metrics(logits, labels):
+    if isinstance(logits, tuple):
+        logits = logits[0]
+    return logits.argmax(dim=-1)
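+def dataset_split_selector(data) -> List[str]:
+    """
+    Select which splits of a loaded dataset to use.
+    Returns ['train'] when only one split exists, the '*_prefs' splits when
+    'train_prefs' is present, and ['train', 'test'] otherwise.
+    Will be further updated.
+    """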
+    if len(data.keys()) == 1:
+        return ['train']
+    else:
+        if 'train_prefs' in data.keys():
+            return ['train_prefs', 'test_prefs']
+        else:
+            return ['train', 'test']
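
To illustrate how `dataset_split_selector` behaves, the sketch below copies its logic and calls it on plain dictionaries that stand in for a loaded `DatasetDict` (only `.keys()` is used, so a dict is enough for the demonstration). The sample split names are made up for illustration.

```python
from typing import List


def dataset_split_selector(data) -> List[str]:
    # Same branching as src/utils.py, reproduced so the example is self-contained.
    if len(data.keys()) == 1:
        return ['train']
    if 'train_prefs' in data.keys():
        return ['train_prefs', 'test_prefs']
    return ['train', 'test']


single = dataset_split_selector({'train': []})
prefs = dataset_split_selector({'train_prefs': [], 'test_prefs': [], 'train_sft': []})
paired = dataset_split_selector({'train': [], 'test': []})
print(single)  # ['train']
print(prefs)   # ['train_prefs', 'test_prefs']
print(paired)  # ['train', 'test']
```

Note that the single-split branch returns `['train']` regardless of what the only split is actually called, so a dataset that ships just a `validation` split would need separate handling.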

trl/test_orpo_trainer_demo.py ADDED

	@@ -0,0 +1,100 @@

+from dataclasses import dataclass, field
+from typing import Optional
+import torch
+from datasets import load_dataset
+from tqdm import tqdm
+from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser
+from trl import ORPOConfig, ORPOTrainer
+# This code is built on top of the example code from Huggingface TRL Team
+tqdm.pandas()
+@dataclass
+class ScriptArguments:
+    model_name: Optional[str] = field(default="microsoft/phi-2", metadata={"help": "the model name"})
+    optim: Optional[str] = field(default="adamw_torch", metadata={"help": "the optimizer name"})
+    data_name: Optional[str] = field(default="argilla/dpo-mix-7k", metadata={"help": "the dataset name"})
+    cache_dir: Optional[str] = field(default="", metadata={"help": "the cache directory for models and datasets"})
+    log_with: Optional[str] = field(default='wandb', metadata={"help": "use 'wandb' to log with wandb"})
+    output_dir: Optional[str] = field(default='', metadata={"help": "the output directory"})
+    learning_rate: Optional[float] = field(default=1.41e-5, metadata={"help": "the learning rate"})
+    lr_scheduler_type: Optional[str] = field(default='cosine', metadata={"help": "the learning rate scheduler"})
+    per_device_train_batch_size: Optional[int] = field(default=4, metadata={"help": "the per-device training batch size"})
+    num_train_epochs: Optional[int] = field(default=5, metadata={"help": "the number of training epochs"})
+    beta: Optional[float] = field(default=0.25, metadata={"help": "weighting hyperparameter for L_OR"})
+    gradient_accumulation_steps: Optional[int] = field(
+        default=1, metadata={"help": "the number of gradient accumulation steps"}
+    )
+parser = HfArgumentParser(ScriptArguments)
+script_args = parser.parse_args_into_dataclasses()[0]
+config = ORPOConfig(
+    output_dir=script_args.output_dir,
+    max_prompt_length=1024,
+    max_length=2048,
+    logging_steps=100,
+    save_strategy='no',
+    max_completion_length=2048,
+    per_device_train_batch_size=script_args.per_device_train_batch_size,
+    remove_unused_columns=False,
+    gradient_accumulation_steps=script_args.gradient_accumulation_steps,
+    learning_rate=script_args.learning_rate,
+    optim=script_args.optim,
+    lr_scheduler_type=script_args.lr_scheduler_type,
+    gradient_checkpointing=True,
+    gradient_checkpointing_kwargs={'use_reentrant':True},
+    evaluation_strategy='epoch',
+    beta=script_args.beta,
+    report_to='wandb',
+    num_train_epochs=script_args.num_train_epochs,
+    bf16=True,
+    do_eval=True
+)
+model = AutoModelForCausalLM.from_pretrained(script_args.model_name,
+                                             cache_dir=script_args.cache_dir,
+                                             attn_implementation='flash_attention_2',
+                                             torch_dtype=torch.bfloat16)
+tokenizer = AutoTokenizer.from_pretrained(script_args.model_name,
+                                          cache_dir=script_args.cache_dir)
+tokenizer.pad_token_id = tokenizer.eos_token_id
+tokenizer.chat_template = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
+def build_dataset(tokenizer):
+    ds_train = load_dataset(script_args.data_name, split="train",
+                            cache_dir=script_args.cache_dir)
+    ds_test = load_dataset(script_args.data_name, split="test",
+                           cache_dir=script_args.cache_dir)
+    def chat_template_to_text(sample):
+        sample["chosen"] = [tokenizer.apply_chat_template(item_chosen, tokenize=False) for item_chosen in sample['chosen']]
+        sample["rejected"] = [tokenizer.apply_chat_template(item_rejected, tokenize=False) for item_rejected in sample['rejected']]
+        sample['prompt'] = [tokenizer.apply_chat_template([item[0]], tokenize=False, add_generation_prompt=True) for item in sample['chosen']]
+        return sample
+    ds_train = ds_train.map(chat_template_to_text, batched=True, num_proc=8)
+    ds_test = ds_test.map(chat_template_to_text, batched=True, num_proc=8)
+    return ds_train, ds_test
+train, test = build_dataset(tokenizer=tokenizer)
+trainer = ORPOTrainer(
+                model=model,
+                args=config,
+                tokenizer=tokenizer,
+                train_dataset=train,
+                eval_dataset=test
+            )
+trainer.train()