silanm/nlp-a5

@@ -1,74 +1,80 @@
----
-library_name: transformers
-license: mit
-base_model: gpt2
-tags:
-- trl
-- dpo
-- generated_from_trainer
-model-index:
-- name: results_orca_dpo_wandb
-  results: []
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# results_orca_dpo_wandb
-This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6150
-- Rewards/chosen: -0.2701
-- Rewards/rejected: -2.5585
-- Rewards/accuracies: 0.7940
-- Rewards/margins: 2.2885
-- Logps/rejected: -425.4867
-- Logps/chosen: -344.9728
-- Logits/rejected: -76.3682
-- Logits/chosen: -76.4329
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0001
-- train_batch_size: 4
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 8
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 50
-- training_steps: 200
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.9158        | 0.0346 | 50   | 0.7779          | -1.2673        | -3.1650          | 0.7319             | 1.8977          | -431.5514      | -354.9452    | -99.3161        | -98.3339      |
-| 1.2481        | 0.0691 | 100  | 0.9942          | -3.1400        | -6.5742          | 0.7368             | 3.4342          | -465.6436      | -373.6723    | -86.8154        | -86.6002      |
-| 0.6814        | 0.1037 | 150  | 0.7237          | -0.3674        | -2.6648          | 0.7635             | 2.2974          | -426.5488      | -345.9457    | -75.5469        | -75.8445      |
-| 0.6615        | 0.1382 | 200  | 0.6150          | -0.2701        | -2.5585          | 0.7940             | 2.2885          | -425.4867      | -344.9728    | -76.3682        | -76.4329      |
-### Framework versions
-- Transformers 4.45.0
-- Pytorch 2.4.0+cu124
-- Datasets 3.2.0
-- Tokenizers 0.20.3

+---
+library_name: transformers
+base_model: gpt2
+tags:
+- trl
+- dpo
+- generated_from_trainer
+model-index:
+- name: results_orca_dpo_wandb
+  results: []
+datasets:
+- argilla/distilabel-intel-orca-dpo-pairs
+language:
+- en
+metrics:
+- accuracy
+pipeline_tag: question-answering
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# results_orca_dpo_wandb
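+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset.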
+It achieves the following results on the evaluation set:
+- Loss: 0.6150
+- Rewards/chosen: -0.2701
+- Rewards/rejected: -2.5585
+- Rewards/accuracies: 0.7940
+- Rewards/margins: 2.2885
+- Logps/rejected: -425.4867
+- Logps/chosen: -344.9728
+- Logits/rejected: -76.3682
+- Logits/chosen: -76.4329
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 8
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 50
+- training_steps: 200
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.9158        | 0.0346 | 50   | 0.7779          | -1.2673        | -3.1650          | 0.7319             | 1.8977          | -431.5514      | -354.9452    | -99.3161        | -98.3339      |
+| 1.2481        | 0.0691 | 100  | 0.9942          | -3.1400        | -6.5742          | 0.7368             | 3.4342          | -465.6436      | -373.6723    | -86.8154        | -86.6002      |
+| 0.6814        | 0.1037 | 150  | 0.7237          | -0.3674        | -2.6648          | 0.7635             | 2.2974          | -426.5488      | -345.9457    | -75.5469        | -75.8445      |
+| 0.6615        | 0.1382 | 200  | 0.6150          | -0.2701        | -2.5585          | 0.7940             | 2.2885          | -425.4867      | -344.9728    | -76.3682        | -76.4329      |
+### Framework versions
+- Transformers 4.45.0
+- Pytorch 2.4.0+cu124
+- Datasets 3.2.0
+- Tokenizers 0.20.3
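
Review note: the numbers in the card can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the change itself. It assumes a single training device, assumes that `lr_scheduler_type: linear` means linear warmup followed by linear decay to zero over `training_steps`, and assumes that `Rewards/margins` is the chosen reward minus the rejected reward. The helper names `linear_schedule_lr` and `margin_gaps` are hypothetical and exist only for this illustration.

```python
# Values copied from the model card's hyperparameter list and results table.
train_batch_size = 4
gradient_accumulation_steps = 2
learning_rate = 1e-4
warmup_steps = 50
training_steps = 200

# Effective batch size per optimizer step (assumption: one device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps


def linear_schedule_lr(step: int) -> float:
    """Learning rate under linear warmup, then linear decay to zero (assumed shape)."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, training_steps - step)
    return learning_rate * remaining / (training_steps - warmup_steps)


# (step, rewards/chosen, rewards/rejected, reported rewards/margins)
eval_rows = [
    (50, -1.2673, -3.1650, 1.8977),
    (100, -3.1400, -6.5742, 3.4342),
    (150, -0.3674, -2.6648, 2.2974),
    (200, -0.2701, -2.5585, 2.2885),
]


def margin_gaps(rows):
    """Absolute gap between (chosen - rejected) and the reported margin, per row."""
    return [abs((chosen - rejected) - margin) for _, chosen, rejected, margin in rows]


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    for step, *_ in eval_rows:
        print(f"step {step}: lr = {linear_schedule_lr(step):.2e}")
    print("max margin gap:", max(margin_gaps(eval_rows)))
```

Under these assumptions the effective batch size matches the reported `total_train_batch_size`, and each reported margin agrees with chosen minus rejected up to rounding in the fourth decimal place.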