File size: 4,162 Bytes
3018615 f77e039 3018615 9a68f0b 3018615 fd0b4db 3018615 1284c23 3018615 91dbdef |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 |
---
tags:
- llama
- alpaca
- vicuna
- uncensored
- merge
- mix
- airoboros
- openorca
- orcamini
- orca
- instruct
- mixtune
datasets:
- Open-Orca/OpenOrca
- anon8231489123/ShareGPT_Vicuna_unfiltered
- jondurbin/airoboros-uncensored
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---
## 13B-Ouroboros
Ouroboros is an experimental model based on Meta's LLaMA [v1] 13B base model using a custom merging technique, tweaking
each layer's merge % based on internal tests against the PTB dataset, scoring ~26.31 according to internal evaluation
(6 samples, sequence length 1024; this testing is not empirical, it's a quick way to find near-optimum values). Testing,
evaluating, and remixing this model is absolutely permissible and even encouraged (within the bounds of Meta's LLaMAv1
license agreement); the more feedback the better we can tune our process! 😊
## Composition:
Ouroboros is comprised of 40 layers [LLaMAv1 13B standard] mixed at optimized
ratios VS the PTB dataset for lowest perplexity score. Listed below are the
paired models and ratios merged per layer.
Tier One Merge:
13B-airoboros-gpt4-1.4 > 13B-orca_mini_v2
[0.22, 0.85, 0.89, 0.98, 0.3, 0.41, 0.71, 0.83, 0.32, 0.1, 0.44, 0.6, 0.53, 0.15, 0.86, 0.79, 0.93, 0.02, 0.19, 0.82, 0.01, 0.52, 0.07, 0.27, 0.73, 0.86, 0.08, 0.67, 0.42, 0.28, 0.37, 0.08, 0.95, 0.68, 0.45, 0.08, 0.7, 0.93, 0.96, 0.43]
13B-gpt4-x-alpaca > 13B-Vicuna-cocktail
[0.65, 0.94, 0.98, 0.87, 0.28, 0.64, 0.73, 0.7, 0.95, 0.89, 0.84, 0.9, 0.59, 0.92, 0.28, 0.61, 0.88, 0.73, 0.34, 0.85, 0.98, 0.05, 0.74, 0.92, 0.5, 0.78, 0.26, 0.4, 0.27, 0.65, 0.71, 0.7, 0.8, 0.93, 0.36, 0.03, 0.45, 0.39, 0.77, 0.06]
Tier Two Merge:
[13B-airoboros-gpt4-1.4 + 13B-orca_mini_v2] offspring > [13B-gpt4-x-alpaca + 13B-Vicuna-cocktail] offspring
[0.2, 0.83, 0.24, 0.03, 0.37, 0.62, 0.02, 0.82, 0.65, 0.63, 0.45, 0.65, 0.48, 0.45, 0.24, 0.76, 0.06, 0.31, 0.45, 0.86, 0.23, 0.99, 0.93, 0.84, 0.96, 0.53, 0.95, 0.32, 0.19, 0.06, 0.4, 0.08, 0.62, 0.4, 0.26, 0.12, 0.16, 0.91, 0.14, 0.0]
Result:
13B-Ouroboros, a model that seems uncensored and highly competent. So far only Alpaca instruction prompting has been tested and seems to work solidly well.
## Use:
Alpaca's instruct format can be used to do many things, including control of the terms of behavior
between a user and a response from an agent in chat. Below is an example of a command injected into
memory.
```
### Instruction:
Make Narrator function as a text based adventure game that responds with verbose, detailed, and creative descriptions of what happens next after Player's response.
Make Player function as the player input for Narrator's text based adventure game, controlling a character named (insert character name here, their short bio, and
whatever quest or other information to keep consistent in the interaction).
### Response:
{an empty new line here}
```
## Language Models Used Credits:
13B-airoboros-gpt4-1.4 by jondurbin
https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.4
13B-orca_mini_v2 by psmathur
https://huggingface.co/psmathur/orca_mini_v2_13b
13B-gpt4-x-alpaca by chavinlo
https://huggingface.co/chavinlo/gpt4-x-alpaca
13B-Vicuna-cocktail by reeducator
https://huggingface.co/reeducator/vicuna-13b-cocktail
Also thanks to Meta for LLaMA.
Each model was hand picked and considered for what it could contribute to this ensemble.
Thanks to each and every one of you for your incredible work developing some of the best things
to come out of this community.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CalderaAI__13B-Ouroboros)
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 44.66 |
| ARC (25-shot) | 57.42 |
| HellaSwag (10-shot) | 82.11 |
| MMLU (5-shot) | 51.43 |
| TruthfulQA (0-shot) | 47.99 |
| Winogrande (5-shot) | 57.85 |
| GSM8K (5-shot) | 0.45 |
| DROP (3-shot) | 15.36 |
|