Elizezen committed on
Commit
4c3d46c
1 Parent(s): 398ab8f

Update README.md

  1. README.md +8 -3
README.md CHANGED
@@ -27,8 +27,9 @@ In extensive testing and benchmarks, SniffyOtter has proven to be an exceptional
  | Model | average | eroticism | complexity | contextual maintenance |
  | ----------------------------------- | --------- | --------- | ---------- | ---------------------- |
- | Antler-RP-ja-westlake-chatvector | **48.88** | 4.65 | **47.1** | **94.9** |
- | **SniffyOtter-7B** | 48.80 | **5.7** | 46.2 | 94.5 |
  | Antler-7B | 47.62 | 5.25 | 45.3 | 92.3 |
  | Nocturn-7B | 47.25 | 5.15 | 44.7 | 91.9 |
  | Sapphire-7B | 46.90 | 4.9 | 43.5 | 92.3 |
@@ -37,9 +38,13 @@ In extensive testing and benchmarks, SniffyOtter has proven to be an exceptional
  | chatntq-ja-7b-v1.0 | 45.12 | 2.55 | 41.4 | 91.4 |
  | Calm2-7B-Chat | 45.07 | 3.4 | 40.2 | 91.6 |

  **Benchmark Metrics:**
  - Eroticism: Measures the frequency of erotic words in the generated text. Calculated using a predefined set of words considered erotic.
  - Complexity: Evaluates the model's ability to produce non-repetitive responses. Higher scores indicate more diverse and less repetitive text, calculated using zlib.compress, which I find effective at detecting significantly repetitive texts.
  - Context Maintenance: Assesses how well the model maintains the given topic. Responses that stray from the context result in lower scores. Calculated using japanese-reranker-cross-encoder-large-v1 to measure relevance between the input and the generated response.

- *Note: While the benchmark provides some insights, it is important to consider that the specific set of erotic words and the undisclosed details of the benchmark may introduce biases. Therefore, it is recommended to take this result with a grain of salt for now.*
 
  | Model | average | eroticism | complexity | contextual maintenance |
  | ----------------------------------- | --------- | --------- | ---------- | ---------------------- |
+ | Antler-RP-ja-westlake-chatvector | 49.17 | 5.5 | 47.1 | 94.9 |
+ | **SniffyOtter-7B** | 48.80 | 5.7 | 46.2 | 94.5 |
+ | Sabbath-7B | 48.10 | 4.8 | 45.8 | 93.7 |
  | Antler-7B | 47.62 | 5.25 | 45.3 | 92.3 |
  | Nocturn-7B | 47.25 | 5.15 | 44.7 | 91.9 |
  | Sapphire-7B | 46.90 | 4.9 | 43.5 | 92.3 |
 
  | chatntq-ja-7b-v1.0 | 45.12 | 2.55 | 41.4 | 91.4 |
  | Calm2-7B-Chat | 45.07 | 3.4 | 40.2 | 91.6 |
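The `average` column is consistent with an unweighted mean of the three metric columns, rounded to two decimals. The sketch below checks that reading against a few rows copied from the updated table; the function name `average_score` is hypothetical and is not taken from the benchmark code, whose details are undisclosed.

```python
def average_score(eroticism: float, complexity: float, contextual_maintenance: float) -> float:
    """Unweighted mean of the three metric columns, rounded to two decimals.

    Assumption: this is how the table's `average` is derived; the rows agree
    with it, but the benchmark code itself is not published.
    """
    return round((eroticism + complexity + contextual_maintenance) / 3, 2)


# Metric values copied from the updated table (eroticism, complexity, contextual maintenance).
rows = {
    "Antler-RP-ja-westlake-chatvector": (5.5, 47.1, 94.9),
    "SniffyOtter-7B": (5.7, 46.2, 94.5),
    "Sabbath-7B": (4.8, 45.8, 93.7),
}

for name, metrics in rows.items():
    print(f"{name}: {average_score(*metrics):.2f}")
```

Under this reading, the eroticism score carries the same weight as the other two columns even though its numeric range is much smaller, so differences in complexity and contextual maintenance dominate the ranking.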

+ Eroticism: Frequency of erotic
+
+ *Tested in 8-bit because of limited GPU memory
+
  **Benchmark Metrics:**
  - Eroticism: Measures the frequency of erotic words in the generated text. Calculated using a predefined set of words considered erotic.
  - Complexity: Evaluates the model's ability to produce non-repetitive responses. Higher scores indicate more diverse and less repetitive text, calculated using zlib.compress, which I find effective at detecting significantly repetitive texts.
  - Context Maintenance: Assesses how well the model maintains the given topic. Responses that stray from the context result in lower scores. Calculated using japanese-reranker-cross-encoder-large-v1 to measure relevance between the input and the generated response.
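To illustrate why `zlib.compress` can flag repetitive text, here is a minimal sketch that compares compressed size to raw size. The exact formula used by the benchmark is not disclosed, so the ratio below, the function name `compression_ratio`, and the sample strings are all assumptions made for illustration only.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw UTF-8 size.

    Highly repetitive text compresses well, so it gets a low ratio.
    This is an illustrative stand-in, not the benchmark's actual scoring.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0  # avoid dividing by zero on empty input
    return len(zlib.compress(raw)) / len(raw)


# Hypothetical sample responses: one loops on a phrase, one does not.
repetitive = "そうですね。" * 50
varied = (
    "今日は朝から雨が降っていたので、駅まで傘を差して歩き、"
    "途中の喫茶店で温かいコーヒーを飲みながら友人を待った。"
)

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"varied:     {compression_ratio(varied):.3f}")
```

The ratio also depends on text length, because very short inputs carry fixed zlib overhead, so a real scorer would likely need to compare responses of similar length or normalize for it.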

+ The benchmark is a refined version of the one I used for [Sapphire-7B](https://huggingface.co/Elizezen/Sapphire-7B). *While it provides some insights, the specific set of erotic words and the undisclosed details of the benchmark may introduce biases, so take these results with a grain of salt for now.*