added eval
README.md CHANGED
@@ -112,14 +112,15 @@ A sample of the data used to train our model can be found [here](https://github.
 
 ## Evaluation
 
-
-
-
-
-
-
-
-
+The right-hand side of the table shows the results of the manual evaluation, carried out on the outputs of each model for 35 texts. M.P. stands for meaning preservation, S for simplification, C for coherence, and F for factuality; each score is the percentage of "yes" answers.
+More details on the evaluation can be found in the paper. For all metrics, higher is better.
+
+| **Model** | **Prompt** | **Test set** | **SARI** | **FRE** | **M.P.** | **S** | **C** | **F** | **Avg.** |
+|---|---|---|---|---|---|---|---|---|---|
+| Baseline | Basic | A2 | 41.2 | 59.4 | .89 | .38 | .96 | .84 | .77 |
+| FT-A2 | Basic | A2 | 44.0 | 70.6 | .49 | .82 | .56 | .64 | .63 |
+| Baseline | Basic | B1 | 42.3 | 56.8 | .85 | .40 | .90 | .90 | .76 |
+| FT-B1 | Basic | B1 | 42.4 | 60.0 | .75 | .55 | .60 | .75 | .66 |
 
 
 #### Summary
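
For reference, the automatic scores in the added table can be reproduced with standard tooling: SARI via the Hugging Face `evaluate` package and FRE (Flesch Reading Ease) via `textstat`. The sketch below is illustrative only; the example sentences are invented rather than taken from the actual test set, and the authoritative evaluation setup is the one described in the paper. The Avg. column is consistent with the mean of the four manual scores (M.P., S, C, F).

```python
# Illustrative sketch of how scores like those in the table can be
# computed. Example sentences are invented, not from the test set.
import evaluate  # Hugging Face evaluate
import textstat

sari = evaluate.load("sari")

sources = ["The committee convened to deliberate on the proposal."]
predictions = ["The committee met to talk about the plan."]  # model output
references = [["The group met to discuss the plan."]]        # gold simplifications

# SARI scores the output against both the source and the references;
# higher is better.
result = sari.compute(sources=sources, predictions=predictions, references=references)
print(result["sari"])

# FRE (Flesch Reading Ease) is computed on the output text alone;
# higher means easier to read.
print(textstat.flesch_reading_ease(predictions[0]))

# The Avg. column matches the mean of the four manual scores,
# e.g. for the A2 baseline: (.89 + .38 + .96 + .84) / 4 = .7675 ≈ .77.
manual = [0.89, 0.38, 0.96, 0.84]
print(round(sum(manual) / len(manual), 2))  # 0.77
```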