added eval
README.md CHANGED
@@ -112,14 +112,15 @@ A sample of the data used to train our model can be found [here](https://github.
 
 ## Evaluation
 
-
-
-
-
-
-
-
-
+The right-hand side of the table shows the results of the manual evaluation, carried out on the outputs of each model for 35 texts. M.P. stands for meaning preservation, S for simplification, C for coherence, and F for factuality; each score is the percentage of "yes" answers.
+More details on the evaluation can be found in the paper. For all metrics, higher is better.
+
+| **Model** | **Prompt** | **Test set** | **SARI** | **FRE** | **M.P.** | **S** | **C** | **F** | **Avg.** |
+|---|---|---|---|---|---|---|---|---|---|
+| Baseline | Basic | A2 | 41.2 | 59.4 | .89 | .38 | .96 | .84 | .77 |
+| FT-A2 | Basic | A2 | 44.0 | 70.6 | .49 | .82 | .56 | .64 | .63 |
+| Baseline | Basic | B1 | 42.3 | 56.8 | .85 | .40 | .90 | .90 | .76 |
+| FT-B1 | Basic | B1 | 42.4 | 60.0 | .75 | .55 | .60 | .75 | .66 |
 
 
 #### Summary
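
For reference, the automatic scores in the added table can be reproduced with standard tooling: SARI via the Hugging Face `evaluate` package and FRE (Flesch Reading Ease) via `textstat`. The sketch below is illustrative only; the example sentences are invented rather than taken from the actual test set, and the authoritative evaluation setup is the one described in the paper. The Avg. column is consistent with the mean of the four manual scores (M.P., S, C, F).

```python
# Illustrative sketch of how scores like those in the table can be
# computed. Example sentences are invented, not from the test set.
import evaluate  # Hugging Face evaluate
import textstat

sari = evaluate.load("sari")

sources = ["The committee convened to deliberate on the proposal."]
predictions = ["The committee met to talk about the plan."]  # model output
references = [["The group met to discuss the plan."]]        # gold simplifications

# SARI scores the output against both the source and the references;
# higher is better.
result = sari.compute(sources=sources, predictions=predictions, references=references)
print(result["sari"])

# FRE (Flesch Reading Ease) is computed on the output text alone;
# higher means easier to read.
print(textstat.flesch_reading_ease(predictions[0]))

# The Avg. column matches the mean of the four manual scores,
# e.g. for the A2 baseline: (.89 + .38 + .96 + .84) / 4 = .7675 ≈ .77.
manual = [0.89, 0.38, 0.96, 0.84]
print(round(sum(manual) / len(manual), 2))  # 0.77
```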