mfajcik committed
Commit 4a4e00a • 1 parent: ebb78cc

Update content.py

Files changed (1): content.py (+1 −1)
content.py CHANGED
```diff
@@ -88,7 +88,7 @@ We use the following metrics for following tasks:
 On every task, for every metric we compute test for statistical significance at α=0.05, i.e., the probability that performance model A is equal to the performance model B is estimated to be less then 0.05.
 We use the following tests, with varying statistical power:
 - accuracy and exact-match: one-tailed paired t-test,
-- average area under the curve: bayesian test inspired with (Goutte et al., 2005)[https://link.springer.com/chapter/10.1007/978-3-540-31865-1_25],
+- average area under the curve: bayesian test inspired with [Goutte et al., 2005](https://link.springer.com/chapter/10.1007/978-3-540-31865-1_25),
 - summarization & perplexity: bootstrapping.

 ### Duel Scoring Mechanism, Win Score
```