drvenabili
committed on
Commit
•
2a9d5b4
1
Parent(s):
47be8a1
Update README.md
README.md
CHANGED
@@ -1,4 +1,25 @@
 ---
 widget:
 - text: Simon dog i <mask> i går.
-
+license: mit
+datasets:
+- ChangeIsKey/kubhist2
+language:
+- sv
+library_name: transformers
+---
+
+This is a RoBERTa model trained on kubhist2 (https://spraakbanken.gu.se/blogg/index.php/2019/09/15/the-kubhist-corpus-of-swedish-newspapers/). For a Hugging Face version of kubhist2, see: https://huggingface.co/datasets/ChangeIsKey/kubhist2
+
+This is a work in progress: the quality of the model, just like the quality of the training data, is far from great.
+
+Shared here with no guarantee whatsoever. It will likely change; use at your own risk.
+
+### Discussion of Biases
+This model is trained on historical data. As such, outdated views might be present in the data.
+
+### Other Known Limitations
+The data comes from an OCR process. The text is thus not perfect, especially in the earlier decades.
+
+### Contact
+Simon Hengchen
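The README in this commit describes a RoBERTa model with a masked-token widget example (`Simon dog i <mask> i går.`) and `library_name: transformers`. As a minimal sketch of how such a model might be queried, the snippet below uses the `fill-mask` pipeline from `transformers`. The commit does not state the model's repository id, so `MODEL_ID` is a hypothetical placeholder, and the helper names `fill_mask` and `top_predictions` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder: the commit does not name the model repository.
MODEL_ID = "<namespace>/<model-name>"


def top_predictions(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output.

    Each prediction is expected to be a dict with at least the keys
    "token_str" and "score", as returned by the fill-mask pipeline
    for an input containing a single mask.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(float(p["score"]), 4)) for p in ranked[:k]]


def fill_mask(text: str, model_id: str = MODEL_ID) -> List[Dict]:
    """Run the fill-mask pipeline on `text` (downloads the model on first use)."""
    # Imported lazily so top_predictions works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    return unmasker(text)


# Example call, assuming MODEL_ID has been set to a real repository id:
# for token, score in top_predictions(fill_mask("Simon dog i <mask> i går.")):
#     print(token, score)
```

Given the OCR noise and the work-in-progress status noted in the README, the ranked candidates are best treated as rough suggestions rather than reliable completions.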