apapagi committed on
Commit
1d351c2
·
1 Parent(s): 79a422d
Files changed (4)
  1. README.md +79 -4
  2. mlb.pickle +2 -2
  3. model.safetensors +2 -2
  4. pytorch_model.bin +2 -2
README.md CHANGED
@@ -1,9 +1,22 @@
1
  ---
2
  license: eupl-1.2
3
- datasets:
4
- - EuropeanParliament/cellar_eurovoc
5
  language:
6
  - en
7
  tags:
8
  - eurovoc
9
  pipeline_tag: text-classification
@@ -18,9 +31,69 @@ widget:
18
  [EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large multidisciplinary multilingual (24 languages of 🇪🇺) hierarchical thesaurus of more than 7000 classes covering the activities of EU institutions.
19
  Given the number of legal documents produced every day and the huge mass of pre-existing documents to be classified, high-quality automated or semi-automated classification methods are most welcome in this domain.
20
 
21
- This model based on BERT Deep Neural Network was trained on more than 3.9 million documents to achieve that task and is used in a production environment via the huggingface inference endpoint.
22
  This model supports the 24 languages of the European Union.
23
24
  ## Architecture
25
 
26
  ![architecture](architecture.png)
@@ -73,4 +146,6 @@ Default value, topk = 5 and threshold = 0.16
73
 
74
  ## Author(s)
75
 
76
- Andreas Papagiannis <andreas.papagiannis@europarl.europa.eu>
 
 
 
1
  ---
2
  license: eupl-1.2
 
 
3
  language:
4
  - en
5
+ metrics:
6
+ - type: f1
7
+ value: 0.8345
8
+ name: micro F1
9
+ args:
10
+ threshold: 0.46
11
+ - type: NDCG@3
12
+ value: 0.8819
13
+ name: NDCG@3
14
+ - type: NDCG@5
15
+ value: 0.8689
16
+ name: NDCG@5
17
+ - type: NDCG@10
18
+ value: 0.8780
19
+ name: NDCG@10
20
  tags:
21
  - eurovoc
22
  pipeline_tag: text-classification
 
31
  [EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large multidisciplinary multilingual (24 languages of 🇪🇺) hierarchical thesaurus of more than 7000 classes covering the activities of EU institutions.
32
  Given the number of legal documents produced every day and the huge mass of pre-existing documents to be classified, high-quality automated or semi-automated classification methods are most welcome in this domain.
33
 
34
+ This model, based on the BERT deep neural network, was trained on more than 3,200,000 documents for this task and is used in a production environment via a Hugging Face inference endpoint.
35
  This model supports the 24 languages of the European Union.
36
 
37
+
38
+ ## Examples
39
+
40
+ In English 🇬🇧:
41
+
42
+ ```
43
+ text = "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities."
44
+
45
+
46
+ human rights 0.984
47
+ ethnic group 0.9743
48
+ Burma/Myanmar 0.9727
49
+ protection of minorities 0.9586
50
+ religious discrimination 0.6038
51
+ ethnic discrimination 0.5834
52
+ political violence 0.5828
53
+ ```
54
+
55
+ In French 🇫🇷:
56
+
57
+ ```
58
+ text = "En juillet 2023, la Commission a présenté un paquet de propositions pour l'écologisation du transport de marchandises. Parmi les trois propositions, l'une porte sur l'amélioration de l'utilisation des capacités de l'infrastructure ferroviaire. Le texte proposé comprend des modifications des règles relatives à la planification et à la répartition des capacités d'infrastructure ferroviaire, actuellement couvertes par la directive 2012/34/UE et le règlement (UE) n° 913/2010. L'objectif de ces modifications est de permettre une gestion plus efficace des capacités de l'infrastructure ferroviaire et du trafic, afin d'améliorer la qualité des services et d'optimiser l'utilisation du réseau ferroviaire, d'accueillir des volumes de trafic plus importants et de veiller à ce que le secteur des transports contribue à la décarbonisation."
59
+
60
+ transport infrastructure 0.998161256313324
61
+ rail network 0.9951391220092773
62
+ common transport policy 0.9791265726089478
63
+ transport market 0.9368429780006409
64
+ trans-European network 0.9098047614097595
65
+ high-speed transport 0.4887568950653076
66
+ carriage of goods 0.4874659776687622
67
+ ```
68
+
69
+ In German 🇩🇪:
70
+
71
+ ```
72
+ text = "Am 14. September 2022 schlug die Kommission eine Verordnung zum Verbot von Produkten, die unter Einsatz von Zwangsarbeit, einschließlich Kinderarbeit, hergestellt wurden, auf dem Binnenmarkt der Europäischen Union (EU) vor. Der Vorschlag bezieht sich auf alle Produkte, die auf dem EU-Markt angeboten werden, unabhängig davon, ob sie in der EU für den Inlandsverbrauch oder für die Ausfuhr hergestellt oder eingeführt werden. Er gilt für Produkte aller Art, einschließlich ihrer Bestandteile, aus allen Sektoren und Branchen. Die EU-Mitgliedstaaten wären für die Durchsetzung der Bestimmungen zuständig, und ihre nationalen Behörden könnten Produkte, die unter Einsatz von Zwangsarbeit hergestellt wurden, vom EU-Markt nehmen. Die Zollbehörden würden solche Produkte an den EU-Grenzen identifizieren und aufhalten. "
73
+
74
+ goods and services 0.9618138670921326
75
+ single market 0.9268659949302673
76
+ market approval 0.6425430774688721
77
+ export restriction 0.5231644511222839
78
+ EU Member State 0.4724983870983124
79
+ free movement of goods 0.38777536153793335
80
+ electronic commerce 0.31897953152656555
81
+ ```
82
+
83
+ In Bulgarian 🇧🇬:
84
+
85
+ ```
86
+ text = "В тази кратка бележка се обобщава проучването, в което се оценяват предизвикателствата, възможностите и средносрочните перспективи пред млечния сектор в ЕС в светлината на премахването на квотите за мляко. Проучването се фокусира върху структурните промени в сектора, динамиката на пазара на млечни продукти, необходимостта от екологична устойчивост и устойчивостта на селските райони. Разгледани са и специфичните проблеми на млечните региони в неравностойно положение. Докладът предлага политически препоръки за разглеждане от Европейския парламент с цел ефективно подпомагане на млечното животновъдство и поддържане на селските общности, като същевременно се отговори на изискванията за устойчивост на сектора."
87
+
88
+ reform of the CAP 0.38253700733184814
89
+ milk 0.35211247205734253
90
+ milk product 0.2761436402797699
91
+ agricultural quota 0.24940797686576843
92
+ dairy production 0.2132476419210434
93
+ EU Member State 0.09408465027809143
94
+ ```
95
+
96
+
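The examples above list every candidate label with its score, while the README states default values of `topk = 5` and `threshold = 0.16`. The README does not show how these two settings are combined, so the following is only a minimal sketch under an assumption: labels below the threshold are dropped first, then at most `topk` of the remaining labels are kept. The function name `select_labels` is hypothetical and is not part of the model's code; the scores are the Bulgarian example values rounded to four decimals.

```python
from typing import Dict, List, Tuple


def select_labels(
    scores: Dict[str, float],
    topk: int = 5,
    threshold: float = 0.16,
) -> List[Tuple[str, float]]:
    """Keep labels scoring at least `threshold`, then return the `topk` best.

    Assumption: threshold filtering happens before the top-k cut. The
    README only gives the default values, not the order of operations.
    """
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    # Highest score first, so slicing keeps the strongest labels.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:topk]


# Scores copied (rounded) from the Bulgarian example in this README.
bulgarian_scores = {
    "reform of the CAP": 0.3825,
    "milk": 0.3521,
    "milk product": 0.2761,
    "agricultural quota": 0.2494,
    "dairy production": 0.2132,
    "EU Member State": 0.0941,
}

for label, score in select_labels(bulgarian_scores):
    print(f"{label}\t{score:.4f}")
```

Under this assumption the default settings keep five of the six Bulgarian labels and drop `EU Member State`. With the stricter threshold of 0.46 listed next to the micro F1 metric, none of the Bulgarian labels would be kept, which illustrates how strongly the threshold choice affects lower-confidence inputs.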
97
  ## Architecture
98
 
99
  ![architecture](architecture.png)
 
146
 
147
  ## Author(s)
148
 
149
+ Sébastien Campion <sebastien.campion@europarl.europa.eu>
150
+
151
+ Andreas Papagiannis <andreas.papagiannis@europarl.europa.eu>
mlb.pickle CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9ef6c77d4be99dc73994099ea02207deca2449b7f4675464285fd41262146f49
3
- size 131
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6f9fdfdfb5ed735f6ef28a11421ca69da05fa7aeafaf78ef33c7d4d332518b9e
3
+ size 127562
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e1638381e5fafe20ad1f06b8662a69d369fefdb435f6aecf61bb6ef8e5ed1780
3
- size 134
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d0779659c4f086d712291b95c2789ecf6d65acb4d57bbf3dc84208226c3829f8
3
+ size 395976376
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d8d7dac5e88e6a751793812b04a026786e6f84c8ba2c20c9a1e3693ad8a5b65a
3
- size 134
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cd5f1094cb7b7a719f0718e5f3a3aff5db427056b1ba1e76b3fbe8d1cfc89e53
3
+ size 396005425