Commit 9888d64 by ntgiaky (parents: 596cb5f, e288dc3)

Merge branch 'main' of https://huggingface.co/ntgiaky/phobert-ner-smart-home

Files changed (1):
1. README.md (added, +254 -0)
---
language: vi
tags:
- ner
- named-entity-recognition
- slot-filling
- smart-home
- vietnamese
- phobert
- token-classification
license: mit
datasets:
- custom-vn-slu-augmented
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: PhoBERT NER for Vietnamese Smart Home Slot Filling
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: VN-SLU Augmented Dataset
      type: custom
    metrics:
    - type: accuracy
      value: 96.64
      name: Accuracy
    - type: f1
      value: 86.55
      name: F1 Score (Weighted)
    - type: f1
      value: 67.04
      name: F1 Score (Macro)
widget:
- text: "bật đèn phòng khách"
- text: "tắt quạt phòng ngủ lúc 10 giờ tối"
- text: "điều chỉnh nhiệt độ điều hòa 25 độ"
- text: "mở cửa garage sau 5 phút"
---

# PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling

This model is a fine-tuned version of [vinai/phobert-base](https://huggingface.co/vinai/phobert-base) for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.

## Model Description

- **Base Model**: vinai/phobert-base
- **Task**: Token Classification / Slot Filling for Smart Home Commands
- **Language**: Vietnamese
- **Training Data**: VN-SLU Augmented Dataset (4,000 training samples)
- **Number of Labels**: 13 (six entity types in BIO format, plus `O`)

## Intended Uses & Limitations

### Intended Uses
- Extracting entities from Vietnamese smart home voice commands
- Slot filling for voice assistant systems
- Integration with intent classification for a complete NLU pipeline
- Research in Vietnamese NLP for IoT applications

### Limitations
- Optimized specifically for the smart home domain
- May not generalize well to other domains
- Trained on Vietnamese only
- Performs best when paired with the companion intent classifier (`ntgiaky/phobert-intent-classifier-smart-home`)

## Entity Types (Slot Labels)

The model predicts 13 labels: a `B-` (beginning) and an `I-` (inside) tag for each of six entity types, plus `O`:

1. `B-device` / `I-device` - Device names, e.g., "đèn" (light), "quạt" (fan), "điều hòa" (air conditioner)
2. `B-living_space` / `I-living_space` - Room/location names, e.g., "phòng khách" (living room), "phòng ngủ" (bedroom)
3. `B-time_at` / `I-time_at` - Specific times, e.g., "10 giờ tối" (10 p.m.), "7 giờ sáng" (7 a.m.)
4. `B-duration` / `I-duration` - Time durations, e.g., "5 phút" (5 minutes), "2 giờ" (2 hours)
5. `B-target_number` / `I-target_number` - Target values, e.g., "25 độ" (25 degrees), "50%"
6. `B-changing_value` / `I-changing_value` - Change amounts, e.g., "tăng 10%" (increase by 10%)
7. `O` - Outside/No entity

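
The label inventory follows the standard BIO scheme, so the 13 labels are 6 entity types × 2 tags (`B-`, `I-`) + 1 (`O`). The sketch below rebuilds that inventory from the entity types listed above. The helper name `build_bio_labels` and the ID order it produces are illustrative assumptions; the IDs the published model actually uses are the ones stored in its `config.id2label`.

```python
# Entity types listed in this model card.
ENTITY_TYPES = [
    "device",
    "living_space",
    "time_at",
    "duration",
    "target_number",
    "changing_value",
]


def build_bio_labels(entity_types):
    """Return 'O' followed by a B-/I- pair for each entity type.

    Hypothetical helper: the ordering (and therefore the IDs) is illustrative
    and is not guaranteed to match the published model's config.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)
# Inverse mappings between tag names and class indices.
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(labels))  # 13
```

Keeping `label2id` and `id2label` as exact inverses is what allows an `argmax` class index to be mapped back to a tag name.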
## How to Use

### Using the Transformers Library

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Label mappings: use the names stored in the model config, and fall back to a
# local label_mappings.json only if the config holds generic LABEL_N names
id2label = {int(k): v for k, v in model.config.id2label.items()}
if all(v.startswith('LABEL_') for v in id2label.values()):
    with open('label_mappings.json', 'r', encoding='utf-8') as f:
        label_mappings = json.load(f)
    id2label = {int(k): v for k, v in label_mappings['id2label'].items()}

def extract_entities(text):
    # Tokenize (a single sentence needs no padding)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Predict one label id per subword token
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)[0]

    # Extract entities by grouping B-/I- tags
    entities = []
    current_entity = None
    current_tokens = []

    def save_current():
        # Append the entity being built, if any
        if current_entity:
            entities.append({
                'type': current_entity,
                'text': tokenizer.convert_tokens_to_string(current_tokens)
            })

    for token, pred_id in zip(tokens, predictions):
        # Skip <s>, </s> and other special tokens: they are not part of the text
        if token in tokenizer.all_special_tokens:
            continue
        label = id2label[pred_id.item()]

        if label.startswith('B-'):
            # Save the previous entity, then start a new one
            save_current()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue the current entity
            current_tokens.append(token)
        else:
            # 'O' (or an I- tag that does not continue the entity) ends it
            save_current()
            current_entity = None
            current_tokens = []

    # Don't forget the last entity
    save_current()
    return entities

# Example usage ("turn on the living room light at 7 p.m.")
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")
```
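
Because each token's label is chosen independently by `argmax`, the predicted sequence can contain an `I-` tag that does not continue an entity of the same type (for example `O, I-device`); `extract_entities` above simply drops such tokens. A common alternative, sketched here as a hypothetical helper `repair_bio`, is to promote a stray `I-X` to `B-X` before grouping. This is a heuristic post-processing step, not part of the original pipeline.

```python
def repair_bio(labels):
    """Promote any I-X tag that does not follow B-X or I-X to B-X.

    Hypothetical post-processing step; not part of the model card's code.
    """
    repaired = []
    previous_type = None  # entity type of the previous tag, or None after 'O'
    for label in labels:
        if label.startswith("I-"):
            entity_type = label[2:]
            if entity_type != previous_type:
                # Stray inside-tag: treat it as the start of a new entity.
                label = "B-" + entity_type
            previous_type = entity_type
        elif label.startswith("B-"):
            previous_type = label[2:]
        else:
            previous_type = None
        repaired.append(label)
    return repaired


print(repair_bio(["O", "I-device", "I-device", "B-living_space", "I-duration"]))
# ['O', 'B-device', 'I-device', 'B-living_space', 'B-duration']
```

Applied to the predicted label list before grouping, this keeps tokens that the stricter grouping would otherwise discard, at the cost of occasionally creating a spurious one-token entity.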

### Using the Pipeline

```python
from transformers import pipeline

# Load NER pipeline; "simple" aggregation merges B-/I- tokens into entity groups
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple",
)

# Extract entities ("turn off the bedroom fan in 10 minutes")
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
```
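
With `aggregation_strategy="simple"`, the pipeline returns one dict per entity group, including the keys `entity_group`, `score` and `word`. The sketch below, using a hypothetical helper `normalize_pipeline_entities`, drops low-confidence groups and converts the rest to the `{'type', 'text'}` format used elsewhere in this card. The `min_score=0.5` default is an arbitrary assumption, and the sample list is hand-written rather than real model output.

```python
def normalize_pipeline_entities(pipeline_output, min_score=0.5):
    """Convert aggregated pipeline output to the {'type', 'text'} format.

    `min_score` is a hypothetical confidence cut-off; tune it on your own data.
    """
    entities = []
    for item in pipeline_output:
        # Keep only groups whose averaged score reaches the threshold.
        if item["score"] >= min_score:
            entities.append({"type": item["entity_group"], "text": item["word"].strip()})
    return entities


# Hand-written example in the pipeline's output shape (not real model output).
sample = [
    {"entity_group": "device", "score": 0.98, "word": "quạt"},
    {"entity_group": "living_space", "score": 0.95, "word": "phòng ngủ"},
    {"entity_group": "duration", "score": 0.31, "word": "10 phút"},
]
print(normalize_pipeline_entities(sample))
# [{'type': 'device', 'text': 'quạt'}, {'type': 'living_space', 'text': 'phòng ngủ'}]
```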

## Integration with Intent Classification

For a complete NLU pipeline, combine this model with the companion intent classifier:

```python
from transformers import pipeline

# Load both models
intent_classifier = pipeline(
    "text-classification",
    model="ntgiaky/phobert-intent-classifier-smart-home",
)
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple",
)

def process_command(text):
    # Get intent (the pipeline returns a list with one dict per input)
    intent_result = intent_classifier(text)
    intent = intent_result[0]['label']

    # Get entities
    entities = ner(text)

    # Combine results
    return {
        'text': text,
        'intent': intent,
        'entities': entities
    }

# Example ("set the air conditioner temperature to 25 degrees")
command = "điều chỉnh nhiệt độ điều hòa 25 độ"
result = process_command(command)
print(result)
```
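
`process_command` returns the intent together with a flat entity list. A downstream device controller usually wants a frame keyed by slot name, so the hypothetical helper `to_frame` below groups entity texts by `entity_group`, the key the aggregated pipeline emits. The input dict is hand-written in the shape `process_command` returns, and the intent label `set_temperature` is invented for illustration; real label names come from the intent model's config.

```python
from collections import defaultdict


def to_frame(nlu_result):
    """Group pipeline entities by slot type into {'intent', 'slots'}.

    Each slot maps to a list because a command may mention, e.g., two rooms.
    """
    slots = defaultdict(list)
    for entity in nlu_result["entities"]:
        slots[entity["entity_group"]].append(entity["word"].strip())
    return {"intent": nlu_result["intent"], "slots": dict(slots)}


# Hand-written input (not real model output); "set_temperature" is made up.
example = {
    "text": "điều chỉnh nhiệt độ điều hòa 25 độ",
    "intent": "set_temperature",
    "entities": [
        {"entity_group": "device", "score": 0.99, "word": "điều hòa"},
        {"entity_group": "target_number", "score": 0.97, "word": "25 độ"},
    ],
}
print(to_frame(example))
# {'intent': 'set_temperature', 'slots': {'device': ['điều hòa'], 'target_number': ['25 độ']}}
```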

## Example Outputs

Example entities for three commands, in the format returned by `extract_entities`:

```python
# Input: "bật đèn phòng khách" (turn on the living room light)
# Entities: [
#   {'type': 'device', 'text': 'đèn'},
#   {'type': 'living_space', 'text': 'phòng khách'}
# ]

# Input: "tắt quạt phòng ngủ lúc 10 giờ tối" (turn off the bedroom fan at 10 p.m.)
# Entities: [
#   {'type': 'device', 'text': 'quạt'},
#   {'type': 'living_space', 'text': 'phòng ngủ'},
#   {'type': 'time_at', 'text': '10 giờ tối'}
# ]

# Input: "điều chỉnh nhiệt độ điều hòa 25 độ" (set the air conditioner temperature to 25 degrees)
# Entities: [
#   {'type': 'device', 'text': 'điều hòa'},
#   {'type': 'target_number', 'text': '25 độ'}
# ]
```

## Citation

If you use this model, please cite:

```bibtex
@misc{phobert-ner-smart-home-2025,
  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/phobert-ner-smart-home}}
}
```

## Authors

- **Trần Quang Huy**
- **Nguyễn Trần Gia Kỳ**
- **Advisor**: Dr. Đoàn Duy

## License

This model is released under the MIT License.