Update README.md
README.md
CHANGED
@@ -72,7 +72,7 @@ library_name: transformers

 ## Overview

-`BERT-Lite` is an **ultra-lightweight** NLP model derived from **google/

 - **Model Name**: BERT-Lite
 - **Size**: ~10MB (quantized)
@@ -312,82 +312,80 @@ To adapt BERT-Lite for custom IoT tasks (e.g., specific smart home commands):
 1. **Prepare Dataset**: Collect labeled data (e.g., commands with intents or masked sentences).
 2. **Fine-Tune with Hugging Face**:
 ```python
-[old lines 315-389: previous fine-tuning example, content not preserved in this view]
-print(f"Predicted class for '{text}': {'✅ Valid IoT Command' if predicted_class == 1 else '❌ Invalid Command'}")
 ```
 3. **Deploy**: Export the fine-tuned model to ONNX or TensorFlow Lite for edge devices.

 ## Overview

+`BERT-Lite` is an **ultra-lightweight** NLP model derived from **google/bert_uncased_L-2_H-64_A-2**, optimized for **real-time inference** on **edge and IoT devices**. With a quantized size of **~10MB** and **~2M parameters**, it delivers efficient contextual language understanding for highly resource-constrained environments like microcontrollers, wearables, and smart home devices. Designed for **low-latency** and **offline operation**, BERT-Lite is perfect for privacy-first applications requiring intent detection, text classification, or semantic understanding with minimal connectivity.

 - **Model Name**: BERT-Lite
 - **Size**: ~10MB (quantized)
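As a rough check on the "~2M parameters" figure in the added Overview line, the sketch below counts the weights of a BERT encoder with the shape the checkpoint name implies (2 layers, hidden size 64). The vocabulary size (30,522), the maximum position count (512), and the feed-forward width (4 x hidden) are assumed BERT-uncased defaults, not values stated in this commit, and `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(vocab_size=30522, hidden=64, layers=2,
                     intermediate=None, max_positions=512, type_vocab=2):
    """Count parameters of a BERT encoder with pooler (no task head)."""
    if intermediate is None:
        intermediate = 4 * hidden  # assumed standard BERT ratio

    layer_norm = 2 * hidden  # scale + bias

    # Token, position, and segment embedding tables plus their LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Q, K, V, and output projections (weights + biases) plus LayerNorm
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (weights + biases) plus LayerNorm
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)

    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 2,090,560 parameters (~2.1M)
```

Under these assumptions almost all of the weights sit in the token embedding table (30,522 x 64, about 1.95M), so the vocabulary rather than the transformer layers dominates the parameter count.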
 1. **Prepare Dataset**: Collect labeled data (e.g., commands with intents or masked sentences).
 2. **Fine-Tune with Hugging Face**:
 ```python
+import torch
+from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
+from datasets import Dataset
+import pandas as pd
+
+# 1. Prepare the sample IoT dataset
+data = {
+    "text": [
+        "Turn on the fan",
+        "Switch off the light",
+        "Invalid command",
+        "Activate the air conditioner",
+        "Turn off the heater",
+        "Gibberish input"
+    ],
+    "label": [1, 1, 0, 1, 1, 0]  # 1 = Valid command, 0 = Invalid
+}
+df = pd.DataFrame(data)
+dataset = Dataset.from_pandas(df)
+
+# 2. Load tokenizer and model
+model_name = "boltuix/bert-lite"  # Replace with any small/quantized BERT
+tokenizer = BertTokenizer.from_pretrained(model_name)
+model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
+
+# 3. Tokenize the dataset
+def tokenize_function(examples):
+    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)
+
+tokenized_dataset = dataset.map(tokenize_function, batched=True)
+
+# 4. Manually convert columns to tensors (NumPy 2.0 safe)
+tokenized_dataset = tokenized_dataset.map(lambda x: {
+    "input_ids": torch.tensor(x["input_ids"]),
+    "attention_mask": torch.tensor(x["attention_mask"]),
+    "label": torch.tensor(x["label"])
+})
+
+# 5. Define training arguments
+training_args = TrainingArguments(
+    output_dir="./bert_lite_results",
+    num_train_epochs=5,
+    per_device_train_batch_size=2,
+    logging_dir="./bert_lite_logs",
+    logging_steps=10,
+    save_steps=100,
+    eval_strategy="no",
+    learning_rate=5e-5,
+)
+
+# 6. Initialize Trainer
+trainer = Trainer(
+    model=model,
+    args=training_args,
+    train_dataset=tokenized_dataset,
+)
+
+# 7. Fine-tune the model
+trainer.train()
+
+# 8. Save the fine-tuned model
+model.save_pretrained("./fine_tuned_bert_lite")
+tokenizer.save_pretrained("./fine_tuned_bert_lite")
+
+# 9. Inference example
+text = "Turn on the light"
+inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
+model.eval()
+with torch.no_grad():
+    outputs = model(**inputs)
+    logits = outputs.logits
+    predicted_class = torch.argmax(logits, dim=1).item()
+
+print(f"Predicted class for '{text}': {'✅ Valid IoT Command' if predicted_class == 1 else '❌ Invalid Command'}")
 ```
 3. **Deploy**: Export the fine-tuned model to ONNX or TensorFlow Lite for edge devices.
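
Step 3 of the added example tokenizes with `padding="max_length"`, `truncation=True`, and `max_length=64`, so every command becomes a fixed-length sequence with a matching attention mask. The pure-Python sketch below mimics that behavior on already-tokenized ids to show the resulting shapes. The helper name `pad_or_truncate`, the pad id of 0, and the sample ids are illustrative assumptions and not part of the `transformers` API; a real tokenizer also keeps the closing special token when it truncates, which this simplification does not.

```python
def pad_or_truncate(token_ids, max_length=64, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids)[:max_length]   # drop anything past max_length
    mask = [1] * len(ids)                # 1 marks real tokens
    padding = max_length - len(ids)      # number of pad slots to append
    return ids + [pad_id] * padding, mask + [0] * padding


# Illustrative ids standing in for "[CLS] turn on the fan [SEP]"
example_ids = [101, 2735, 2006, 1996, 5470, 102]
input_ids, attention_mask = pad_or_truncate(example_ids, max_length=8)
print(input_ids)       # [101, 2735, 2006, 1996, 5470, 102, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

With `max_length=64`, a six-token command like this one carries 58 padding positions, all masked out by the zeros in the attention mask.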
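
Step 9 of the example picks the class with the largest logit. For a command classifier it may also be useful to check how confident that choice is, for instance by declining to act on low-confidence inputs. A minimal pure-Python sketch of that idea follows; the function names, the 0.8 threshold, and the sample logits are assumptions for illustration and do not come from the model card.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.8):
    """Return (label, confidence); label is 'uncertain' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "uncertain", probs[best]
    # Label convention from the example: 1 = valid command, 0 = invalid
    return ("valid" if best == 1 else "invalid"), probs[best]


print(classify([-1.2, 2.3]))  # clear margin, labeled "valid"
print(classify([0.1, 0.3]))   # narrow margin, labeled "uncertain"
```

The threshold would need tuning on held-out commands; the six-example dataset in the diff is far too small to calibrate it.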