boltuix committed on
Commit 7fff39f · verified · 1 Parent(s): 92bb737

Update README.md

Files changed (1)
  1. README.md +75 -77
README.md CHANGED
@@ -72,7 +72,7 @@ library_name: transformers
 
  ## Overview
 
- `BERT-Lite` is an **ultra-lightweight** NLP model derived from **google/bert-base-uncased**, optimized for **real-time inference** on **edge and IoT devices**. With a quantized size of **~10MB** and **~2M parameters**, it delivers efficient contextual language understanding for highly resource-constrained environments like microcontrollers, wearables, and smart home devices. Designed for **low-latency** and **offline operation**, BERT-Lite is perfect for privacy-first applications requiring intent detection, text classification, or semantic understanding with minimal connectivity.
 
  - **Model Name**: BERT-Lite
  - **Size**: ~10MB (quantized)
@@ -312,82 +312,80 @@ To adapt BERT-Lite for custom IoT tasks (e.g., specific smart home commands):
  1. **Prepare Dataset**: Collect labeled data (e.g., commands with intents or masked sentences).
  2. **Fine-Tune with Hugging Face**:
  ```python
- # !pip install transformers datasets torch --upgrade
-
- import torch
- from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
- from datasets import Dataset
- import pandas as pd
-
- # 1. Prepare the sample IoT dataset
- data = {
-     "text": [
-         "Turn on the fan",
-         "Switch off the light",
-         "Invalid command",
-         "Activate the air conditioner",
-         "Turn off the heater",
-         "Gibberish input"
-     ],
-     "label": [1, 1, 0, 1, 1, 0] # 1 = Valid command, 0 = Invalid
- }
- df = pd.DataFrame(data)
- dataset = Dataset.from_pandas(df)
-
- # 2. Load tokenizer and model
- model_name = "boltuix/bert-lite" # Replace with any small/quantized BERT
- tokenizer = BertTokenizer.from_pretrained(model_name)
- model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
-
- # 3. Tokenize the dataset
- def tokenize_function(examples):
-     return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)
-
- tokenized_dataset = dataset.map(tokenize_function, batched=True)
-
- # 4. Manually convert columns to tensors (NumPy 2.0 safe)
- tokenized_dataset = tokenized_dataset.map(lambda x: {
-     "input_ids": torch.tensor(x["input_ids"]),
-     "attention_mask": torch.tensor(x["attention_mask"]),
-     "label": torch.tensor(x["label"])
- })
-
- # 5. Define training arguments
- training_args = TrainingArguments(
-     output_dir="./bert_lite_results",
-     num_train_epochs=5,
-     per_device_train_batch_size=2,
-     logging_dir="./bert_lite_logs",
-     logging_steps=10,
-     save_steps=100,
-     eval_strategy="no",
-     learning_rate=5e-5,
- )
-
- # 6. Initialize Trainer
- trainer = Trainer(
-     model=model,
-     args=training_args,
-     train_dataset=tokenized_dataset,
- )
-
- # 7. Fine-tune the model
- trainer.train()
-
- # 8. Save the fine-tuned model
- model.save_pretrained("./fine_tuned_bert_lite")
- tokenizer.save_pretrained("./fine_tuned_bert_lite")
-
- # 9. Inference example
- text = "Turn on the light"
- inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
- model.eval()
- with torch.no_grad():
-     outputs = model(**inputs)
-     logits = outputs.logits
-     predicted_class = torch.argmax(logits, dim=1).item()
-
- print(f"Predicted class for '{text}': {'✅ Valid IoT Command' if predicted_class == 1 else '❌ Invalid Command'}")
  ```
  3. **Deploy**: Export the fine-tuned model to ONNX or TensorFlow Lite for edge devices.
 
 
 
  ## Overview
 
+ `BERT-Lite` is an **ultra-lightweight** NLP model derived from **google/bert_uncased_L-2_H-64_A-2**, optimized for **real-time inference** on **edge and IoT devices**. With a quantized size of **~10MB** and **~2M parameters**, it delivers efficient contextual language understanding for highly resource-constrained environments like microcontrollers, wearables, and smart home devices. Designed for **low-latency** and **offline operation**, BERT-Lite is perfect for privacy-first applications requiring intent detection, text classification, or semantic understanding with minimal connectivity.
 
  - **Model Name**: BERT-Lite
  - **Size**: ~10MB (quantized)
 
  1. **Prepare Dataset**: Collect labeled data (e.g., commands with intents or masked sentences).
  2. **Fine-Tune with Hugging Face**:
  ```python
+ import torch
+ from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
+ from datasets import Dataset
+ import pandas as pd
+
+ # 1. Prepare the sample IoT dataset
+ data = {
+     "text": [
+         "Turn on the fan",
+         "Switch off the light",
+         "Invalid command",
+         "Activate the air conditioner",
+         "Turn off the heater",
+         "Gibberish input"
+     ],
+     "label": [1, 1, 0, 1, 1, 0] # 1 = Valid command, 0 = Invalid
+ }
+ df = pd.DataFrame(data)
+ dataset = Dataset.from_pandas(df)
+
+ # 2. Load tokenizer and model
+ model_name = "boltuix/bert-lite" # Replace with any small/quantized BERT
+ tokenizer = BertTokenizer.from_pretrained(model_name)
+ model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
+
+ # 3. Tokenize the dataset
+ def tokenize_function(examples):
+     return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)
+
+ tokenized_dataset = dataset.map(tokenize_function, batched=True)
+
+ # 4. Manually convert columns to tensors (NumPy 2.0 safe)
+ tokenized_dataset = tokenized_dataset.map(lambda x: {
+     "input_ids": torch.tensor(x["input_ids"]),
+     "attention_mask": torch.tensor(x["attention_mask"]),
+     "label": torch.tensor(x["label"])
+ })
+
+ # 5. Define training arguments
+ training_args = TrainingArguments(
+     output_dir="./bert_lite_results",
+     num_train_epochs=5,
+     per_device_train_batch_size=2,
+     logging_dir="./bert_lite_logs",
+     logging_steps=10,
+     save_steps=100,
+     eval_strategy="no",
+     learning_rate=5e-5,
+ )
+
+ # 6. Initialize Trainer
+ trainer = Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=tokenized_dataset,
+ )
+
+ # 7. Fine-tune the model
+ trainer.train()
+
+ # 8. Save the fine-tuned model
+ model.save_pretrained("./fine_tuned_bert_lite")
+ tokenizer.save_pretrained("./fine_tuned_bert_lite")
+
+ # 9. Inference example
+ text = "Turn on the light"
+ inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
+ model.eval()
+ with torch.no_grad():
+     outputs = model(**inputs)
+     logits = outputs.logits
+     predicted_class = torch.argmax(logits, dim=1).item()
+
+ print(f"Predicted class for '{text}': {'✅ Valid IoT Command' if predicted_class == 1 else '❌ Invalid Command'}")
  ```
  3. **Deploy**: Export the fine-tuned model to ONNX or TensorFlow Lite for edge devices.
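
For the six-example sample dataset and the `TrainingArguments` values in the fine-tuning snippet, a back-of-envelope calculation shows how many optimizer steps the run performs and how `logging_steps` and `save_steps` relate to that total. This is a minimal sketch, assuming a single device, no gradient accumulation, and a partial final batch being kept; `schedule_summary` is a hypothetical helper written for illustration and is not part of `transformers`.

```python
import math


def schedule_summary(num_examples, batch_size, epochs, logging_steps, save_steps):
    """Estimate step counts for a run (hypothetical helper, illustration only).

    Assumes one device, gradient_accumulation_steps=1, and that a partial
    final batch is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # How many times the step counter reaches a multiple of each interval.
        "log_events": total_steps // logging_steps,
        "interval_checkpoints": total_steps // save_steps,
    }


# Values taken from the sample: 6 examples, batch size 2, 5 epochs,
# logging_steps=10, save_steps=100.
summary = schedule_summary(6, 2, 5, 10, 100)
print(summary)
# {'steps_per_epoch': 3, 'total_steps': 15, 'log_events': 1, 'interval_checkpoints': 0}
```

Under these assumptions the run is 15 steps long, so `logging_steps=10` is reached once and `save_steps=100` is never reached at a step interval. The explicit `save_pretrained` calls in step 8 are therefore what persist the fine-tuned weights in this sample.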
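
The Deploy step names ONNX as an export target but does not show how to get there. One possible route is `torch.onnx.export`, sketched below under several assumptions: the checkpoint was saved to `./fine_tuned_bert_lite` as in step 8, the input and output names and opset 14 are arbitrary choices, and `export_to_onnx` and `batch_dynamic_axes` are hypothetical helpers. Depending on the installed PyTorch version, extra packages such as `onnx` may be required, so treat this as a starting point rather than a verified recipe.

```python
from pathlib import Path


def batch_dynamic_axes(names):
    """Mark axis 0 of each named tensor as a variable-size batch dimension."""
    return {name: {0: "batch"} for name in names}


def export_to_onnx(model_dir, onnx_path, max_length=64):
    """Export a saved sequence-classification checkpoint to ONNX (sketch)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForSequenceClassification.from_pretrained(model_dir)
    model.config.return_dict = False  # plain tuple outputs are simpler to trace
    model.eval()

    # A fixed-length sample input; max_length matches the fine-tuning snippet.
    sample = tokenizer(
        "Turn on the light",
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=max_length,
    )

    torch.onnx.export(
        model,
        (sample["input_ids"], sample["attention_mask"]),
        onnx_path,
        input_names=["input_ids", "attention_mask"],
        output_names=["logits"],
        dynamic_axes=batch_dynamic_axes(["input_ids", "attention_mask", "logits"]),
        opset_version=14,  # assumed; pick what the target runtime supports
    )
    return Path(onnx_path)


# Example call (requires torch, transformers, and the saved checkpoint):
# export_to_onnx("./fine_tuned_bert_lite", "bert_lite.onnx")
```

Only the batch dimension is left dynamic here; the sequence length stays fixed at `max_length`, which keeps memory use predictable on constrained devices at the cost of always padding to 64 tokens.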