Model Card: AI-Belha Classifier
Inference Demo
🐝 Try it out here → AI-Belha-Classifier (Live Demo)
This demo uses a YAMNet + Fine-Tuned classifier to detect Queen Bee Status from beehive recordings.
Model Description
The AI-Belha Classifier is a fine-tuned neural network based on YAMNet, adapted for acoustic monitoring of queen bee status inside beehives. Through analysis of hive audio signals, the model classifies each clip into one of four predefined bee queen states:
- Queen not present
- Queen present and newly accepted
- Queen present and rejected
- Queen present (original queen)
This model extends the original YAMNet framework through transfer learning to address tasks specific to smart apiculture and automated hive monitoring.
Key Features
- Extension of YAMNet with custom bee-specific labels
- Raw waveform input processing with per-frame class prediction outputs
- Softmax probability scores for prediction interpretation
- Modular architecture suitable for transfer learning and fine-tuning
Dataset and Preprocessing
The model was trained on the Smart Bee Colony Monitor: Clips of Beehive Sounds dataset from Kaggle (https://www.kaggle.com/datasets/annajyang/beehive-sounds). The dataset comprises approximately 7,100 labeled audio samples, each corresponding to one of the four bee queen states.
Data Preparation Process
Manual Annotation: Audio clips were labeled with queen bee states based on domain expertise.
Feature Extraction:
- Conversion of audio to log-mel spectrograms
- Analysis of key audio features in time and frequency domains
Spectrogram Processing:
- Application of Short-Time Fourier Transform (STFT) with Hann window
- Application of mel filter banks, which approximate human auditory perception
- Logarithmic compression for dynamic range reduction
Patch Creation:
- Division of spectrograms into fixed-size patches with tf.signal.frame
- Utilization of patches as individual classification units
Input Standardization:
- Zero-padding of audio clips shorter than the required length
- This standardization ensures:
  - At least one complete spectrogram patch per clip
  - A consistent format for batch processing and model input
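As a concrete illustration of this pipeline, the sketch below builds log-mel spectrogram patches in TensorFlow. The parameter values (16 kHz input, 25 ms Hann window, 10 ms hop, 64 mel bands, 96-frame patches) follow the public YAMNet reference implementation and are assumptions here, not values read from this repository's code.

```python
import tensorflow as tf

# Front-end sketch using YAMNet's published defaults (assumed, not repo values):
# 16 kHz mono audio, 25 ms Hann window, 10 ms hop, 64 mel bands, 96-frame patches.
SAMPLE_RATE = 16000
WIN_SAMPLES = 400        # 25 ms window
HOP_SAMPLES = 160        # 10 ms hop
NUM_MEL_BINS = 64
PATCH_FRAMES = 96        # 0.96 s of 10 ms frames per patch
PATCH_HOP_FRAMES = 48

def waveform_to_patches(waveform: tf.Tensor) -> tf.Tensor:
    """Convert a mono float32 waveform into [num_patches, 96, 64] log-mel patches."""
    # Zero-pad short clips so at least one full patch can be extracted.
    min_samples = (PATCH_FRAMES - 1) * HOP_SAMPLES + WIN_SAMPLES
    pad = tf.maximum(0, min_samples - tf.shape(waveform)[0])
    waveform = tf.concat([waveform, tf.zeros([pad], dtype=waveform.dtype)], axis=0)

    # STFT with a Hann window (the default window of tf.signal.stft).
    spectrogram = tf.abs(tf.signal.stft(
        waveform, frame_length=WIN_SAMPLES, frame_step=HOP_SAMPLES))

    # Mel filter bank followed by logarithmic compression.
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=NUM_MEL_BINS,
        num_spectrogram_bins=spectrogram.shape[-1],
        sample_rate=SAMPLE_RATE,
        lower_edge_hertz=125.0,
        upper_edge_hertz=7500.0)
    log_mel = tf.math.log(tf.matmul(spectrogram, mel_matrix) + 0.001)

    # Split the frame sequence into fixed-size patches, the classification units.
    return tf.signal.frame(log_mel, frame_length=PATCH_FRAMES,
                           frame_step=PATCH_HOP_FRAMES, axis=0)
```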
Model Architecture
The AI-Belha Classifier utilizes YAMNet, a MobileNet-style convolutional neural network optimized for audio classification with depthwise separable convolutions. While the original YAMNet was trained on AudioSet (521 classes), this implementation replaces the classifier head and fine-tunes it for bee-specific states.
Core Structure (YAMNet Backbone)
- 1 initial Conv2D layer
- 13 Depthwise Separable Convolution layers
- Global Average Pooling for time-frequency feature aggregation
- Final Dense Layer for class logit projection (customized for transfer learning)
Transfer Learning Head
The classification head is built on top of the YAMNet embeddings:
- Input: 1024-dimensional embedding vector per audio patch
- First Hidden Layer:
- 1024 units
- ReLU activation
- L2 regularization
- Batch normalization
- Dropout
- Additional Layers (optional):
- Dynamic size scaling
- ReLU + BatchNorm + Dropout
- Output Layer:
- Softmax activation
- 4 units (bee class count)
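A minimal Keras sketch of this head, assuming the dropout rate (0.3) and L2 factor (0.01) listed in the training parameters below and a simple halving rule for the optional extra layers:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 4          # the four queen states
EMBEDDING_DIM = 1024     # YAMNet embedding size per audio patch

def build_head(num_hidden_layers: int = 1) -> tf.keras.Model:
    """Classification head on top of YAMNet embeddings (illustrative sketch)."""
    inputs = layers.Input(shape=(EMBEDDING_DIM,))
    x = inputs
    units = 1024
    for _ in range(num_hidden_layers):
        x = layers.Dense(units, activation="relu",
                         kernel_regularizer=regularizers.l2(0.01))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.3)(x)
        units //= 2      # assumed scaling rule for the optional extra layers
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```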
Training Process
The training procedure utilized YAMNet embeddings as fixed features with a fully connected neural network head for classification.
Process Steps
Feature Extraction: Transformation of audio waveforms to embeddings via pretrained YAMNet.
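This step could look like the sketch below, which uses the TF Hub release of YAMNet to show the embedding interface; the repository itself loads YAMNet from the local yamnet/ directory, so the loading code and file paths here are assumptions.

```python
import numpy as np
import soundfile as sf
import tensorflow_hub as hub

# Illustrative embedding extraction via the TF Hub YAMNet release.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def extract_embeddings(wav_path: str) -> np.ndarray:
    """Return one 1024-d YAMNet embedding per 0.96 s audio patch."""
    waveform, sr = sf.read(wav_path, dtype="float32")
    if waveform.ndim > 1:
        waveform = waveform.mean(axis=1)       # downmix stereo to mono
    # YAMNet expects 16 kHz mono input; resample beforehand if needed.
    scores, embeddings, spectrogram = yamnet(waveform)
    return embeddings.numpy()                  # shape: [num_patches, 1024]

patch_embeddings = extract_embeddings("test/example.wav")   # hypothetical path
print(patch_embeddings.shape)
```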
Dataset Preparation:
- Extraction of embeddings and labels per audio file
- Treatment of each embedding (per patch) as an individual training sample
Label Encoding:
- Storage of class names and encoding with one-hot vectors via LabelBinarizer
Model Configuration:
- Loss: Categorical Crossentropy
- Optimizer: Adam
- Metrics: Accuracy
Training Parameters
Parameter | Value |
---|---|
Epochs | 100 (default) |
Batch Size | 32 (default) |
Learning Rate | 0.001 (default) |
Dropout Rate | 0.3 |
L2 Regularization | 0.01 |
Validation Split | 20% |
Early Stopping | Enabled (patience: 10) |
LR Scheduler | ReduceLROnPlateau (patience: 5) |
Logging | TensorBoard + CSV export |
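Putting the configuration and parameters together, a training loop along these lines would reproduce the setup above; the array and file names (embeddings.npy, patch_labels.npy, model/bee_classifier.h5) are illustrative assumptions rather than names defined by the training script.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import LabelBinarizer

# Load per-patch embeddings and labels prepared earlier (assumed file names).
X = np.load("embeddings.npy")                  # [num_patches, 1024]
label_strings = np.load("patch_labels.npy")    # one label string per patch

# One-hot encode the four class names and keep them for inference.
binarizer = LabelBinarizer()
y = binarizer.fit_transform(label_strings)
np.save("model/labels.npy", binarizer.classes_)

# Classification head as described in the architecture section.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,)),
    tf.keras.layers.Dense(1024, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(patience=5),
    tf.keras.callbacks.TensorBoard(log_dir="logs"),
    tf.keras.callbacks.CSVLogger("model/history.csv"),
]

model.fit(X, y, epochs=100, batch_size=32,
          validation_split=0.2, callbacks=callbacks)
model.save("model/bee_classifier.h5")          # trained model stored as .h5
```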
Model Storage
- Storage of trained model as .h5
- Storage of class labels as .npy
- Storage of training history as .csv
- Availability of TensorBoard logs for training inspection
Evaluation Metrics
The classifier underwent evaluation on a held-out test set distributed across the four bee queen states.
Class-wise Performance
Class | Precision | Recall | F1-Score |
---|---|---|---|
Queen not present | 0.56 | 0.61 | 0.58 |
Queen present and newly accepted | 0.73 | 0.94 | 0.82 |
Queen present and rejected | 0.87 | 0.61 | 0.72 |
Queen present (original queen) | 0.81 | 0.37 | 0.51 |
Overall Performance
Metric | Value |
---|---|
Accuracy | 0.73 |
Macro Avg F1 | 0.66 |
Weighted Avg F1 | 0.72 |
Considerations
- Strongest performance on "Queen present and newly accepted" (F1: 0.82, recall: 0.94)
- Moderate, balanced precision and recall on "Queen not present"
- Low recall (0.37) for "Queen present (original queen)", indicating clear room for improvement
- Overall accuracy of 73% is reasonable given the difficulty of acoustic hive analysis
- Macro vs. weighted averages:
  - The gap between macro F1 (0.66) and weighted F1 (0.72) reflects class imbalance
  - The higher weighted F1 shows the model handles the more frequent classes well
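The per-class table above can be reproduced with scikit-learn's classification_report, roughly as follows; the test-set file names and the model path are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report

# Evaluate the saved classifier on a held-out set of patch embeddings.
model = tf.keras.models.load_model("model/bee_classifier.h5")   # assumed path
class_names = np.load("model/labels.npy", allow_pickle=True)

X_test = np.load("test_embeddings.npy")   # [num_patches, 1024], assumed file
y_true = np.load("test_labels.npy")       # integer class indices, assumed file

y_pred = model.predict(X_test).argmax(axis=1)
print(classification_report(y_true, y_pred, target_names=list(class_names)))
```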
How to Use This Model
YAMNet Setup
1. Create a directory named `yamnet`.
2. Obtain the necessary files from the YAMNet repository:
   - Go to https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
   - Download `yamnet_class_map.csv`
   - Download the YAMNet weights: `curl -O https://storage.googleapis.com/audioset/yamnet.h5`
   - Place both files in the `yamnet` directory
3. Verify the YAMNet installation:
   - Run `python yamnet_test.py`
   - A successful installation displays "OK" for all 4 tests, confirming the model is set up correctly
Training Implementation
1. Prepare the audio data:
   - Place the audio files used for fine-tuning in the `data` directory
   - Organize the files by label, with one subdirectory per label inside `data` (see the example layout after these steps)
   - Audio files must be in `.wav` format; to convert them, run:
     `python dataset_preprocess.py --input_dir <input_directory> --output_dir <output_directory>`
2. Run training:
   - Execute `python train.py --data_path data/ --model_name <model_name>`
   - The fine-tuned model and labels (in `.npz` format) are stored in the `model` directory
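For reference, a `data/` layout of the kind the training script expects might look like the tree below; the subdirectory names are hypothetical, with one folder per queen-state label.

```
data/
├── queen_not_present/
│   ├── clip_0001.wav
│   └── ...
├── queen_present_newly_accepted/
├── queen_present_rejected/
└── queen_present_original/
```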
Inference Execution
- Run inference on one or more audio files:
  - Execute `python inference.py test/*.wav`
  - The most recent model in the `model` directory is used automatically
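A minimal Python sketch of this inference step is shown below. It is not the repository's inference.py; it assumes the TF Hub YAMNet release and the model/label file names used in the earlier sketches.

```python
import glob
import numpy as np
import soundfile as sf
import tensorflow as tf
import tensorflow_hub as hub

# Embed each clip with YAMNet, classify every patch, and average patch probabilities.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")
classifier = tf.keras.models.load_model("model/bee_classifier.h5")   # assumed path
class_names = np.load("model/labels.npy", allow_pickle=True)

for wav_path in glob.glob("test/*.wav"):
    waveform, sr = sf.read(wav_path, dtype="float32")
    if waveform.ndim > 1:
        waveform = waveform.mean(axis=1)                 # downmix to mono
    _, embeddings, _ = yamnet(waveform)                  # [num_patches, 1024]
    probs = classifier.predict(embeddings.numpy(), verbose=0)
    clip_probs = probs.mean(axis=0)                      # aggregate patches per clip
    print(f"{wav_path}: {class_names[clip_probs.argmax()]} "
          f"(p={clip_probs.max():.2f})")
```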
References and Acknowledgments
This work builds upon significant contributions from the research community:
Smart Bee Colony Monitor Dataset:
```
@misc{anna_yang_2022,
  title     = {Smart Bee Colony Monitor: Clips of Beehive Sounds},
  url       = {https://www.kaggle.com/dsv/4451415},
  DOI       = {10.34740/KAGGLE/DSV/4451415},
  publisher = {Kaggle},
  author    = {Anna Yang},
  year      = {2022}
}
```
YAMNet Model:
- Original implementation: https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
YAMNet Transfer Learning Framework:
We express our appreciation to the authors of these resources, which were essential to this work.
License
This model is released under the MIT License, which permits use, copying, modification, distribution, and sale of copies of the software, subject to the inclusion of the copyright notice and permission notice in all copies or substantial portions of the software.
For the complete license text, please refer to: https://opensource.org/licenses/MIT