Model Card: AI-Belha Classifier
Inference Demo
🐝 Try it out here → AI-Belha-Classifier (Live Demo)
This demo uses a YAMNet + Fine-Tuned classifier to detect Queen Bee Status from beehive recordings.
Model Description
The AI-Belha Classifier is a fine-tuned neural network based on YAMNet, adapted for acoustic monitoring of queen bee status inside beehives. Through analysis of hive audio signals, the model classifies each clip into one of four predefined bee queen states:
- Queen not present
- Queen present and newly accepted
- Queen present and rejected
- Queen present (original queen)
This model extends the original YAMNet framework through transfer learning to address tasks specific to smart apiculture and automated hive monitoring.
Key Features
- Extension of YAMNet with custom bee-specific labels
- Raw waveform input processing with per-frame class prediction outputs
- Softmax probability scores for prediction interpretation
- Modular architecture suitable for transfer learning and fine-tuning
Dataset and Preprocessing
The model was trained on the Smart Bee Colony Monitor: Clips of Beehive Sounds dataset from Kaggle (https://www.kaggle.com/datasets/annajyang/beehive-sounds). The dataset comprises approximately 7,100 labeled audio samples, each corresponding to one of the four bee queen states.
Data Preparation Process
Manual Annotation: Audio clips were labeled with queen bee states based on domain expertise.
Feature Extraction:
- Conversion of audio to log-mel spectrograms
- Analysis of key audio features in time and frequency domains
Spectrogram Processing:
- Application of Short-Time Fourier Transform (STFT) with Hann window
- Application of mel filter banks, which approximate human auditory perception
- Logarithmic compression for dynamic range reduction
Patch Creation:
- Division of spectrograms into fixed-size patches with tf.signal.frame
- Utilization of patches as individual classification units
Input Standardization:
- Zero-padding of audio clips shorter than the required length
- This standardization ensures:
  - At least one complete spectrogram patch per clip
  - A consistent format for batch processing and model input
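As a concrete illustration of this pipeline, the sketch below builds log-mel spectrogram patches in TensorFlow. The parameter values (16 kHz input, 25 ms Hann window, 10 ms hop, 64 mel bands, 96-frame patches) follow the public YAMNet reference implementation and are assumptions here, not values read from this repository's code.

```python
import tensorflow as tf

# Front-end sketch using YAMNet's published defaults (assumed, not repo values):
# 16 kHz mono audio, 25 ms Hann window, 10 ms hop, 64 mel bands, 96-frame patches.
SAMPLE_RATE = 16000
WIN_SAMPLES = 400        # 25 ms window
HOP_SAMPLES = 160        # 10 ms hop
NUM_MEL_BINS = 64
PATCH_FRAMES = 96        # 0.96 s of 10 ms frames per patch
PATCH_HOP_FRAMES = 48

def waveform_to_patches(waveform: tf.Tensor) -> tf.Tensor:
    """Convert a mono float32 waveform into [num_patches, 96, 64] log-mel patches."""
    # Zero-pad short clips so at least one full patch can be extracted.
    min_samples = (PATCH_FRAMES - 1) * HOP_SAMPLES + WIN_SAMPLES
    pad = tf.maximum(0, min_samples - tf.shape(waveform)[0])
    waveform = tf.concat([waveform, tf.zeros([pad], dtype=waveform.dtype)], axis=0)

    # STFT with a Hann window (the default window of tf.signal.stft).
    spectrogram = tf.abs(tf.signal.stft(
        waveform, frame_length=WIN_SAMPLES, frame_step=HOP_SAMPLES))

    # Mel filter bank followed by logarithmic compression.
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=NUM_MEL_BINS,
        num_spectrogram_bins=spectrogram.shape[-1],
        sample_rate=SAMPLE_RATE,
        lower_edge_hertz=125.0,
        upper_edge_hertz=7500.0)
    log_mel = tf.math.log(tf.matmul(spectrogram, mel_matrix) + 0.001)

    # Split the frame sequence into fixed-size patches, the classification units.
    return tf.signal.frame(log_mel, frame_length=PATCH_FRAMES,
                           frame_step=PATCH_HOP_FRAMES, axis=0)
```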
Model Architecture
The AI-Belha Classifier utilizes YAMNet, a MobileNet-style convolutional neural network optimized for audio classification with depthwise separable convolutions. While the original YAMNet was trained on AudioSet (521 classes), this implementation replaces the classifier head and fine-tunes it for bee-specific states.
Core Structure (YAMNet Backbone)
- 1 initial Conv2D layer
- 13 Depthwise Separable Convolution layers
- Global Average Pooling for time-frequency feature aggregation
- Final Dense Layer for class logit projection (customized for transfer learning)
Transfer Learning Head
The classification head is built on top of the YAMNet embeddings:
- Input: 1024-dimensional embedding vector per audio patch
- First Hidden Layer:
- 1024 units
- ReLU activation
- L2 regularization
- Batch normalization
- Dropout
- Additional Layers (optional):
- Dynamic size scaling
- ReLU + BatchNorm + Dropout
- Output Layer:
- Softmax activation
- 4 units (bee class count)
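A minimal Keras sketch of this head, assuming the dropout rate (0.3) and L2 factor (0.01) listed in the training parameters below and a simple halving rule for the optional extra layers:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 4          # the four queen states
EMBEDDING_DIM = 1024     # YAMNet embedding size per audio patch

def build_head(num_hidden_layers: int = 1) -> tf.keras.Model:
    """Classification head on top of YAMNet embeddings (illustrative sketch)."""
    inputs = layers.Input(shape=(EMBEDDING_DIM,))
    x = inputs
    units = 1024
    for _ in range(num_hidden_layers):
        x = layers.Dense(units, activation="relu",
                         kernel_regularizer=regularizers.l2(0.01))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.3)(x)
        units //= 2      # assumed scaling rule for the optional extra layers
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```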
Training Process
The training procedure utilized YAMNet embeddings as fixed features with a fully connected neural network head for classification.
Process Steps
Feature Extraction: Transformation of audio waveforms to embeddings via pretrained YAMNet.
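This step could look like the sketch below, which uses the TF Hub release of YAMNet to show the embedding interface; the repository itself loads YAMNet from the local yamnet/ directory, so the loading code and file paths here are assumptions.

```python
import numpy as np
import soundfile as sf
import tensorflow_hub as hub

# Illustrative embedding extraction via the TF Hub YAMNet release.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def extract_embeddings(wav_path: str) -> np.ndarray:
    """Return one 1024-d YAMNet embedding per 0.96 s audio patch."""
    waveform, sr = sf.read(wav_path, dtype="float32")
    if waveform.ndim > 1:
        waveform = waveform.mean(axis=1)       # downmix stereo to mono
    # YAMNet expects 16 kHz mono input; resample beforehand if needed.
    scores, embeddings, spectrogram = yamnet(waveform)
    return embeddings.numpy()                  # shape: [num_patches, 1024]

patch_embeddings = extract_embeddings("test/example.wav")   # hypothetical path
print(patch_embeddings.shape)
```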
Dataset Preparation:
- Extraction of embeddings and labels per audio file
- Treatment of each embedding (per patch) as an individual training sample
Label Encoding:
- Storage of class names and encoding with one-hot vectors via LabelBinarizer
Model Configuration:
- Loss: Categorical Crossentropy
- Optimizer: Adam
- Metrics: Accuracy
Training Parameters
Parameter | Value |
---|---|
Epochs | 100 (default) |
Batch Size | 32 (default) |
Learning Rate | 0.001 (default) |
Dropout Rate | 0.3 |
L2 Regularization | 0.01 |
Validation Split | 20% |
Early Stopping | Enabled (patience: 10) |
LR Scheduler | ReduceLROnPlateau (patience: 5) |
Logging | TensorBoard + CSV export |
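Putting the configuration and parameters together, a training loop along these lines would reproduce the setup above; the array and file names (embeddings.npy, patch_labels.npy, model/bee_classifier.h5) are illustrative assumptions rather than names defined by the training script.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import LabelBinarizer

# Load per-patch embeddings and labels prepared earlier (assumed file names).
X = np.load("embeddings.npy")                  # [num_patches, 1024]
label_strings = np.load("patch_labels.npy")    # one label string per patch

# One-hot encode the four class names and keep them for inference.
binarizer = LabelBinarizer()
y = binarizer.fit_transform(label_strings)
np.save("model/labels.npy", binarizer.classes_)

# Classification head as described in the architecture section.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,)),
    tf.keras.layers.Dense(1024, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(patience=5),
    tf.keras.callbacks.TensorBoard(log_dir="logs"),
    tf.keras.callbacks.CSVLogger("model/history.csv"),
]

model.fit(X, y, epochs=100, batch_size=32,
          validation_split=0.2, callbacks=callbacks)
model.save("model/bee_classifier.h5")          # trained model stored as .h5
```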
Model Storage
- Storage of trained model as .h5
- Storage of class labels as .npy
- Storage of training history as .csv
- Availability of TensorBoard logs for training inspection
Evaluation Metrics
The classifier underwent evaluation on a held-out test set distributed across the four bee queen states.
Class-wise Performance
Class | Precision | Recall | F1-Score |
---|---|---|---|
Queen not present | 0.56 | 0.61 | 0.58 |
Queen present and newly accepted | 0.73 | 0.94 | 0.82 |
Queen present and rejected | 0.87 | 0.61 | 0.72 |
Queen present (original queen) | 0.81 | 0.37 | 0.51 |
Overall Performance
Metric | Value |
---|---|
Accuracy | 0.73 |
Macro Avg F1 | 0.66 |
Weighted Avg F1 | 0.72 |
Considerations
- Strongest performance on "Queen present and newly accepted" (F1: 0.82, recall: 0.94)
- Moderate, balanced precision and recall on "Queen not present"
- Low recall (0.37) for "Queen present (original queen)", indicating clear room for improvement
- Overall accuracy of 73% is reasonable given the difficulty of acoustic hive analysis
- Macro vs. weighted averages:
  - The gap between macro F1 (0.66) and weighted F1 (0.72) reflects class imbalance
  - The higher weighted F1 shows the model handles the more frequent classes well
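The per-class table above can be reproduced with scikit-learn's classification_report, roughly as follows; the test-set file names and the model path are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report

# Evaluate the saved classifier on a held-out set of patch embeddings.
model = tf.keras.models.load_model("model/bee_classifier.h5")   # assumed path
class_names = np.load("model/labels.npy", allow_pickle=True)

X_test = np.load("test_embeddings.npy")   # [num_patches, 1024], assumed file
y_true = np.load("test_labels.npy")       # integer class indices, assumed file

y_pred = model.predict(X_test).argmax(axis=1)
print(classification_report(y_true, y_pred, target_names=list(class_names)))
```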
How to Use This Model
YAMNet Setup
1. Create a directory named `yamnet`.
2. Obtain the necessary files from the YAMNet repository:
   - Go to https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
   - Download `yamnet_class_map.csv`
   - Download the YAMNet weights: `curl -O https://storage.googleapis.com/audioset/yamnet.h5`
   - Place both files in the `yamnet` directory
3. Verify the YAMNet installation:
   - Run `python yamnet_test.py`
   - A successful installation displays "OK" for all 4 tests, confirming the model is set up correctly
Training Implementation
1. Prepare the audio data:
   - Place the audio files used for fine-tuning in the `data` directory
   - Organize the files by label, with one subdirectory per label inside `data` (see the example layout after these steps)
   - Audio files must be in `.wav` format; to convert them, run:
     `python dataset_preprocess.py --input_dir <input_directory> --output_dir <output_directory>`
2. Run training:
   - Execute `python train.py --data_path data/ --model_name <model_name>`
   - The fine-tuned model and labels (in `.npz` format) are stored in the `model` directory
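For reference, a `data/` layout of the kind the training script expects might look like the tree below; the subdirectory names are hypothetical, with one folder per queen-state label.

```
data/
├── queen_not_present/
│   ├── clip_0001.wav
│   └── ...
├── queen_present_newly_accepted/
├── queen_present_rejected/
└── queen_present_original/
```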
Inference Execution
- Run inference on one or more audio files:
  - Execute `python inference.py test/*.wav`
  - The most recent model in the `model` directory is used automatically
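A minimal Python sketch of this inference step is shown below. It is not the repository's inference.py; it assumes the TF Hub YAMNet release and the model/label file names used in the earlier sketches.

```python
import glob
import numpy as np
import soundfile as sf
import tensorflow as tf
import tensorflow_hub as hub

# Embed each clip with YAMNet, classify every patch, and average patch probabilities.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")
classifier = tf.keras.models.load_model("model/bee_classifier.h5")   # assumed path
class_names = np.load("model/labels.npy", allow_pickle=True)

for wav_path in glob.glob("test/*.wav"):
    waveform, sr = sf.read(wav_path, dtype="float32")
    if waveform.ndim > 1:
        waveform = waveform.mean(axis=1)                 # downmix to mono
    _, embeddings, _ = yamnet(waveform)                  # [num_patches, 1024]
    probs = classifier.predict(embeddings.numpy(), verbose=0)
    clip_probs = probs.mean(axis=0)                      # aggregate patches per clip
    print(f"{wav_path}: {class_names[clip_probs.argmax()]} "
          f"(p={clip_probs.max():.2f})")
```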
References and Acknowledgments
This work builds upon significant contributions from the research community:
Smart Bee Colony Monitor Dataset:
```
@misc{anna_yang_2022,
  title     = {Smart Bee Colony Monitor: Clips of Beehive Sounds},
  url       = {https://www.kaggle.com/dsv/4451415},
  DOI       = {10.34740/KAGGLE/DSV/4451415},
  publisher = {Kaggle},
  author    = {Anna Yang},
  year      = {2022}
}
```
YAMNet Model:
- Original implementation: https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
YAMNet Transfer Learning Framework:
We express our appreciation to the authors of these resources, which were essential to this work.
License
This model is released under the MIT License, which permits use, copying, modification, distribution, and sale of copies of the software, subject to the inclusion of the copyright notice and permission notice in all copies or substantial portions of the software.
For the complete license text, please refer to: https://opensource.org/licenses/MIT