Model Card: AI-Belha Classifier

Inference Demo

🐝 Try it out here → AI-Belha-Classifier (Live Demo)

This demo uses a fine-tuned YAMNet-based classifier to detect queen bee status from beehive recordings.

Model Description

The AI-Belha Classifier is a fine-tuned neural network based on YAMNet, adapted for acoustic monitoring of queen bee status inside beehives. By analyzing hive audio signals, the model classifies each clip into one of four predefined queen bee states:

  • Queen not present
  • Queen present and newly accepted
  • Queen present and rejected
  • Queen present (original queen)

This model extends the original YAMNet framework through transfer learning to address tasks specific to smart apiculture and automated hive monitoring.

Key Features

  • Extension of YAMNet with custom bee-specific labels
  • Raw waveform input processing with per-frame class prediction outputs
  • Softmax probability scores for prediction interpretation
  • Modular architecture suitable for transfer learning and fine-tuning

Dataset and Preprocessing

The model was trained on the Smart Bee Colony Monitor: Clips of Beehive Sounds dataset from Kaggle (https://www.kaggle.com/datasets/annajyang/beehive-sounds). The dataset comprises approximately 7,100 labeled audio samples, each corresponding to one of the four queen bee states.

Data Preparation Process

  1. Manual Annotation: Audio clips were labeled with queen bee states based on domain expertise.

  2. Feature Extraction:

    • Conversion of audio to log-mel spectrograms
    • Analysis of key audio features in time and frequency domains
  3. Spectrogram Processing:

    • Application of Short-Time Fourier Transform (STFT) with Hann window
    • Implementation of Mel filter banks to correspond with human auditory perception
    • Logarithmic compression for dynamic range reduction
  4. Patch Creation:

    • Division of spectrograms into fixed-size patches with tf.signal.frame (see the code sketch after this list)
    • Utilization of patches as individual classification units
  5. Input Standardization:

    • Zero-padding of audio clips shorter than required length
    • This standardization ensures:
      • Minimum of one complete spectrogram patch per clip
      • Standard format for batch processing and model input requirements
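
A minimal sketch of the preprocessing pipeline above, written in TensorFlow. The parameter values (16 kHz sample rate, 25 ms Hann window, 10 ms hop, 64 mel bands, 96-frame patches) mirror YAMNet's published defaults and are assumptions here, not values read from this repository's scripts.

```python
import numpy as np
import tensorflow as tf

SAMPLE_RATE = 16000                       # YAMNet expects 16 kHz mono audio
STFT_WINDOW = int(0.025 * SAMPLE_RATE)    # 25 ms Hann window
STFT_HOP = int(0.010 * SAMPLE_RATE)       # 10 ms hop
N_MELS = 64                               # mel bands
PATCH_FRAMES = 96                         # log-mel frames per patch (~0.96 s)

def log_mel_patches(waveform: np.ndarray) -> tf.Tensor:
    """Convert a mono waveform into fixed-size log-mel spectrogram patches."""
    # Zero-pad short clips so at least one complete patch can be produced.
    min_samples = STFT_WINDOW + (PATCH_FRAMES - 1) * STFT_HOP
    if len(waveform) < min_samples:
        waveform = np.pad(waveform, (0, min_samples - len(waveform)))
    audio = tf.convert_to_tensor(waveform, dtype=tf.float32)

    # Short-Time Fourier Transform (the Hann window is the default).
    stft = tf.signal.stft(audio, frame_length=STFT_WINDOW, frame_step=STFT_HOP)
    spectrogram = tf.abs(stft)

    # Mel filter banks followed by logarithmic compression.
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=N_MELS,
        num_spectrogram_bins=stft.shape[-1],
        sample_rate=SAMPLE_RATE)
    log_mel = tf.math.log(tf.matmul(spectrogram, mel_matrix) + 1e-6)

    # Slice into fixed-size patches; each patch is one classification unit.
    return tf.signal.frame(log_mel, frame_length=PATCH_FRAMES,
                           frame_step=PATCH_FRAMES, axis=0)
```

Each returned patch has shape (PATCH_FRAMES, N_MELS) and is fed to the model as an independent sample.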

Model Architecture

The AI-Belha Classifier utilizes YAMNet, a MobileNet-style convolutional neural network optimized for audio classification with depthwise separable convolutions. While the original YAMNet was trained on AudioSet (521 classes), this implementation replaces the classifier head and fine-tunes it for bee-specific states.

Core Structure (YAMNet Backbone)

  • 1 initial Conv2D layer
  • 13 Depthwise Separable Convolution layers
  • Global Average Pooling for time-frequency feature aggregation
  • Final Dense Layer for class logit projection (customized for transfer learning)

Transfer Learning Head

The head is built on top of YAMNet embeddings (sketched in code after this list):

  • Input: 1024-dimensional embedding vector per audio patch
  • First Hidden Layer:
    • 1024 units
    • ReLU activation
    • L2 regularization
    • Batch normalization
    • Dropout
  • Additional Layers (optional):
    • Dynamic size scaling
    • ReLU + BatchNorm + Dropout
  • Output Layer:
    • Softmax activation
    • 4 units (bee class count)
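
A minimal Keras sketch of this head, using the dropout rate (0.3) and L2 weight (0.01) from the training parameters listed later in this card; the optional additional layers are omitted and layer names are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

NUM_CLASSES = 4        # the four queen bee states
EMBEDDING_DIM = 1024   # YAMNet embedding size per audio patch

def build_head() -> tf.keras.Model:
    """Dense classification head stacked on top of YAMNet embeddings."""
    return models.Sequential([
        layers.Input(shape=(EMBEDDING_DIM,)),
        layers.Dense(1024, activation="relu",
                     kernel_regularizer=regularizers.l2(0.01)),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
```

The optional additional hidden layers described above would sit between the dropout and the output layer.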

Training Process

The training procedure uses YAMNet embeddings as fixed features, with a fully connected neural network head performing the classification; the steps below are sketched in code after the list.

Process Steps

  1. Feature Extraction: Transformation of audio waveforms to embeddings via pretrained YAMNet.

  2. Dataset Preparation:

    • Extraction of embeddings and labels per audio file
    • Treatment of each embedding (per patch) as an individual training sample
  3. Label Encoding:

    • Storage of class names and encoding with one-hot vectors via LabelBinarizer
  4. Model Configuration:

    • Loss: Categorical Crossentropy
    • Optimizer: Adam
    • Metrics: Accuracy
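
A minimal sketch of steps 1-3. For illustration it loads YAMNet from TensorFlow Hub, whereas this repository loads the local yamnet.h5 described in the setup section below; the data/ layout with one subdirectory per label follows the training instructions later in this card.

```python
from pathlib import Path

import numpy as np
import soundfile as sf
import tensorflow_hub as hub
from sklearn.preprocessing import LabelBinarizer

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

embeddings, labels = [], []
for wav_path in Path("data").glob("*/*.wav"):
    waveform, _ = sf.read(wav_path, dtype="float32")   # expects 16 kHz mono
    _, patch_embeddings, _ = yamnet(waveform)          # (num_patches, 1024)
    for emb in patch_embeddings.numpy():
        embeddings.append(emb)                         # one sample per patch
        labels.append(wav_path.parent.name)            # label = subdirectory

X = np.stack(embeddings)

# One-hot encode the four class names and keep their order for later saving.
binarizer = LabelBinarizer()
y = binarizer.fit_transform(labels)
class_names = binarizer.classes_
```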

Training Parameters

| Parameter | Value |
|-----------|-------|
| Epochs | 100 (default) |
| Batch Size | 32 (default) |
| Learning Rate | 0.001 (default) |
| Dropout Rate | 0.3 |
| L2 Regularization | 0.01 |
| Validation Split | 20% |
| Early Stopping | Enabled (patience: 10) |
| LR Scheduler | ReduceLROnPlateau (patience: 5) |
| Logging | TensorBoard + CSV export |
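
A training sketch matching the parameters above; `model`, `X`, and `y` are assumed to come from the earlier sketches, and the log and CSV paths are illustrative rather than taken from train.py.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=10),
    tf.keras.callbacks.ReduceLROnPlateau(patience=5),
    tf.keras.callbacks.TensorBoard(log_dir="logs"),
    tf.keras.callbacks.CSVLogger("model/training_history.csv"),
]

history = model.fit(
    X, y,
    validation_split=0.2,
    epochs=100,
    batch_size=32,
    callbacks=callbacks,
)
```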

Model Storage

  • Storage of trained model as .h5
  • Storage of class labels as .npy
  • Storage of training history as .csv
  • Availability of TensorBoard logs for training inspection
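
A saving sketch matching the artifacts listed above; file names are illustrative. The training-history CSV and TensorBoard logs are already written by the CSVLogger and TensorBoard callbacks in the training sketch.

```python
import numpy as np

model.save("model/ai_belha_classifier.h5")      # trained model as .h5
np.save("model/class_labels.npy", class_names)  # class labels as .npy
```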

Evaluation Metrics

The classifier was evaluated on a held-out test set spanning the four queen bee states.

Class-wise Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| Queen not present | 0.56 | 0.61 | 0.58 |
| Queen present and newly accepted | 0.73 | 0.94 | 0.82 |
| Queen present and rejected | 0.87 | 0.61 | 0.72 |
| Queen present (original queen) | 0.81 | 0.37 | 0.51 |

Overall Performance

| Metric | Value |
|--------|-------|
| Accuracy | 0.73 |
| Macro Avg F1 | 0.66 |
| Weighted Avg F1 | 0.72 |
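
The per-class and aggregate figures above can be reproduced with scikit-learn as sketched below, assuming a held-out split (`X_test`, one-hot `y_test`) and the `model` and `class_names` from the earlier sketches.

```python
import numpy as np
from sklearn.metrics import classification_report

y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
print(classification_report(y_true, y_pred, target_names=class_names))
```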

Considerations

  • Superior performance on "Queen present and newly accepted" (F1: 0.82, Recall: 0.94)
  • Moderate balance of precision/recall on "Queen not present"
  • Low recall (0.37) for "Queen present (original queen)," indicating clear room for improvement on this class
  • Overall accuracy of 73% is reasonable given the difficulty of distinguishing queen states from hive audio alone
  • Macro vs. Weighted Averages:
    • The lower macro F1 (0.66) reflects uneven performance across the four classes
    • The higher weighted F1 (0.72) shows the model performs better on the more frequent classes

How to Use This Model

YAMNet Setup

  1. Create a directory named yamnet

  2. Obtain necessary files from the YAMNet repository:

    • Access https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
    • Download yamnet_class_map.csv
    • Obtain the YAMNet model with: curl -O https://storage.googleapis.com/audioset/yamnet.h5
    • Store both files in the yamnet directory
  3. Verify YAMNet installation:

    • Execute python yamnet_test.py
    • A successful installation will display "OK" for all 4 tests
    • This verification confirms proper model setup

Training Implementation

  1. Audio data preparation:

    • Store audio files for fine-tuning in the data directory
    • Structure audio files by labels in separate subdirectories within data
    • Audio files must be in .wav format
    • If conversion is needed, run: python dataset_preprocess.py --input_dir <input_directory> --output_dir <output_directory>
  2. Training execution:

    • Execute python train.py --data_path data/ --model_name <model_name>
    • The system will store the fine-tuned model and labels (in .npz format) in the model directory

Inference Execution

  1. Audio inference procedure:
    • Execute python inference.py test/*.wav
    • The script will automatically use the most recent model stored in the model directory (a sketch of patch-level aggregation at inference time follows this list)
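
A clip-level inference sketch. Picking the newest .h5 in model/ and averaging softmax scores across patches are assumptions for illustration; inference.py may load and aggregate differently, and the test file path is only an example.

```python
from pathlib import Path

import numpy as np
import soundfile as sf
import tensorflow as tf
import tensorflow_hub as hub

# Load the most recently written fine-tuned head and its class labels.
model_path = max(Path("model").glob("*.h5"), key=lambda p: p.stat().st_mtime)
classifier = tf.keras.models.load_model(model_path)
class_names = np.load("model/class_labels.npy", allow_pickle=True)
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

waveform, _ = sf.read("test/example.wav", dtype="float32")  # 16 kHz mono clip
_, patch_embeddings, _ = yamnet(waveform)        # (num_patches, 1024)
patch_probs = classifier.predict(patch_embeddings.numpy())
clip_probs = patch_probs.mean(axis=0)            # average softmax over patches
print(class_names[int(np.argmax(clip_probs))])
```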

References and Acknowledgments

This work builds upon significant contributions from the research community:

  1. Smart Bee Colony Monitor Dataset:

    @misc{anna_yang_2022,
      title={Smart Bee Colony Monitor: Clips of Beehive Sounds},
      url={https://www.kaggle.com/dsv/4451415},
      DOI={10.34740/KAGGLE/DSV/4451415},
      publisher={Kaggle},
      author={Anna Yang},
      year={2022}
    }
    
  2. YAMNet Model:

     https://github.com/tensorflow/models/tree/master/research/audioset/yamnet

  3. YAMNet Transfer Learning Framework:

We express our appreciation to the authors of these resources, which were essential to this work.

License

This model is released under the MIT License, which permits use, copying, modification, distribution, and sale of copies of the software, subject to the inclusion of the copyright notice and permission notice in all copies or substantial portions of the software.

For the complete license text, please refer to: https://opensource.org/licenses/MIT
