Cough Classification Model
This Random Forest model classifies cough audio recordings into three categories: COVID-19, healthy, and symptomatic.
Model Description
- Model Type: Random Forest Classifier (scikit-learn implementation)
- Features: Audio features extracted from cough recordings including:
  - Temporal features: RMS energy, zero-crossing rate
  - Spectral features: centroid, bandwidth, contrast, rolloff
  - MFCCs (13 coefficients, each summarized by its mean and standard deviation)
  - Chroma features
- Classes: COVID-19, healthy, symptomatic
- Training Dataset: Balanced subset of the COUGHVID dataset
- Feature Extraction: Using librosa for audio processing
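
To make the temporal features concrete, here is a minimal standard-library sketch of frame-wise RMS energy and zero-crossing rate, each summarized by its mean and standard deviation. It is an illustration only, not the model's extraction code, which uses librosa. The function names and the `rms_*` / `zcr_*` keys are hypothetical. The frame length of 2048 and hop of 512 mirror librosa's defaults, but unlike librosa this sketch does no centering or padding.

```python
import math
import statistics


def frame_signal(samples, frame_length=2048, hop_length=512):
    """Split a sequence of samples into overlapping frames (no padding)."""
    if len(samples) < frame_length:
        raise ValueError("signal is shorter than one frame")
    last_start = len(samples) - frame_length
    return [samples[s:s + frame_length] for s in range(0, last_start + 1, hop_length)]


def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / len(frame)


def temporal_features(samples, frame_length=2048, hop_length=512):
    """Summarize frame-wise RMS and ZCR by mean and (population) std."""
    frames = frame_signal(samples, frame_length, hop_length)
    rms = [rms_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    return {
        "rms_mean": statistics.fmean(rms),
        "rms_std": statistics.pstdev(rms),
        "zcr_mean": statistics.fmean(zcr),
        "zcr_std": statistics.pstdev(zcr),
    }


# Synthetic one-second 440 Hz tone standing in for a recording (assumption).
sample_rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = temporal_features(tone)
print(features)
```

For a pure tone of amplitude 0.5, the mean RMS comes out close to 0.5 / sqrt(2), and the zero-crossing rate is close to twice the tone frequency divided by the sample rate.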
Intended Use
This model is intended for research purposes only and should not be used for medical diagnosis. It demonstrates how machine learning can identify patterns in cough audio that might correlate with health status.
Performance
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
COVID-19 | 0.82 | 0.75 | 0.78 | 20 |
healthy | 0.79 | 0.85 | 0.82 | 20 |
symptomatic | 0.70 | 0.70 | 0.70 | 20 |
accuracy | | | 0.77 | 60 |
macro avg | 0.77 | 0.77 | 0.77 | 60 |
weighted avg | 0.77 | 0.77 | 0.77 | 60 |
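
The rows of this table are tied together by simple arithmetic: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over classes, and accuracy is the number of correct predictions over the total. The sketch below recomputes those values from the reported precision, recall, and support. The variable names are illustrative, and the per-class correct counts are inferred by rounding recall times support.

```python
from statistics import fmean

# Values copied from the performance table.
reported = {
    "COVID-19": {"precision": 0.82, "recall": 0.75, "f1": 0.78, "support": 20},
    "healthy": {"precision": 0.79, "recall": 0.85, "f1": 0.82, "support": 20},
    "symptomatic": {"precision": 0.70, "recall": 0.70, "f1": 0.70, "support": 20},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


computed_f1 = {
    name: f1_score(row["precision"], row["recall"]) for name, row in reported.items()
}
macro_precision = fmean(row["precision"] for row in reported.values())
macro_recall = fmean(row["recall"] for row in reported.values())
macro_f1 = fmean(computed_f1.values())

# Correct predictions per class = recall * support, rounded to whole samples.
correct = sum(round(row["recall"] * row["support"]) for row in reported.values())
total = sum(row["support"] for row in reported.values())
accuracy = correct / total

for name, value in computed_f1.items():
    print(f"{name}: F1 = {value:.2f} (reported {reported[name]['f1']:.2f})")
print(f"macro avg: P = {macro_precision:.2f}, R = {macro_recall:.2f}, F1 = {macro_f1:.2f}")
print(f"accuracy = {correct}/{total} = {accuracy:.2f}")
```

Because every class has the same support, the weighted average equals the macro average, and accuracy equals the macro-averaged recall.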
Limitations
- This model should not be used for medical diagnosis
- Performance may vary with different audio recording conditions
- The training data is relatively small and may not represent all populations
- Audio quality significantly impacts classification accuracy
- The model does not account for various confounding factors that may affect cough sounds
Ethical Considerations
- Health-related predictions should be treated with caution
- Users should be informed that this is a research tool, not a diagnostic device
- Privacy concerns regarding audio recordings should be addressed
Testing and Benchmarks
Test Methodology
- 80/20 train/test split of the balanced dataset
- StandardScaler applied to normalize features
- Performance evaluated using classification report and confusion matrix
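
The split and scaling steps can be sketched with the standard library alone. This is a stand-in for the scikit-learn utilities, not the original training script. It assumes the split was stratified by class and that the balanced dataset held 100 recordings per class, which is inferred from the 20-per-class test support at a 20% test fraction. The function names and the random two-column feature matrix are placeholders. The scaling statistics are fitted on the training rows only and then reused on the test rows.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with the same class proportions in both."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def fit_standardizer(rows):
    """Per-column mean and population std; zero std is replaced by 1.0."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply (x - mean) / std column-wise using previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Placeholder data: 100 samples per class with two random features each.
classes = ["COVID-19", "healthy", "symptomatic"]
labels = [name for name in classes for _ in range(100)]
data_rng = random.Random(0)
rows = [[data_rng.gauss(5.0, 2.0), data_rng.gauss(-1.0, 0.5)] for _ in labels]

train_idx, test_idx = stratified_split(labels)
means, stds = fit_standardizer([rows[i] for i in train_idx])
train_scaled = standardize([rows[i] for i in train_idx], means, stds)
test_scaled = standardize([rows[i] for i in test_idx], means, stds)
print(len(train_idx), len(test_idx))
```

Fitting the scaler on the training rows only keeps test-set statistics out of the model, so the test metrics stay an honest estimate.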
Important Features
Top 5 features identified by the model:
- mfcc1_mean
- spectral_centroid_mean
- rolloff_mean
- mfcc2_mean
- spectral_bandwidth_mean
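
A ranking like this can be obtained by pairing each feature name with its importance score and sorting. The helper below is a hypothetical sketch. The scores are placeholder values chosen for illustration, not the model's actual importances, and `zcr_mean` is an assumed feature name.

```python
def top_features(feature_names, importances, k=5):
    """Return the k (name, importance) pairs with the largest importance."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Placeholder values for illustration only; not the model's actual importances.
names = ["mfcc1_mean", "zcr_mean", "spectral_centroid_mean", "rolloff_mean"]
scores = [0.40, 0.05, 0.30, 0.25]

for name, score in top_features(names, scores, k=3):
    print(f"{name}: {score:.2f}")
```

With the pickled components loaded, the same helper could be called as `top_features(feature_names, model.feature_importances_)`, since scikit-learn's Random Forest exposes its importances through the `feature_importances_` attribute.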
Benchmark Results
On the 60-sample test set the model achieves 77% overall accuracy. Healthy coughs are classified best (F1 0.82), followed by COVID-19 (F1 0.78) and symptomatic coughs (F1 0.70).
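
Per-class figures of this kind come from a confusion matrix. The sketch below builds one from true and predicted labels and derives precision and recall for each class. It uses only the standard library. The function names are hypothetical, and the six labels are toy data, not the model's test predictions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_metrics(matrix, classes):
    """Precision and recall for each class from a square confusion matrix."""
    metrics = {}
    for i, name in enumerate(classes):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])  # row sum
        metrics[name] = {
            "precision": true_positives / predicted if predicted else 0.0,
            "recall": true_positives / actual if actual else 0.0,
        }
    return metrics


# Toy labels for illustration only.
classes = ["COVID-19", "healthy", "symptomatic"]
y_true = ["COVID-19", "COVID-19", "healthy", "healthy", "symptomatic", "symptomatic"]
y_pred = ["COVID-19", "symptomatic", "healthy", "healthy", "symptomatic", "healthy"]

matrix = confusion_matrix(y_true, y_pred, classes)
metrics = per_class_metrics(matrix, classes)
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(y_true)
print(matrix, metrics, accuracy)
```

The diagonal holds the correct predictions, so off-diagonal cells show which classes are confused with each other.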
Usage Example
import pickle
from librosa import load
import pandas as pd
import numpy as np


# Function to extract features (see source code for implementation)
def extract_all_features(audio_path):
    """Return a dict mapping each feature name to its value for one recording."""
    # Placeholder: refer to the original code for the implementation
    raise NotImplementedError("extract_all_features is not implemented here")


# Load model components
with open('cough_classification_model.pkl', 'rb') as f:
    components = pickle.load(f)

model = components['model']
scaler = components['scaler']
label_encoder = components['label_encoder']
feature_names = components['feature_names']

# Extract features from a new audio file
features = extract_all_features('path/to/cough_recording.wav')

# Prepare features: one row, columns in the order used during training
features_df = pd.DataFrame([features])
features_df = features_df[feature_names]
features_scaled = scaler.transform(features_df)

# Make prediction
prediction_idx = model.predict(features_scaled)[0]
prediction = label_encoder.inverse_transform([prediction_idx])[0]
probabilities = model.predict_proba(features_scaled)[0]

print(f"Predicted status: {prediction}")
print("Class probabilities:")
# predict_proba columns follow the order of model.classes_
for class_idx, prob in zip(model.classes_, probabilities):
    class_name = label_encoder.inverse_transform([class_idx])[0]
    print(f"  {class_name}: {prob:.4f}")
Evaluation results
- accuracy on CoughVid Dataset (Balanced Test) test set, self-reported: 0.367
- auc_COVID-19 on CoughVid Dataset (Balanced Test) test set, self-reported: 0.603
- auc_healthy on CoughVid Dataset (Balanced Test) test set, self-reported: 0.564
- auc_symptomatic on CoughVid Dataset (Balanced Test) test set, self-reported: 0.465
- f1_healthy on CoughVid Dataset test set, self-reported: 0.410
- f1_COVID-19 on CoughVid Dataset test set, self-reported: 0.400
- f1_symptomatic on CoughVid Dataset test set, self-reported: 0.269
- healthy_accuracy on CoughVid Dataset test set, self-reported: 0.533
- COVID-19_accuracy on CoughVid Dataset test set, self-reported: 0.333
- symptomatic_accuracy on CoughVid Dataset test set, self-reported: 0.233