WpythonW/ast-fakeaudio-detector

Audio Classification

audio-spectrogram-transformer

fake-audio-detection

WpythonW committed on Jan 23

Commit 7ad4ec8 · verified · 1 Parent(s): b743507

Update README.md

Files changed (1)

README.md +2 -48

@@ -57,42 +57,6 @@ This model is a binary classification head fine-tuned version of [MIT/ast-finetu
 - **Output**: Probabilities [fake_prob, real_prob]
 - **Training Hardware**: 2x NVIDIA T4 GPUs
-## Training Configuration
-```python
-{
-    'learning_rate': 1e-5,
-    'weight_decay': 0.01,
-    'n_iterations': 1500,
-    'batch_size': 16,
-    'gradient_accumulation_steps': 8,
-    'validate_every': 500,
-    'val_samples': 5000
-}
-```
-## Dataset Distribution
-The model was trained on a filtered dataset with the following class distribution:
-```
-Training Set:
-- Fake Audio (0): 29,089 samples (53.97%)
-- Real Audio (1): 24,813 samples (46.03%)
-Test Set:
-- Fake Audio (0): 7,229 samples (53.64%)
-- Real Audio (1): 6,247 samples (46.36%)
-```
-## Model Performance
-Final metrics on validation set:
-- Accuracy: 0.9662 (96.62%)
-- F1 Score: 0.9710 (97.10%)
-- Precision: 0.9692 (96.92%)
-- Recall: 0.9728 (97.28%)
 # Usage Guide
 ## Model Usage
@@ -167,16 +131,6 @@ for filename, probs in zip(audio_files, probabilities):
 ## Limitations
 Important considerations when using this model:
-1. The model works best with 16kHz audio input
 2. Performance may vary with different types of audio manipulation not present in training data
-3. Very short audio clips (<1 second) might not provide reliable results
-4. The model should not be used as the sole determiner for real/fake audio detection
-## Training Details
-The training process involved:
-1. Loading the base AST model pretrained on AudioSet
-2. Replacing the classification head with a binary classifier
-3. Fine-tuning on the fake audio detection dataset for 1500 iterations
-4. Using gradient accumulation (8 steps) with batch size 16
-5. Implementing validation checks every 500 steps

 ## Limitations
 Important considerations when using this model:
+1. The model works with 16kHz audio input
 2. Performance may vary with different types of audio manipulation not present in training data
+3. The model was trained on audio samples ranging from 4 to 10 seconds in duration.
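
The two added limitations (16kHz input, training clips of 4 to 10 seconds) can be turned into a small pre-inference check. The sketch below is not part of the README or this commit; `check_clip` and the constant names are hypothetical, and the choice to return warnings rather than reject a clip is an assumption, since the README does not say how the model behaves outside these ranges.

```python
from typing import List

# Values taken from the README's Limitations section.
EXPECTED_SAMPLE_RATE = 16_000          # "The model works with 16kHz audio input"
MIN_SECONDS, MAX_SECONDS = 4.0, 10.0   # training clips ranged from 4 to 10 seconds


def check_clip(num_samples: int, sample_rate: int) -> List[str]:
    """Return warnings for a clip; an empty list means it matches the stated conditions.

    Hypothetical helper: it only compares the clip against the README's
    stated sample rate and training duration range.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")

    warnings = []
    if sample_rate != EXPECTED_SAMPLE_RATE:
        warnings.append(
            f"sample rate is {sample_rate} Hz; resample to {EXPECTED_SAMPLE_RATE} Hz"
        )

    duration = num_samples / sample_rate
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        warnings.append(
            f"duration is {duration:.2f} s; training clips were "
            f"{MIN_SECONDS:.0f} to {MAX_SECONDS:.0f} s long"
        )
    return warnings


if __name__ == "__main__":
    # A 2-second clip at 44.1 kHz triggers both warnings.
    for message in check_clip(num_samples=88_200, sample_rate=44_100):
        print("warning:", message)
```

A clip that fails the duration check is not necessarily misclassified; the warning only signals that the input falls outside what the README says the model saw during training.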