nielsr (HF Staff) committed
Commit 82ca084 · verified · 1 Parent(s): 8ddc011

Add link to GitHub repository


This PR improves the model card by adding a link to the GitHub repository for easier access to the code.

Files changed (1)
  1. README.md +19 -10
README.md CHANGED
@@ -1,13 +1,4 @@
 ---
-license: bsd-3-clause
-pipeline_tag: feature-extraction
-tags:
-- automatic-speech-recognition
-- audio-classification
-- audio
-- speech
-- music
-library_name: transformers
 datasets:
 - openslr/librispeech_asr
 - facebook/multilingual_librispeech
@@ -17,14 +8,32 @@ datasets:
 - agkphysics/AudioSet
 language:
 - en
+library_name: transformers
+license: bsd-3-clause
+pipeline_tag: feature-extraction
+tags:
+- automatic-speech-recognition
+- audio-classification
+- audio
+- speech
+- music
 ---
+
 # USAD: Universal Speech and Audio Representation via Distillation
 
+The model was presented in the paper [USAD: Universal Speech and Audio Representation via Distillation](https://huggingface.co/papers/2506.18843).
+
+The abstract of the paper is the following:
+
+Self-supervised learning (SSL) has revolutionized audio representations, yet models often remain domain-specific, focusing on either speech or non-speech tasks. In this work, we present Universal Speech and Audio Distillation (USAD), a unified approach to audio representation learning that integrates diverse audio types - speech, sound, and music - into a single model. USAD employs efficient layer-to-layer distillation from domain-specific SSL models to train a student on a comprehensive audio dataset. USAD offers competitive performance across various benchmarks and datasets, including frame and instance-level speech processing tasks, audio tagging, and sound classification, achieving near state-of-the-art results with a single encoder on SUPERB and HEAR benchmarks.
+
 **Universal Speech and Audio Distillation (USAD)** is a unified **speech**, **sound**, and **music** encoder distilled from domain-specific teachers.
 Trained on 126k hours of mixed data, USAD delivers competitive performance across diverse benchmarks (SUPERB, HEAR, and AudioSet) with a single model.
 
 [👀 **Read Full Paper**](https://arxiv.org/abs/2506.18843)
 
+Code: [MIT-SLS/USAD](https://github.com/MIT-SLS/USAD) *(Assuming this is the correct repository. Please verify.)*
+
 ---
 
 ## 🗂️ Models
@@ -89,4 +98,4 @@ See [usad_model.py](https://huggingface.co/MIT-SLS/USAD-Base/blob/main/usad_mode
 
 ## 🙏 Acknowledgement
 
-Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
+Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
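For reviewers who want to sanity-check the card's metadata: `library_name: transformers` and `pipeline_tag: feature-extraction` imply the checkpoint is meant to be used as an audio feature extractor. Below is a minimal sketch of that usage, assuming the repository's custom modeling code (`usad_model.py`) is loadable via `trust_remote_code=True` and that the forward pass accepts a raw 16 kHz waveform batch; both are assumptions to verify against the repo, not confirmed API.

```python
# Minimal sketch, not official usage: load USAD-Base as a feature extractor.
# Assumes the repo exposes its custom model class via trust_remote_code and
# that the model consumes raw 16 kHz waveforms -- check usad_model.py first.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("MIT-SLS/USAD-Base", trust_remote_code=True)
model.eval()

waveform = torch.randn(1, 16000)  # dummy 1-second mono clip at 16 kHz

with torch.no_grad():
    outputs = model(waveform)  # representations for downstream probing tasks
```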