Audio-to-Audio · Transformers · Safetensors · speech_language_model · Inference Endpoints
gallilmaimon and nielsr (HF staff) committed
Commit 60f1002 · verified · 1 parent: 0f9b7c2

Fix typos (#1)


- Fix typos (b09eb4e62489bd6ab99efe37c5de5a191d297248)
- Update README.md (a86fb434f954626112b1d4aef73dcb3be46b77e1)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+9 −8)
README.md CHANGED
@@ -10,14 +10,15 @@ base_model:
 pipeline_tag: audio-to-audio
 ---
 
-# Model Card for Model ID
-This is a Speech Lanaguage Model trained for generating speech contiuations over discrete [Hubert tokens](https://huggingface.co/slprl/mhubert-base-25hz).
+# Model Card for SLAM
+
+This is a Speech Language Model trained for generating speech continuations over discrete [Hubert tokens](https://huggingface.co/slprl/mhubert-base-25hz).
 
 
 ## Model Details
 
 ### Model Description
-This is a Speech Lanaguage Model, introduced in "[_Slamming_: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814)", focusing on efficient training.
+This is a Speech Language Model, introduced in "[_Slamming_: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814)", focusing on efficient training.
 It was fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) over a vocabulary of 500 speech tokens extracted from
 the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz). For a stronger version of the model trained with
 slightly more compute - 2*A100 for 2 days, see [slam_scaled](https://huggingface.co/slprl/slam_scaled).
@@ -35,10 +36,10 @@ The model was trained by next-token prediction over a subset of LibriSpeech, Lib
 
 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
 - **Paper:** [https://arxiv.org/abs/2502.15814](https://arxiv.org/abs/2502.15814)
-- **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
+- **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/slamming/](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
 
 ## Uses
-This is a base SpeechLM and as such can be used to generate contiuations for speech segments, or as base for further tuning. See the _SlamKit_
+This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as base for further tuning. See the _SlamKit_
 [codebase](https://github.com/slp-rl/slamkit) for more details on usage, and checkout the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples
 
 ### Out-of-Scope Use
@@ -47,7 +48,7 @@ This model was trained on curated speech datasets which contain mainly audio-boo
 
 
 ## How to Get Started with the Model
-We refer users to the official repository for full usage explainations - [github](https://github.com/slp-rl/slamkit).
+We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slamkit).
 
 
 ## Training Details
@@ -61,7 +62,7 @@ This model was trained on a subset of [LibriSpeech](https://huggingface.co/datas
 dataset [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 
 ### Training Procedure
-This model was trained by next token prediction over several dataset, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
+This model was trained by next token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 Please refer to the [paper]() or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
 
 #### Preprocessing
@@ -93,7 +94,7 @@ This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores an
 
 #### Software
 The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase which builds upon 🤗transformers extending it to support
-easy and efficent training of Speech Language Models.
+easy and efficient training of Speech Language Models.
 
 ## Citation
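For intuition about the discrete interface the card describes, here is a small arithmetic sketch (my own illustration, not SlamKit code): mhubert-25hz emits 25 tokens per second from the 500-unit vocabulary mentioned above, so a speech clip becomes a short integer sequence that the Qwen2.5-0.5B backbone models with ordinary next-token prediction.

```python
# Illustrative arithmetic only (not from the SlamKit codebase):
# the model consumes discrete HuBERT units from mhubert-25hz,
# i.e. 25 tokens per second drawn from a 500-unit vocabulary.
TOKEN_RATE_HZ = 25  # mhubert-25hz unit rate
VOCAB_SIZE = 500    # speech-token vocabulary size stated in the card

def num_speech_tokens(duration_sec: float) -> int:
    """Sequence length the LM sees for a clip of the given duration."""
    return round(duration_sec * TOKEN_RATE_HZ)

# A 30-second utterance becomes a 750-token sequence -- comfortably
# inside a text LM's context window, which is part of what makes
# fine-tuning a text model over speech units practical.
print(num_speech_tokens(30.0))
```

The low 25 Hz token rate (versus the 50 Hz of many HuBERT variants) halves sequence lengths, which is one of the efficiency levers the Slamming paper leans on.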
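The getting-started section defers to the repository; as a rough sketch only, since the checkpoint is a Qwen2.5-0.5B fine-tune stored in safetensors, it should load through the standard 🤗 transformers causal-LM API. The repo id `slprl/slam` is my assumption from the card's sibling links (e.g. `slprl/slam_scaled`); converting audio to and from the discrete HuBERT units is handled by SlamKit, not shown here.

```python
# Hedged sketch, not the official recipe: the checkpoint is a
# Qwen2.5-0.5B fine-tune, so it should load as a plain causal LM.
# The repo id "slprl/slam" is an assumption; turning audio into the
# 500 discrete HuBERT units (and vocoding generations back to
# waveform) requires mhubert-25hz plus a vocoder via SlamKit.
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_slam(repo_id: str = "slprl/slam"):
    """Load the speech LM and its (speech-token) tokenizer from the Hub."""
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return model, tokenizer

if __name__ == "__main__":
    model, tokenizer = load_slam()
    # model.generate(...) would then continue a sequence of speech tokens.
```

Treat this purely as a shape of the workflow; the SlamKit README documents the supported end-to-end path.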