Fix typos (#1)
- Fix typos (b09eb4e62489bd6ab99efe37c5de5a191d297248)
- Update README.md (a86fb434f954626112b1d4aef73dcb3be46b77e1)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
pipeline_tag: audio-to-audio
---
# Model Card for SLAM

This is a Speech Language Model trained to generate speech continuations over discrete [HuBERT tokens](https://huggingface.co/slprl/mhubert-base-25hz).
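For intuition about this interface (an illustration, not taken from the model card): speech is first discretized into HuBERT unit IDs at 25Hz, and a common preprocessing step is collapsing consecutive repeats before mapping units into the model's vocabulary. The `<unit_k>` token naming below is a hypothetical placeholder, not the actual vocabulary used by this model:

```python
def units_to_tokens(units):
    """Collapse consecutive duplicate HuBERT unit IDs and render them as
    placeholder vocabulary tokens (the '<unit_k>' naming is hypothetical)."""
    deduped = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return [f"<unit_{u}>" for u in deduped]

# 25Hz units often repeat over steady sounds; repeats collapse away.
print(units_to_tokens([7, 7, 7, 42, 42, 499]))  # ['<unit_7>', '<unit_42>', '<unit_499>']
```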
## Model Details
### Model Description

This is a Speech Language Model, introduced in "[_Slamming_: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814)", focusing on efficient training.
It was fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) over a vocabulary of 500 speech tokens extracted from
the 11th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz). For a stronger version of the model trained with
slightly more compute (2×A100 for 2 days), see [slam_scaled](https://huggingface.co/slprl/slam_scaled).
- **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
- **Paper:** [https://arxiv.org/abs/2502.15814](https://arxiv.org/abs/2502.15814)
- **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/slamming/](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
## Uses

This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_
[codebase](https://github.com/slp-rl/slamkit) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.
### Out-of-Scope Use
## How to Get Started with the Model

We refer users to the official repository for full usage instructions: [github](https://github.com/slp-rl/slamkit).
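As a rough sketch only (the official SlamKit codebase is the supported path): since the model was fine-tuned from Qwen2.5-0.5B, it should load as a standard 🤗transformers causal LM. The checkpoint id `slprl/slam` and the toy prompt ids below are assumptions; the real mapping from audio to speech-token ids is handled by SlamKit.

```python
def strip_prompt(generated_ids, prompt_len):
    """Keep only the continuation, dropping the echoed prompt tokens."""
    return generated_ids[prompt_len:]

def run_demo():
    # Heavy: downloads the checkpoint. Call explicitly to run.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("slprl/slam")  # assumed checkpoint id
    prompt = torch.tensor([[17, 17, 42, 499]])  # toy stand-ins for real speech-token ids
    out = model.generate(prompt, max_new_tokens=32, do_sample=True, top_p=0.95)
    return strip_prompt(out[0].tolist(), prompt.shape[1])
```

The continuation tokens would then be vocoded back to audio, which this sketch does not cover.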
## Training Details
dataset [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
### Training Procedure

This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
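For intuition (not from the card): DPO optimizes a preference loss over chosen/rejected continuations relative to a frozen reference model. A minimal per-pair sketch of the standard DPO objective, with sequence log-probabilities as inputs:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy prefers the chosen continuation more than the reference does,
# the margin is positive and the loss falls below log(2) ≈ 0.693.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # ≈ 0.598
```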
Please refer to the [paper](https://arxiv.org/abs/2502.15814) or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
#### Preprocessing
#### Software
The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase, which builds upon 🤗transformers, extending it to support
easy and efficient training of Speech Language Models.
## Citation