Commit a82489f · verified · committed by nielsr (HF Staff) · 1 parent: 79296a4

Update pipeline tag, add project page and paper links


This PR improves the model card for AuriStream by:
- Updating the `pipeline_tag` from `feature-extraction` to `audio-to-audio`. This better reflects the model's generative capabilities, in particular its ability to generate "speech continuations", and improves its discoverability on the Hugging Face Hub under the correct pipeline.
- Adding a direct link to the associated Hugging Face paper page: [Representing Speech Through Autoregressive Prediction of Cochlear Tokens](https://huggingface.co/papers/2508.11598).
- Including a link to the official project page: [https://tukoresearch.github.io/auristream-speech/](https://tukoresearch.github.io/auristream-speech/).

These additions provide users with a more accurate understanding of the model's functionality and direct access to its research and related resources.
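As a rough illustration of the discoverability point, the new tag can be queried through the Hub API. The snippet below is a minimal sketch, assuming a recent `huggingface_hub` client in which `list_models` exposes a `pipeline_tag` argument (older releases can pass `filter="audio-to-audio"` instead); the search string is just an example.

```python
# Minimal sketch: list Hub models that advertise the audio-to-audio pipeline.
# Assumes a recent huggingface_hub release where list_models() accepts `pipeline_tag`;
# on older versions, use filter="audio-to-audio" instead.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(pipeline_tag="audio-to-audio", search="AuriStream", limit=10):
    print(model.id, model.pipeline_tag)
```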

Files changed (1)
  1. README.md +8 -10
README.md CHANGED
@@ -1,24 +1,25 @@
 ---
+datasets:
+- LibriLight
 language:
 - en
 library_name: transformers
-pipeline_tag: feature-extraction
+license: apache-2.0
+pipeline_tag: audio-to-audio
 tags:
 - audio
 - speech
 - autoregressive
 - transformers
 - custom_code
-datasets:
-- LibriLight
-license: apache-2.0
 pretty_name: AuriStream1B
 ---
 
-
 # AuriStream-1B
 
-**AuriStream** is a biologically-inspired, GPT-style autoregressive Transformer trained to predict tokens from the speech stream (denoted as **cochlear tokens**). These cochlear tokens are discrete codes produced by a companion “WavCoch” tokenizer (a model trained to predict the time-frequency cochleagram from a waveform, with a LFQ bottleneck for token read-out). AuriStream utilizes a long context window of (\~20 s, \~4096 tokens) and is trained on **LibriLight (\~60k hours)** for **500k steps**. It learns meaningful representations about e.g. phoneme/word identity and can predict future tokens to generate **speech continuations**. Inputs are cochlear **token IDs**; use it with a WavCoch tokenizer for audio -> tokens.
+[📚 Paper](https://huggingface.co/papers/2508.11598) - [🌐 Project Page](https://tukoresearch.github.io/auristream-speech/)
+
+**AuriStream** is a biologically-inspired, GPT-style autoregressive Transformer trained to predict tokens from the speech stream (denoted as **cochlear tokens**). These cochlear tokens are discrete codes produced by a companion “WavCoch” tokenizer (a model trained to predict the time-frequency cochleagram from a waveform, with a LFQ bottleneck for token read-out). AuriStream utilizes a long context window of (~20 s, ~4096 tokens) and is trained on **LibriLight (~60k hours)** for **500k steps**. It learns meaningful representations about e.g. phoneme/word identity and can predict future tokens to generate **speech continuations**. Inputs are cochlear **token IDs**; use it with a WavCoch tokenizer for audio -> tokens.
 
 ---
 
@@ -126,8 +127,6 @@ with torch.no_grad():
     prompt_tokens, rollout_steps, temp=0.7, top_k=50, top_p=0.95, seed=0
 )
 full_tokens = torch.cat([prompt_tokens, pred_tokens], dim=1) # (1, L+K)
-
-
 ```
 
 ## Architecture overview
@@ -152,5 +151,4 @@ If you use this model, please cite:
   doi = {10.21437/Interspeech.2025-2044},
   issn = {2958-1796}
 }
-```
-
+```
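For orientation, the second hunk above trims blank lines at the tail of the README's speech-continuation example, in which a prompt of cochlear tokens is rolled out into future tokens and concatenated with the prediction. A minimal, self-contained sketch of that pattern follows; `rollout_stub` is a placeholder for the model's actual generation call (which is not visible in the diff), and the vocabulary size and shapes are illustrative only.

```python
import torch

# Placeholder for the model call whose tail appears in the second hunk; only the
# argument names/values and the final concatenation are taken from the diff itself.
def rollout_stub(prompt_tokens, rollout_steps, temp=0.7, top_k=50, top_p=0.95, seed=0):
    """Return `rollout_steps` random token IDs per sequence (sampling args are ignored here)."""
    gen = torch.Generator().manual_seed(seed)
    return torch.randint(0, 1024, (prompt_tokens.shape[0], rollout_steps), generator=gen)

prompt_tokens = torch.randint(0, 1024, (1, 256))  # (1, L) prompt of cochlear token IDs
rollout_steps = 128                               # K future tokens to predict

with torch.no_grad():
    pred_tokens = rollout_stub(
        prompt_tokens, rollout_steps, temp=0.7, top_k=50, top_p=0.95, seed=0
    )
full_tokens = torch.cat([prompt_tokens, pred_tokens], dim=1)  # (1, L+K)
print(full_tokens.shape)
```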