Commit a82489f · verified · committed by nielsr (HF Staff) · 1 parent: 79296a4

Update pipeline tag, add project page and paper links


This PR improves the model card for AuriStream by:
- Updating the `pipeline_tag` from `feature-extraction` to `audio-to-audio`. This better reflects the model's generative capabilities, in particular its ability to generate "speech continuations", and improves its discoverability on the Hugging Face Hub under the correct pipeline.
- Adding a direct link to the associated Hugging Face paper page: [Representing Speech Through Autoregressive Prediction of Cochlear Tokens](https://huggingface.co/papers/2508.11598).
- Including a link to the official project page: [https://tukoresearch.github.io/auristream-speech/](https://tukoresearch.github.io/auristream-speech/).

These additions provide users with a more accurate understanding of the model's functionality and direct access to its research and related resources.
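As a rough illustration of the discoverability point, the new tag can be queried through the Hub API. The snippet below is a minimal sketch, assuming a recent `huggingface_hub` client in which `list_models` exposes a `pipeline_tag` argument (older releases can pass `filter="audio-to-audio"` instead); the search string is just an example.

```python
# Minimal sketch: list Hub models that advertise the audio-to-audio pipeline.
# Assumes a recent huggingface_hub release where list_models() accepts `pipeline_tag`;
# on older versions, use filter="audio-to-audio" instead.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(pipeline_tag="audio-to-audio", search="AuriStream", limit=10):
    print(model.id, model.pipeline_tag)
```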

Files changed (1)
  1. README.md +8 -10
README.md CHANGED
@@ -1,24 +1,25 @@
 ---
+datasets:
+- LibriLight
 language:
 - en
 library_name: transformers
-pipeline_tag: feature-extraction
+license: apache-2.0
+pipeline_tag: audio-to-audio
 tags:
 - audio
 - speech
 - autoregressive
 - transformers
 - custom_code
-datasets:
-- LibriLight
-license: apache-2.0
 pretty_name: AuriStream1B
 ---
 
-
 # AuriStream-1B
 
-**AuriStream** is a biologically-inspired, GPT-style autoregressive Transformer trained to predict tokens from the speech stream (denoted as **cochlear tokens**). These cochlear tokens are discrete codes produced by a companion “WavCoch” tokenizer (a model trained to predict the time-frequency cochleagram from a waveform, with a LFQ bottleneck for token read-out). AuriStream utilizes a long context window of (\~20 s, \~4096 tokens) and is trained on **LibriLight (\~60k hours)** for **500k steps**. It learns meaningful representations about e.g. phoneme/word identity and can predict future tokens to generate **speech continuations**. Inputs are cochlear **token IDs**; use it with a WavCoch tokenizer for audio -> tokens.
+[📚 Paper](https://huggingface.co/papers/2508.11598) - [🌐 Project Page](https://tukoresearch.github.io/auristream-speech/)
+
+**AuriStream** is a biologically-inspired, GPT-style autoregressive Transformer trained to predict tokens from the speech stream (denoted as **cochlear tokens**). These cochlear tokens are discrete codes produced by a companion “WavCoch” tokenizer (a model trained to predict the time-frequency cochleagram from a waveform, with a LFQ bottleneck for token read-out). AuriStream utilizes a long context window of (~20 s, ~4096 tokens) and is trained on **LibriLight (~60k hours)** for **500k steps**. It learns meaningful representations about e.g. phoneme/word identity and can predict future tokens to generate **speech continuations**. Inputs are cochlear **token IDs**; use it with a WavCoch tokenizer for audio -> tokens.
 
 ---
 
@@ -126,8 +127,6 @@ with torch.no_grad():
     prompt_tokens, rollout_steps, temp=0.7, top_k=50, top_p=0.95, seed=0
 )
 full_tokens = torch.cat([prompt_tokens, pred_tokens], dim=1) # (1, L+K)
-
-
 ```
 
 ## Architecture overview
@@ -152,5 +151,4 @@ If you use this model, please cite:
   doi = {10.21437/Interspeech.2025-2044},
   issn = {2958-1796}
 }
-```
-
+```
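For orientation, the second hunk above trims blank lines at the tail of the README's speech-continuation example, in which a prompt of cochlear tokens is rolled out into future tokens and concatenated with the prediction. A minimal, self-contained sketch of that pattern follows; `rollout_stub` is a placeholder for the model's actual generation call (which is not visible in the diff), and the vocabulary size and shapes are illustrative only.

```python
import torch

# Placeholder for the model call whose tail appears in the second hunk; only the
# argument names/values and the final concatenation are taken from the diff itself.
def rollout_stub(prompt_tokens, rollout_steps, temp=0.7, top_k=50, top_p=0.95, seed=0):
    """Return `rollout_steps` random token IDs per sequence (sampling args are ignored here)."""
    gen = torch.Generator().manual_seed(seed)
    return torch.randint(0, 1024, (prompt_tokens.shape[0], rollout_steps), generator=gen)

prompt_tokens = torch.randint(0, 1024, (1, 256))  # (1, L) prompt of cochlear token IDs
rollout_steps = 128                               # K future tokens to predict

with torch.no_grad():
    pred_tokens = rollout_stub(
        prompt_tokens, rollout_steps, temp=0.7, top_k=50, top_p=0.95, seed=0
    )
full_tokens = torch.cat([prompt_tokens, pred_tokens], dim=1)  # (1, L+K)
print(full_tokens.shape)
```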