This PR corrects the pipeline_tag in the model card metadata from audio-to-audio to audio-text-to-text. The model generates text continuations from audio input, making audio-text-to-text the accurate and more descriptive tag. This improves the model's searchability and discoverability on the Hugging Face Hub.

SLP-RL HUJI org

Hey @nielsr,
I just want to clarify that this SpeechLM takes speech (tokens) as input and outputs speech (tokens), with no text anywhere. It is like TWIST - https://arxiv.org/abs/2305.13009 or GSLM - https://arxiv.org/abs/2102.01192 and those types of models. Therefore I think the audio-to-audio pipeline tag is correct. (I suspect the confusion came from models like QwenAudio, which get speech as input/conditioning for a textLM.)

I am happy to integrate the stylistic and linguistic improvements you made though :)

Thanks for providing context!

Note, this PR was written entirely by an LLM - it needs some learning :)

nielsr changed pull request title from Correct pipeline tag in model card to Fix typos
SLP-RL HUJI org

Sounds good :) If you want, you could apply similar fixes to slprl/slam. Thanks again, merging!

gallilmaimon changed pull request status to merged
