Fix typos
This PR corrects the pipeline_tag in the model card metadata from audio-to-audio to audio-text-to-text. The model generates text continuations from audio input, making audio-text-to-text the accurate and more descriptive tag. This improves the model's searchability and discoverability on the Hugging Face Hub.
Hey @nielsr,
I just want to clarify that this SpeechLM takes speech (tokens) as input and outputs speech (tokens), with no text anywhere. This is like TWIST - https://arxiv.org/abs/2305.13009 or GSLM - https://arxiv.org/abs/2102.01192 and those types of models. Therefore I think the audio-to-audio pipeline tag is correct. (I suspect the confusion came from models like QwenAudio, which take speech as input/conditioning for a text LM.)
I am happy to integrate the stylistic and linguistic improvements you made though :)
Thanks for providing context!
Note, this PR was made entirely by an LLM - it needs some learning :)
Sounds good :) If you want, you could apply similar fixes to slprl/slam. Thanks again, merging!