TDN-M committed on
Commit 3db8c0d · verified · 1 Parent(s): 2944edc

Update app.py

Files changed (1)
  1. app.py +2 -20
app.py CHANGED
@@ -15,37 +15,19 @@ import spaces
 model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
 processor = MusicgenProcessor.from_pretrained("facebook/musicgen-small")
 
-title = "MusicGen Streaming"
+title = "MUSIC GEN TEST"
 
 description = """
-Stream the outputs of the MusicGen text-to-music model by playing the generated audio as soon as the first chunk is ready.
-Demo uses [MusicGen Small](https://huggingface.co/facebook/musicgen-small) in the 🤗 Transformers library. Note that the
-demo works best on the Chrome browser. If there is no audio output, try switching browsers to Chrome.
+Note that the demo works best on the Chrome browser. If there is no audio output, try switching your browser to Chrome.
 """
 
  article = """
27
  ## How Does It Work?
28
-
29
- MusicGen is an auto-regressive transformer-based model, meaning generates audio codes (tokens) in a causal fashion.
30
- At each decoding step, the model generates a new set of audio codes, conditional on the text input and all previous audio codes. From the
31
- frame rate of the [EnCodec model](https://huggingface.co/facebook/encodec_32khz) used to decode the generated codes to audio waveform,
32
- each set of generated audio codes corresponds to 0.02 seconds. This means we require a total of 1000 decoding steps to generate
33
- 20 seconds of audio.
34
-
-Rather than waiting for the entire audio sequence to be generated, which would require the full 1000 decoding steps, we can start
-playing the audio after a specified number of decoding steps have been reached, a technique known as [*streaming*](https://huggingface.co/docs/transformers/main/en/generation_strategies#streaming).
-For example, after 250 steps we have the first 5 seconds of audio ready, and so can play this without waiting for the remaining
-750 decoding steps to complete. As we continue to generate with the MusicGen model, we append new chunks of generated audio
-to our output waveform on-the-fly. After the full 1000 decoding steps, the generated audio is complete, and is composed of four
-chunks of audio, each corresponding to 250 tokens.
-
 This method of playing incremental generations reduces the latency of the MusicGen model from the total time to generate 1000 tokens,
 to the time taken to play the first chunk of audio (250 tokens). This can result in significant improvements to perceived latency,
 particularly when the chunk size is chosen to be small. In practice, the chunk size should be tuned to your device: using a
 smaller chunk size will mean that the first chunk is ready faster, but should not be chosen so small that the model generates slower
 than the time it takes to play the audio.
-
-For details on how the streaming class works, check out the source code for the [MusicgenStreamer](https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming/blob/main/app.py#L52).
 """
 
 
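The streaming behaviour the removed text describes hinges on a streamer object that receives tokens as they are generated. The sketch below shows that pattern using the `BaseStreamer` interface from 🤗 Transformers; it is not the Space's actual [MusicgenStreamer](https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming/blob/main/app.py#L52), and the `ChunkStreamer` class and `play_steps` parameter are illustrative names:

```python
from transformers.generation.streamers import BaseStreamer

class ChunkStreamer(BaseStreamer):
    """Illustrative streamer: counts decoding steps and flags chunk boundaries."""

    def __init__(self, play_steps=250):
        self.play_steps = play_steps  # decoding steps per playable chunk
        self.steps = 0

    def put(self, value):
        # `generate` calls this with new token ids as they become available.
        self.steps += 1
        if self.steps % self.play_steps == 0:
            # The real demo would decode the accumulated codes with EnCodec
            # here and hand the waveform chunk to the audio player.
            print(f"chunk ready after {self.steps} steps")

    def end(self):
        # Called once when generation finishes.
        print(f"done after {self.steps} steps")

# Hypothetical usage; in the Space, generation runs in a background thread
# while the streamer's chunks are fed to the Gradio audio player:
# model.generate(**inputs, max_new_tokens=1000, streamer=ChunkStreamer(250))
```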
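The retained paragraph on chunk-size tuning reduces to two quantities: the time until the first chunk is ready, and whether generation keeps pace with playback. A back-of-the-envelope sketch, where `steps_per_second` is a hypothetical device measurement and 50 steps/s of playback matches the article's 0.02 s per step:

```python
# Back-of-the-envelope chunk-size tuning for streamed generation.
PLAYBACK_STEPS_PER_SECOND = 50.0  # 0.02 s of audio per decoding step

def first_chunk_latency(chunk_steps, steps_per_second):
    """Seconds until the first chunk is ready to play."""
    return chunk_steps / steps_per_second

def keeps_up(steps_per_second):
    """True if the model generates audio at least as fast as it is played."""
    return steps_per_second >= PLAYBACK_STEPS_PER_SECOND

# Hypothetical device that decodes 100 steps per second:
print(first_chunk_latency(250, 100.0))  # 2.5 s to first audio, vs 10 s for all 1000 steps
print(keeps_up(100.0))                   # True: generation outpaces playback
```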
 
 