TDN-M committed on
Commit 3db8c0d · verified · 1 Parent(s): 2944edc

Update app.py

Files changed (1)
  1. app.py +2 -20
app.py CHANGED
@@ -15,37 +15,19 @@ import spaces
 model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
 processor = MusicgenProcessor.from_pretrained("facebook/musicgen-small")
 
-title = "MusicGen Streaming"
+title = "MUSIC GEN TEST"
 
 description = """
-Stream the outputs of the MusicGen text-to-music model by playing the generated audio as soon as the first chunk is ready.
-Demo uses [MusicGen Small](https://huggingface.co/facebook/musicgen-small) in the 🤗 Transformers library. Note that the
-demo works best on the Chrome browser. If there is no audio output, try switching browsers to Chrome.
+Note that the demo works best on the Chrome browser. If there is no audio output, try switching your browser to Chrome.
 """
 
  article = """
27
  ## How Does It Work?
28
-
29
- MusicGen is an auto-regressive transformer-based model, meaning generates audio codes (tokens) in a causal fashion.
30
- At each decoding step, the model generates a new set of audio codes, conditional on the text input and all previous audio codes. From the
31
- frame rate of the [EnCodec model](https://huggingface.co/facebook/encodec_32khz) used to decode the generated codes to audio waveform,
32
- each set of generated audio codes corresponds to 0.02 seconds. This means we require a total of 1000 decoding steps to generate
33
- 20 seconds of audio.
34
-
-Rather than waiting for the entire audio sequence to be generated, which would require the full 1000 decoding steps, we can start
-playing the audio after a specified number of decoding steps have been reached, a technique known as [*streaming*](https://huggingface.co/docs/transformers/main/en/generation_strategies#streaming).
-For example, after 250 steps we have the first 5 seconds of audio ready, and so can play this without waiting for the remaining
-750 decoding steps to complete. As we continue to generate with the MusicGen model, we append new chunks of generated audio
-to our output waveform on-the-fly. After the full 1000 decoding steps, the generated audio is complete, and is composed of four
-chunks of audio, each corresponding to 250 tokens.
-
 This method of playing incremental generations reduces the latency of the MusicGen model from the total time to generate 1000 tokens,
 to the time taken to play the first chunk of audio (250 tokens). This can result in significant improvements to perceived latency,
 particularly when the chunk size is chosen to be small. In practice, the chunk size should be tuned to your device: using a
 smaller chunk size will mean that the first chunk is ready faster, but should not be chosen so small that the model generates slower
 than the time it takes to play the audio.
-
-For details on how the streaming class works, check out the source code for the [MusicgenStreamer](https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming/blob/main/app.py#L52).
 """
 
 
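The streaming behaviour the removed text describes hinges on a streamer object that receives tokens as they are generated. The sketch below shows that pattern using the `BaseStreamer` interface from 🤗 Transformers; it is not the Space's actual [MusicgenStreamer](https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming/blob/main/app.py#L52), and the `ChunkStreamer` class and `play_steps` parameter are illustrative names:

```python
from transformers.generation.streamers import BaseStreamer

class ChunkStreamer(BaseStreamer):
    """Illustrative streamer: counts decoding steps and flags chunk boundaries."""

    def __init__(self, play_steps=250):
        self.play_steps = play_steps  # decoding steps per playable chunk
        self.steps = 0

    def put(self, value):
        # `generate` calls this with new token ids as they become available.
        self.steps += 1
        if self.steps % self.play_steps == 0:
            # The real demo would decode the accumulated codes with EnCodec
            # here and hand the waveform chunk to the audio player.
            print(f"chunk ready after {self.steps} steps")

    def end(self):
        # Called once when generation finishes.
        print(f"done after {self.steps} steps")

# Hypothetical usage; in the Space, generation runs in a background thread
# while the streamer's chunks are fed to the Gradio audio player:
# model.generate(**inputs, max_new_tokens=1000, streamer=ChunkStreamer(250))
```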
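The retained paragraph on chunk-size tuning reduces to two quantities: the time until the first chunk is ready, and whether generation keeps pace with playback. A back-of-the-envelope sketch, where `steps_per_second` is a hypothetical device measurement and 50 steps/s of playback matches the article's 0.02 s per step:

```python
# Back-of-the-envelope chunk-size tuning for streamed generation.
PLAYBACK_STEPS_PER_SECOND = 50.0  # 0.02 s of audio per decoding step

def first_chunk_latency(chunk_steps, steps_per_second):
    """Seconds until the first chunk is ready to play."""
    return chunk_steps / steps_per_second

def keeps_up(steps_per_second):
    """True if the model generates audio at least as fast as it is played."""
    return steps_per_second >= PLAYBACK_STEPS_PER_SECOND

# Hypothetical device that decodes 100 steps per second:
print(first_chunk_latency(250, 100.0))  # 2.5 s to first audio, vs 10 s for all 1000 steps
print(keeps_up(100.0))                   # True: generation outpaces playback
```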
 
 