more info + correct model and dataset links
README.md CHANGED
@@ -11,9 +11,11 @@ license: agpl-3.0
 library_name: diffusers
 pipeline_tag: text-to-video
 datasets:
-- TempoFunk/tempofunk-
+- TempoFunk/tempofunk-sdance
+- TempoFunk/tempofunk-m
 models:
 - TempoFunk/makeavid-sd-jax
+- runwayml/stable-diffusion-v1-5
 tags:
 - jax-diffusers-event
 ---
app.py CHANGED
@@ -121,15 +121,31 @@ with gr.Blocks(title = 'Make-A-Video Stable Diffusion JAX', analytics_enabled =
         with gr.Column():
             intro1 = gr.Markdown("""
             # Make-A-Video Stable Diffusion JAX
+
+            We have extended a pretrained LDM inpainting image generation model with temporal convolutions and attention.
+            We take advantage of the extra 5 input channels of the inpainting model to guide the video generation with a hint image and mask.
+            The hint image can be given by the user, otherwise it is generated by a generative image model.
+
+            The temporal convolution and attention are a port of [Make-A-Video Pytorch](https://github.com/lucidrains/make-a-video-pytorch/blob/main/make_a_video_pytorch)
+            to FLAX. It is a pseudo 3D convolution that separately convolves across the spatial dimensions in 2D and over the temporal dimension in 1D.
+            Temporal attention is purely self attention and also separately attends to time and space.
+
+            Only the new temporal layers have been fine-tuned on a dataset of videos themed around dance.
+            The model has been trained for 60 epochs on a dataset of 10,000 videos with 120 frames each, randomly selecting a 24-frame range from each sample.
+
+            See the model and dataset links in the metadata.
+
+            Model implementation and training code can be found at [https://github.com/lopho/makeavid-sd-tpu](https://github.com/lopho/makeavid-sd-tpu)
+            """)
+        with gr.Column():
+            intro3 = gr.Markdown("""
             **Please be patient. The model might have to compile with current parameters.**

             This can take up to 5 minutes on the first run, and 2-3 minutes on later runs.
             The compilation will be cached and consecutive runs with the same parameters
             will be much faster.
-            """)
-
-            intro2 = gr.Markdown("""
-            The following parameters require the model to compile
+
+            Changes to the following parameters require the model to compile
             - Number of frames
             - Width & Height
             - Steps
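The description added above factorizes 3D operations into a 2D pass over space and a 1D pass over time. A minimal Flax sketch of that factorization, for illustration only (module names, tensor layout, and shapes are assumptions, not the code from makeavid-sd-tpu):

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class PseudoConv3d(nn.Module):
    """Pseudo 3D convolution: 2D conv over space, then 1D conv over time."""
    features: int

    @nn.compact
    def __call__(self, x):
        # x: (batch, frames, height, width, channels)
        b, f, h, w, c = x.shape
        # 2D convolution over space, frames folded into the batch dimension
        x = x.reshape(b * f, h, w, c)
        x = nn.Conv(self.features, kernel_size = (3, 3), padding = 'SAME')(x)
        # 1D convolution over time, pixels folded into the batch dimension
        x = x.reshape(b, f, h, w, self.features)
        x = x.transpose(0, 2, 3, 1, 4).reshape(b * h * w, f, self.features)
        x = nn.Conv(self.features, kernel_size = (3,), padding = 'SAME')(x)
        return x.reshape(b, h, w, f, self.features).transpose(0, 3, 1, 2, 4)

class FactorizedSelfAttention(nn.Module):
    """Self attention applied separately over space and over time."""
    heads: int

    @nn.compact
    def __call__(self, x):
        # x: (batch, frames, height, width, channels)
        b, f, h, w, c = x.shape
        # attend over space within each frame
        x = nn.SelfAttention(num_heads = self.heads)(x.reshape(b * f, h * w, c))
        # attend over time at each spatial position
        x = x.reshape(b, f, h, w, c).transpose(0, 2, 3, 1, 4).reshape(b * h * w, f, c)
        x = nn.SelfAttention(num_heads = self.heads)(x)
        return x.reshape(b, h, w, f, c).transpose(0, 3, 1, 2, 4)

# toy shapes: 1 clip, 24 frames, 32x32 latents, 8 channels
x = jnp.zeros((1, 24, 32, 32, 8))
y, _ = PseudoConv3d(features = 8).init_with_output(jax.random.PRNGKey(0), x)
z, _ = FactorizedSelfAttention(heads = 2).init_with_output(jax.random.PRNGKey(1), y)
```

Folding frames into the batch for the spatial pass and pixels into the batch for the temporal pass keeps both passes ordinary 2D/1D primitives, which is what lets the spatial weights come straight from a pretrained image model while only the temporal layers are trained.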
@@ -153,7 +169,7 @@ with gr.Blocks(title = 'Make-A-Video Stable Diffusion JAX', analytics_enabled =
             )
             inference_steps_input = gr.Slider(
                 label = 'Steps',
-                minimum =
+                minimum = 2,
                 maximum = 100,
                 value = 20,
                 step = 1
@@ -222,6 +238,7 @@ with gr.Blocks(title = 'Make-A-Video Stable Diffusion JAX', analytics_enabled =
     height_input.change(fn = trigger_check_fun, inputs = trigger_inputs, outputs = will_trigger)
     width_input.change(fn = trigger_check_fun, inputs = trigger_inputs, outputs = will_trigger)
     num_frames_input.change(fn = trigger_check_fun, inputs = trigger_inputs, outputs = will_trigger)
+    image_input.change(fn = trigger_check_fun, inputs = trigger_inputs, outputs = will_trigger)
     inference_steps_input.change(fn = trigger_check_fun, inputs = trigger_inputs, outputs = will_trigger)
     will_trigger.value = trigger_check_fun(image_input.value, inference_steps_input.value, height_input.value, width_input.value, num_frames_input.value)
     ev = submit_button.click(
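The hunk above routes every input that affects the traced program (hint image, steps, resolution, frame count) through trigger_check_fun so the UI can warn that the next run will recompile. A small illustration of the underlying JAX behaviour, not code from app.py: jit caches one compiled executable per input shape, and the step count is likewise baked into the compiled loop.

```python
import jax
import jax.numpy as jnp

@jax.jit
def denoise(latents):
    # stand-in for one UNet denoising step
    return latents * 0.5

denoise(jnp.zeros((1, 24, 64, 64, 4)))  # first call with this shape: traces and compiles
denoise(jnp.zeros((1, 24, 64, 64, 4)))  # same shapes: reuses the cached executable
denoise(jnp.zeros((1, 48, 64, 64, 4)))  # new frame count: traces and compiles again
```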
@@ -254,6 +271,6 @@ with gr.Blocks(title = 'Make-A-Video Stable Diffusion JAX', analytics_enabled =
     )
     cancel_button.click(fn = lambda: None, cancels = ev)

-demo.queue(concurrency_count = 1, max_size =
+demo.queue(concurrency_count = 1, max_size = 32)
 demo.launch()

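For background on the "extra 5 input channels" and hint image mentioned in the app description: a Stable Diffusion style inpainting UNet takes 9 latent input channels, and one plausible way to assemble them for video is sketched below. The function name and channel layout are assumptions for illustration, not the repository's implementation.

```python
import jax.numpy as jnp

def build_unet_input(noisy_latents, mask, hint_latents):
    # noisy_latents: (frames, h, w, 4)  video latents being denoised
    # mask:          (frames, h, w, 1)  where the hint should guide generation
    # hint_latents:  (frames, h, w, 4)  VAE-encoded hint image, repeated per frame
    return jnp.concatenate([noisy_latents, mask, hint_latents], axis = -1)

x = build_unet_input(
    jnp.zeros((24, 32, 32, 4)),
    jnp.ones((24, 32, 32, 1)),
    jnp.zeros((24, 32, 32, 4)),
)
assert x.shape == (24, 32, 32, 9)  # 4 + 1 + 4 input channels
```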