zeroscope_v2 30x448x256

A watermark-free, ModelScope-based video model optimized for producing high-quality 16:9 compositions and smooth video output. This model was trained from the original weights using 9,923 clips and 29,769 tagged frames at 30 frames and 448x256 resolution.

zeroscope_v2 30x448x256 is specifically designed for upscaling with Potat1 using vid2vid in the 1111 text2video extension by kabachuha. Using this model as a preliminary step gives better overall compositions at higher resolutions in Potat1 and allows faster exploration at 448x256 before moving to a high-resolution render. See an example output that has been upscaled to 1152x640 using Potat1.

Using it with the 1111 text2video extension

Rename the file 'zeroscope_v2_30x448x256.pth' to 'text2video_pytorch_model.pth'.
Rename the file 'zeroscope_v2_30x448x256_text.bin' to 'open_clip_pytorch_model.bin'.
Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.
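
The three steps above can also be scripted. The following is a minimal sketch, not part of the extension: the function name `install_zeroscope` and its two parameters are hypothetical, and it assumes both downloaded files sit in one folder. It copies the files rather than renaming them, so the downloads stay untouched, and it overwrites any files already present in the target directory.

```python
import shutil
from pathlib import Path

# Downloaded file name -> file name expected in the t2v directory
# (both names are taken from the steps above).
RENAMES = {
    "zeroscope_v2_30x448x256.pth": "text2video_pytorch_model.pth",
    "zeroscope_v2_30x448x256_text.bin": "open_clip_pytorch_model.bin",
}


def install_zeroscope(download_dir, webui_root):
    """Copy the downloaded weights into models/ModelScope/t2v under the expected names.

    download_dir: folder holding the two downloaded files (assumption).
    webui_root:   the 'stable-diffusion-webui' folder.
    """
    download_dir = Path(download_dir)
    target_dir = Path(webui_root) / "models" / "ModelScope" / "t2v"

    # Check that both downloads exist before touching the target directory.
    for src_name in RENAMES:
        if not (download_dir / src_name).is_file():
            raise FileNotFoundError(f"missing download: {download_dir / src_name}")

    target_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for src_name, dst_name in RENAMES.items():
        dst = target_dir / dst_name
        shutil.copy2(download_dir / src_name, dst)  # replaces an existing file
        installed.append(dst)
    return installed
```

A call might look like `install_zeroscope("Downloads", "stable-diffusion-webui")`, with both paths adjusted to the local setup. Backing up the existing files in the t2v directory first is advisable if the original weights are still needed.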

Upscaling recommendations

For upscaling, it's recommended to use Potat1 via vid2vid in the 1111 extension. Aim for a resolution of 1152x640 and a denoise strength between 0.66 and 0.85. Remember to use the same prompt and settings that were used to generate the original clip.
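
As a quick sanity check before starting a long upscale, the recommended values can be written down and compared against the settings about to be used. This is only an illustrative sketch: the constant names and `check_upscale_settings` are hypothetical and do not correspond to any setting names in the extension.

```python
# Recommended values from the paragraph above; the names are illustrative only.
UPSCALE_WIDTH, UPSCALE_HEIGHT = 1152, 640
DENOISE_MIN, DENOISE_MAX = 0.66, 0.85


def check_upscale_settings(width, height, denoise_strength):
    """Return a list of warnings for settings outside the recommended values."""
    warnings = []
    if (width, height) != (UPSCALE_WIDTH, UPSCALE_HEIGHT):
        warnings.append(
            f"resolution {width}x{height} differs from the recommended "
            f"{UPSCALE_WIDTH}x{UPSCALE_HEIGHT}"
        )
    if not DENOISE_MIN <= denoise_strength <= DENOISE_MAX:
        warnings.append(
            f"denoise strength {denoise_strength} is outside "
            f"{DENOISE_MIN}-{DENOISE_MAX}"
        )
    return warnings
```

An empty list means the settings match the recommendation. The check does not cover the prompt, which still needs to match the one used for the original clip.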

Known issues

Lower resolutions or fewer frames could lead to suboptimal output.
Certain clips might appear with cuts. This will be fixed in the upcoming 2.1 version, which will incorporate a cleaner dataset.
Some clips may play back too slowly; prompt engineering may be needed to increase the pace.

Thanks to camenduru, kabachuha, ExponentialML, polyware, tin2tin