Important
Use the newer model instead of this one -> caT-text-to-video-2.3b
caT text to video
A conditionally augmented text-to-video model. It uses pre-trained weights from the ModelScope text-to-video model, augmented with temporal conditioning transformers that extend generated clips and create smooth transitions between them. It also supports prompt interpolation, so the scene can change during clip extensions.
This model was trained at home as a hobby.
Do not expect high-quality samples.
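To give a rough intuition for prompt interpolation across clip extensions, here is a minimal sketch that linearly blends two prompt embeddings, one blend per clip. It is not the model's actual mechanism, which this card does not describe. The names `lerp` and `blend_schedule` are hypothetical, and the short lists of floats are toy stand-ins for real text-encoder embeddings.

```python
from typing import List


def lerp(a: List[float], b: List[float], alpha: float) -> List[float]:
    """Linearly interpolate between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def blend_schedule(
    start: List[float], end: List[float], num_clips: int
) -> List[List[float]]:
    """Return one blended vector per clip, moving from start to end.

    Assumption: each clip extension is conditioned on a single blended
    embedding. The real model may condition differently.
    """
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    if num_clips == 1:
        return [list(start)]
    return [lerp(start, end, i / (num_clips - 1)) for i in range(num_clips)]


# Toy 2-dimensional "embeddings" for two prompts.
surfing = [0.0, 2.0]
walking = [1.0, 0.0]
for step, vector in enumerate(blend_schedule(surfing, walking, 3)):
    print(step, vector)
```

With three clips, the first uses the starting prompt only, the last uses the ending prompt only, and the middle clip sits halfway between them.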
Installation
Clone the Repository
git clone https://github.com/motexture/caT-text-to-video.git
cd caT-text-to-video
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python run.py
Visit the provided URL in your browser to interact with the interface and start generating videos.
Example prompt with a scene change:
"Darth Vader is surfing on the ocean -> Darth Vader is walking on the beach"
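The example prompt above joins two scene descriptions with `->`. As a small sketch of how such a string could be split into per-scene prompts before generation, the helper below is purely illustrative: `split_prompt_chain` is a hypothetical name, and whether `run.py` parses prompts this way is an assumption, not something this card states.

```python
from typing import List


def split_prompt_chain(text: str, separator: str = "->") -> List[str]:
    """Split a chained prompt into individual scene prompts.

    Hypothetical helper: surrounding whitespace is trimmed and empty
    segments (for example from a trailing separator) are dropped.
    """
    parts = [part.strip() for part in text.split(separator)]
    return [part for part in parts if part]


scenes = split_prompt_chain(
    "Darth Vader is surfing on the ocean -> Darth Vader is walking on the beach"
)
for index, scene in enumerate(scenes):
    print(index, scene)
```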
Base model: ali-vilab/text-to-video-ms-1.7b