Lightricks/LTX-Video

@@ -92,6 +92,72 @@ python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width
 python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
 ```
+### Diffusers 🧨
+LTX-Video is supported by the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for both text-to-video and image-to-video generation.
+Install `diffusers` from the main branch before trying the examples below:
+```bash
+pip install -U git+https://github.com/huggingface/diffusers
+```
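Before running the examples, you may want to confirm that the installed build actually exposes the LTX pipelines. The sketch below is one way to do that. `ltx_pipelines_available` is a hypothetical helper, not part of Diffusers. It only checks that `diffusers` imports and provides `LTXPipeline` and `LTXImageToVideoPipeline`. It does not check for a GPU or download any weights.

```python
import importlib
import importlib.util


def ltx_pipelines_available() -> bool:
    """Return True if the installed diffusers build exposes both LTX pipelines."""
    # Bail out early if diffusers is not installed at all.
    if importlib.util.find_spec("diffusers") is None:
        return False
    try:
        diffusers = importlib.import_module("diffusers")
        # diffusers resolves pipeline classes lazily, so attribute access can
        # itself raise if an optional dependency (e.g. torch) is missing.
        return all(
            hasattr(diffusers, name)
            for name in ("LTXPipeline", "LTXImageToVideoPipeline")
        )
    except Exception:
        return False


if __name__ == "__main__":
    print("LTX pipelines available:", ltx_pipelines_available())
```

If this prints `False` after installing, the environment is most likely picking up an older `diffusers` release instead of the build from the main branch.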
+For text-to-video:
+```py
+import torch
+from diffusers import LTXPipeline
+from diffusers.utils import export_to_video
+# Load the pipeline weights in bfloat16 and move them to the GPU
+pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
+negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
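+# Generate 161 frames at 704x480 with 50 denoising steps; .frames[0] is the first video in the batch
+video = pipe(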
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    width=704,
+    height=480,
+    num_frames=161,
+    num_inference_steps=50,
+).frames[0]
+# Save the frames as an MP4 at 24 fps
+export_to_video(video, "output.mp4", fps=24)
+```
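The call above requests 704x480 at 161 frames. Our understanding of the LTX-Video README is that the model expects a width and height divisible by 32 and a frame count of the form 8k + 1. The example values fit that pattern: 704 = 22 × 32, 480 = 15 × 32, and 161 = 8 × 20 + 1. Treat those rules as an assumption to verify against the official documentation. The sketch below rounds an arbitrary request down to the nearest such shape. It also reports the clip length at the 24 fps passed to `export_to_video`. The names `snap_ltx_dims`, `clip_seconds`, and `LtxDims` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LtxDims:
    width: int
    height: int
    num_frames: int


# Assumed LTX-Video constraints: spatial size divisible by 32, frames = 8k + 1.
SPATIAL_MULTIPLE = 32
TEMPORAL_MULTIPLE = 8


def snap_ltx_dims(width: int, height: int, num_frames: int) -> LtxDims:
    """Round a requested size down to the nearest (assumed) valid LTX-Video shape."""
    if min(width, height) < SPATIAL_MULTIPLE or num_frames < 1:
        raise ValueError("request is smaller than the smallest valid shape")
    snapped_w = (width // SPATIAL_MULTIPLE) * SPATIAL_MULTIPLE
    snapped_h = (height // SPATIAL_MULTIPLE) * SPATIAL_MULTIPLE
    # Largest 8k + 1 that does not exceed the requested frame count.
    snapped_f = ((num_frames - 1) // TEMPORAL_MULTIPLE) * TEMPORAL_MULTIPLE + 1
    return LtxDims(snapped_w, snapped_h, snapped_f)


def clip_seconds(num_frames: int, fps: int = 24) -> float:
    """Duration in seconds of the exported clip at the given frame rate."""
    return num_frames / fps


if __name__ == "__main__":
    dims = snap_ltx_dims(720, 480, 163)
    print(dims)  # LtxDims(width=704, height=480, num_frames=161)
    print(f"{clip_seconds(dims.num_frames):.2f} s")  # 6.71 s
```

Rounding down keeps the request within the size the caller asked for. Rounding to the nearest valid value would be an equally reasonable choice.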
+For image-to-video:
+```py
+import torch
+from diffusers import LTXImageToVideoPipeline
+from diffusers.utils import export_to_video, load_image
+# Load the image-to-video pipeline in bfloat16 and move it to the GPU
+pipe = LTXImageToVideoPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+# Fetch the conditioning image from a URL
+image = load_image(
+    "https://huggingface.co/datasets/a-r-r-o-w/tiny-meme-dataset-captioned/resolve/main/images/8.png"
+)
+prompt = "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene."
+negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
+video = pipe(
+    image=image,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    width=704,
+    height=480,
+    num_frames=161,
+    num_inference_steps=50,
+).frames[0]
+# Save the frames as an MP4 at 24 fps
+export_to_video(video, "output.mp4", fps=24)
+```
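The image-to-video example passes the conditioning image together with `width=704` and `height=480`. Your own image may have a different aspect ratio. In that case, one option is to center-crop it to the target aspect ratio before resizing, so the content is not stretched. This preprocessing step is our suggestion, not something the source prescribes. The helper `center_crop_box` below is hypothetical. It returns a `(left, top, right, bottom)` box, which is the convention used by Pillow's `Image.crop`.

```python
from fractions import Fraction


def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Largest centered box in a src_w x src_h image with aspect ratio dst_w:dst_h.

    Returns (left, top, right, bottom), the box convention of PIL's Image.crop.
    """
    target = Fraction(dst_w, dst_h)
    if Fraction(src_w, src_h) > target:
        # Source is too wide: keep the full height and trim the sides.
        crop_h = src_h
        crop_w = round(src_h * target)
    else:
        # Source is too tall (or already matches): keep the full width, trim top and bottom.
        crop_w = src_w
        crop_h = round(src_w / target)
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    print(center_crop_box(1920, 1080, 704, 480))  # (168, 0, 1752, 1080)
```

With Pillow installed, `image.crop(center_crop_box(*image.size, 704, 480)).resize((704, 480))` would then give a 704x480 input without distortion.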
+To learn more, check out the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
+Diffusers can also load the original LTX-Video checkpoints directly with the `from_single_file()` method. See [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video#loading-single-files) for details.
 ## Limitations
 - This model is not intended or able to provide factual information.
 - As a statistical model this checkpoint might amplify existing societal biases.