AndroWan T2V 1.3B
💻 Website | 🤗 Hugging Face | 💿 Discord
This is a PEFT LoRA derived from Wan-AI/Wan2.1-T2V-1.3B.
It was trained on and produces a small variety of softcore homoerotic content featuring hung twinks and other similarly endowed males. Subjects were limited to the twink aesthetic morphotype with large, circumcised penises. This is an area where many base models struggle with inclusivity.
🙈 Hover over the samples to unblur them. Note that the samples may not be suitable for all audiences.
Version 10 Samples
Version 32 Samples
🍒 The above results are absolutely cherry-picked, so don't be fooled. The outputs are far from perfect and often show compression-like artifacts learned from the source material.
Prompting
Since annotations used a common pseudo-syntax, it's best to format your prompts similarly:
A [physique] [nude | {covered by: towel | clothing}] [man | male] with [distinctive features],
[and {an erect | a flaccid} penis (if {nude | revealed from coverage})] [and {saggy | } testicles],
[performing an action],
in [an environment with notable background details].
Here is a higher-level interpretation of this schema without any conditional syntax:
[Subject] + [Clothing/Nudity] + [Exposure Status] + [Action] + [Environment]
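For illustration, a prompt that follows this schema (a made-up example, not an actual training caption) might read:
A slim nude man with short dark hair and tan lines, and an erect penis and saggy testicles, toweling off after a shower, in a steamy tiled bathroom with a frosted window in the background.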
You can find more details regarding the text captions and schemas in the Annotations document.
Inference
The text encoder was not trained. You may reuse the base model text encoder for inference.
I recommend using the default workflow for Wan 2.1 in ComfyUI found here, with the only addition being splicing a LoraLoaderModelOnly node into the model pipeline to load the LoRA.
You may also use 🤗 Diffusers to generate videos:
import torch
from diffusers.utils import export_to_video
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
# Model and LoRA settings
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
lora_id = "markury/AndroWan-2.1-T2V-1.3B"
lora_file = "safetensors/AndroWan_v10-0092.safetensors" # Specific file in subfolder
# Load model
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
# Set scheduler
pipe.scheduler = UniPCMultistepScheduler.from_config(
pipe.scheduler.config,
flow_shift=3.0
)
# Move to GPU
pipe.to("cuda")
# Load LoRA weights from a specific file path (adapter name is arbitrary)
pipe.load_lora_weights(lora_id, weight_name=lora_file, adapter_name="androwan")
# Set LoRA scale (1.0 = full strength)
pipe.set_adapters("androwan", adapter_weights=1.0)
# Optional: for low VRAM, skip pipe.to("cuda") above and enable model CPU offload instead
# pipe.enable_model_cpu_offload()
# Generation parameters
prompt = "[your prompt here]"
negative_prompt = ""
# Generate video
generator = torch.Generator("cuda").manual_seed(42)
output = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
height=832,
width=480,
num_frames=33,
guidance_scale=4.0,
num_inference_steps=20,
generator=generator
).frames[0]
# Export to video
export_to_video(output, "android_output.mp4", fps=16)
Recommendations for inference
The following is taken directly from the 🤗 Diffusers documentation:
- VAE in torch.float32 for better decoding quality.
- num_frames should be of the form 4 * k + 1, for example 49 or 81.
- For smaller resolution videos, try lower values of shift (between 2.0 to 5.0) in the Scheduler.
- For larger resolution videos, try higher values (between 7.0 and 12.0). The default value is 3.0 for Wan.
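For example (an illustrative value, reusing the scheduler setup from the snippet above), the shift can be raised before a higher-resolution run:
# Illustrative: reconfigure the scheduler with a larger flow_shift for higher resolutions
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)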
As for adapter-specific recommendations, a lower guidance scale (2-3) seems to reduce smearing/jittering, while a higher one (4-5) may be better for visual clarity and composition. Running a second pass with a different CFG seems to produce interesting results. YMMV.
By fixing the seed and tweaking the CFG, shift, and prompt verbiage, you can effectively avoid many of the artifacts (even the stubborn ones, like excessive swangin' and other hallucinated hands-free motion).
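As a minimal sketch of that workflow (values are illustrative, reusing pipe, prompt, and negative_prompt from the snippet above), hold the seed constant and sweep only the guidance scale:
# Illustrative seed-fixed CFG sweep: re-seed each run so only guidance_scale changes
for cfg in (2.5, 3.5, 4.5):
    generator = torch.Generator("cuda").manual_seed(42)
    frames = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=832,
        width=480,
        num_frames=33,
        guidance_scale=cfg,
        num_inference_steps=20,
        generator=generator,
    ).frames[0]
    export_to_video(frames, f"androwan_cfg_{cfg}.mp4", fps=16)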
Limitations
Prompting and seed-shuffling will still take time. Artifacts such as compression-like noise, duplicated anatomy, missing anatomy, distorted compositions, and temporal incoherence are inherent to the base model and are exacerbated by the adapter.
In favor of better detail convergence, the adapter is slightly overfit. You may wish to reduce the strength and CFG to achieve coherent results, especially as prompts become more complex and diverge from the training distribution.
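If the LoRA was loaded under a named adapter (as in the snippet above, which uses the arbitrary name "androwan"), its strength can be lowered with Diffusers' set_adapters:
# Illustrative: run the adapter at reduced strength (0.7 is an arbitrary example value)
pipe.set_adapters("androwan", adapter_weights=0.7)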
If you are really looking for lifelike video, I might suggest an I2V workflow seeded with a photorealistic generated image, using Hunyuan Video, which was trained on more inclusive data. However, even that has limitations. Regardless, photorealism for the purposes of creating deepfakes or hardcore pornography is outside the scope of this project and does not reflect its values.
I kindly ask that this model not be re-uploaded or circulated on sites like Civitai or Tensor.art (among others).