AndroWan T2V 1.3B
💻 Website | 🤗 Hugging Face | 💿 Discord
This is a PEFT LoRA derived from Wan-AI/Wan2.1-T2V-1.3B.
It was trained on and produces a small variety of softcore homoerotic content featuring hung twinks and other similarly endowed males. Subjects were limited to the twink aesthetic morphotype with large, circumcised penises. This is an area where many base models struggle with inclusivity.
🙈 Hover over the samples to unblur them. Note that the samples may not be suitable for all audiences.
Version 10 Samples
Version 32 Samples
🍒 The above results are absolutely cherry-picked, so don't be fooled. The outputs are far from perfect and often show compression-like artifacts learned from the source material.
Prompting
Since annotations used a common pseudo-syntax, it's best to format your prompts similarly:
A [physique] [nude | {covered by: towel | clothing}] [man | male] with [distinctive features],
[and {an erect | a flaccid} penis (if {nude | revealed from coverage})] [and {saggy | } testicles],
[performing an action],
in [an environment with notable background details].
Here is a higher-level interpretation of this schema without any conditional syntax:
[Subject] + [Clothing/Nudity] + [Exposure Status] + [Action] + [Environment]
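For illustration, a prompt that follows this schema (a made-up example, not an actual training caption) might read:
A slim nude man with short dark hair and tan lines, and an erect penis and saggy testicles, toweling off after a shower, in a steamy tiled bathroom with a frosted window in the background.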
You can find more details regarding the text captions and schemas in the Annotations document.
Inference
The text encoder was not trained. You may reuse the base model text encoder for inference.
I recommend using the default workflow for Wan 2.1 in ComfyUI found here, with the only addition being splicing a LoraLoaderModelOnly node into the model pipeline to load the LoRA.
You may also use 🤗 Diffusers to generate videos:
import torch
from diffusers.utils import export_to_video
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
# Model and LoRA settings
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
lora_id = "markury/AndroWan-2.1-T2V-1.3B"
lora_file = "safetensors/AndroWan_v10-0092.safetensors" # Specific file in subfolder
# Load model
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
# Set scheduler
pipe.scheduler = UniPCMultistepScheduler.from_config(
pipe.scheduler.config,
flow_shift=3.0
)
# Move to GPU
pipe.to("cuda")
# Load LoRA weights from a specific file path (adapter name is arbitrary)
pipe.load_lora_weights(lora_id, weight_name=lora_file, adapter_name="androwan")
# Set LoRA scale (1.0 = full strength)
pipe.set_adapters("androwan", adapter_weights=1.0)
# Optional: for low VRAM, skip pipe.to("cuda") above and enable model CPU offload instead
# pipe.enable_model_cpu_offload()
# Generation parameters
prompt = "[your prompt here]"
negative_prompt = ""
# Generate video
generator = torch.Generator("cuda").manual_seed(42)
output = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
height=832,
width=480,
num_frames=33,
guidance_scale=4.0,
num_inference_steps=20,
generator=generator
).frames[0]
# Export to video
export_to_video(output, "android_output.mp4", fps=16)
Recommendations for inference
The following is taken directly from the 🤗 Diffusers documentation:
- VAE in torch.float32 for better decoding quality.
- num_frames should be of the form 4 * k + 1, for example 49 or 81.
- For smaller resolution videos, try lower values of shift (between 2.0 to 5.0) in the Scheduler.
- For larger resolution videos, try higher values (between 7.0 and 12.0). The default value is 3.0 for Wan.
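For example (an illustrative value, reusing the scheduler setup from the snippet above), the shift can be raised before a higher-resolution run:
# Illustrative: reconfigure the scheduler with a larger flow_shift for higher resolutions
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)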
As for adapter-specific recommendations, a lower guidance scale (2-3) seems to reduce smearing/jittering, while a higher one (4-5) may be better for visual clarity and composition. Running a second pass with a different CFG seems to produce interesting results. YMMV.
By fixing the seed and tweaking the CFG, shift, and prompt verbiage, you can effectively avoid many of the artifacts (even the stubborn ones, like excessive swangin' and other hallucinated hands-free motion).
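As a minimal sketch of that workflow (values are illustrative, reusing pipe, prompt, and negative_prompt from the snippet above), hold the seed constant and sweep only the guidance scale:
# Illustrative seed-fixed CFG sweep: re-seed each run so only guidance_scale changes
for cfg in (2.5, 3.5, 4.5):
    generator = torch.Generator("cuda").manual_seed(42)
    frames = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=832,
        width=480,
        num_frames=33,
        guidance_scale=cfg,
        num_inference_steps=20,
        generator=generator,
    ).frames[0]
    export_to_video(frames, f"androwan_cfg_{cfg}.mp4", fps=16)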
Limitations
Prompting and seed-shuffling will still take time. Artifacts such as compression-like noise, duplicated anatomy, missing anatomy, distorted compositions, and temporal incoherence are inherent to the base model and are exacerbated by the adapter.
In favor of better detail convergence, the adapter is slightly overfit. You may wish to reduce the strength and CFG to achieve coherent results, especially as prompts become more complex and diverge from the training distribution.
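If the LoRA was loaded under a named adapter (as in the snippet above, which uses the arbitrary name "androwan"), its strength can be lowered with Diffusers' set_adapters:
# Illustrative: run the adapter at reduced strength (0.7 is an arbitrary example value)
pipe.set_adapters("androwan", adapter_weights=0.7)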
If you are really looking for lifelike video, I might suggest an I2V workflow seeded with a photorealistic generated image, using Hunyuan Video, which was trained on more inclusive data. However, even that has limitations. Regardless, photorealism for the purposes of creating deepfakes or hardcore pornography is outside the scope of this project and does not reflect its values.
I kindly ask that this model not be re-uploaded or circulated on sites like Civitai or Tensor.art (among others).