---
license: other
base_model: Wan-AI/Wan2.1-T2V-1.3B
tags:
- wan
- video
- text-to-video
- diffusion-pipe
- not-for-all-audiences
- lora
- template:sd-lora
- standard
library_name: diffusers
pipeline_tag: text-to-video
---

# AndroWan T2V 1.3B

💻 Website    |    🤗 Hugging Face    |    💿 Discord
This is a PEFT LoRA derived from [Wan-AI/Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B). It was trained on and produces a small variety of softcore homoerotic content featuring hung twinks and other similarly endowed males. Subjects were limited to the twink aesthetic morphotype with large, circumcised penises. This is an area where many base models struggle with inclusivity.

> 🙈 Hover over the samples to unblur them. Note that the samples may not be suitable for all audiences.

## Version 10 Samples
| Image | Flow Shift | CFG Scale | Prompt | Negative Prompt |
| --- | --- | --- | --- | --- |
| *(sample hidden)* | 4.0 | 6.0 | A muscular nude man with blond hair, holding a beer bottle in a club. He dances a little, his erect penis and saggy testicles visible as he moves slightly, neon lights reflecting off his skin. There are indistinct people in a crowd in the background. | 色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 |
| *(sample hidden)* | 4.0 | 6.0 | A nude muscular man with an erect penis and defined pecs, smiling on a Hawaiian resort balcony. He holds a tropical cocktail, with the ocean and palm trees behind him. He walks towards the camera, smiling, with warm sunlight and tropical plants around him. He winks at the viewer as he approaches. | 色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 |
## Version 32 Samples
| Image | Flow Shift | CFG Scale | Prompt | Negative Prompt |
| --- | --- | --- | --- | --- |
| *(sample hidden)* | 5.0 | 3.0 | a nude man with an erect penis riding a motorcycle, wearing a black helmet, and holding the handlebars, the background is a highway with trees and cars. the camera is in the car ahead of him | 色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 |
| *(sample hidden)* | 5.0 | 4.0 | An attractive nude man with a flaccid penis and a slim build, sitting at a small outdoor ice cream shop. His penis is flaccid with saggy testicles. He has brown eyes, a narrow face, and a strong jaw. Wearing an open jacket and white sneakers with no pants, he spreads his legs while raising a mint ice cream cone to his mouth with his left hand and licks the ice cream with his tongue. | 自拍, 色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿 |
> 🍒 The above results are absolutely cherry-picked, so don't be fooled. The outputs are far from perfect and often carry compression-like artifacts learned from the source material.

## Prompting

Since annotations used a common pseudo-syntax, it's best to format your prompts similarly:

```plaintext
A [physique] [nude | {covered by: towel | clothing}] [man | male] with [distinctive features], [and {an erect | a flaccid} penis (if {nude | revealed from coverage})] [and {saggy | } testicles], [performing an action], in [an environment with notable background details].
```

Here is a higher-level interpretation of this schema without any conditional syntax:

```plaintext
[Subject] + [Clothing/Nudity] + [Exposure Status] + [Action] + [Environment]
```

You can find more details regarding the text captions and schemas in the [Annotations](/Annotation.md) document.

## Inference

The text encoder **was not** trained. You may reuse the base model text encoder for inference.

I recommend using the default workflow for Wan 2.1 in ComfyUI [found here](https://comfyanonymous.github.io/ComfyUI_examples/wan/), with the only addition being splicing `LoraLoaderModelOnly` into the model pipeline to load the LoRA.

You may also use 🤗 Diffusers to generate videos:

```python
import torch
from diffusers.utils import export_to_video
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler

# Model and LoRA settings
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
lora_id = "markury/AndroWan-2.1-T2V-1.3B"
lora_file = "safetensors/AndroWan_v10-0092.safetensors"  # Specific file in subfolder

# Load model
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

# Set scheduler
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config,
    flow_shift=3.0
)

# Move to GPU
pipe.to("cuda")

# Load LoRA weights with specific file path
pipe.load_lora_weights(lora_id, weight_name=lora_file)

# Set LoRA scale
if hasattr(pipe, "set_adapters_scale"):
    pipe.set_adapters_scale(1.0)

# Enable CPU offload for low VRAM
pipe.enable_model_cpu_offload()

# Generation parameters
prompt = "[your prompt here]"
negative_prompt = ""

# Generate video
generator = torch.Generator("cuda").manual_seed(42)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=832,
    width=480,
    num_frames=33,
    guidance_scale=4.0,
    num_inference_steps=20,
    generator=generator
).frames[0]

# Export to video
export_to_video(output, "android_output.mp4", fps=16)
```

## Recommendations for inference

The following is taken directly from the [🤗 Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan):

> - VAE in torch.float32 for better decoding quality.
> - num_frames should be of the form 4 * k + 1, for example 49 or 81.
> - For smaller resolution videos, try lower values of shift (between 2.0 to 5.0) in the Scheduler.
> - For larger resolution videos, try higher values (between 7.0 and 12.0). The default value is 3.0 for Wan.

As for adapter-specific recommendations, a lower guidance scale (2-3) seems to reduce smearing and jittering, while a higher one (4-5) might be better for visual clarity and composition. Running a second pass with a different CFG seems to produce interesting results (a sketch of this follows below). YMMV.
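Here is a minimal sketch of that kind of fixed-seed, two-pass comparison, reusing the `pipe` set up in the Diffusers example above. The specific CFG values and output file names are illustrative assumptions, not tuned recommendations:

```python
# Sketch: render the same prompt and seed at two guidance scales and compare.
# Assumes `pipe` has already been created as in the inference example above.
import torch
from diffusers.utils import export_to_video

prompt = "[your prompt here]"
negative_prompt = ""
seed = 42  # fix the seed so only the guidance scale changes between passes

for cfg in (2.5, 4.5):  # lower CFG for smoother motion, higher for clarity (assumed values)
    generator = torch.Generator("cuda").manual_seed(seed)
    frames = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=832,
        width=480,
        num_frames=33,
        guidance_scale=cfg,
        num_inference_steps=20,
        generator=generator,
    ).frames[0]
    export_to_video(frames, f"androwan_cfg{cfg}.mp4", fps=16)
```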
By fixing the seed and tweaking the CFG, shift, and prompt verbiage, you can effectively avoid many of the artifacts (even the stubborn ones, like **excessive swangin'** and other hallucinated hands-free motion).

## Limitations

Prompting and seed-shuffling will still take time. Artifacts such as compression-like noise, duplicated anatomy, missing anatomy, distorted compositions, and temporal incoherence are inherent to the base model and are exacerbated by the adapter.

In favor of better detail convergence, the adapter is slightly overfit. You may wish to reduce the adapter strength and CFG to achieve coherent results, especially as prompts become more complex and diverge from the training distribution (see the sketch at the end of this card for one way to lower the strength in Diffusers).

If you are really looking for *lifelike* video, I might suggest something like I2V from a photorealistic image generated with Hunyuan Video, which was trained on more inclusive data. However, even that has limitations. Regardless, photorealism for the purposes of creating deepfakes or hardcore pornography is outside the scope of this project and does not reflect its values.

I kindly ask that this model not be re-uploaded or circulated on sites like Civitai or Tensor.art (among others).
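For reference, here is a minimal sketch of loading the LoRA at reduced strength with 🤗 Diffusers, reusing `pipe` and the identifiers from the inference example above. The adapter name and the 0.7 weight are assumptions for illustration, not tested recommendations:

```python
# Sketch: load the LoRA under an explicit adapter name, then lower its strength.
# The adapter name "androwan" and the 0.7 weight are assumed, illustrative values.
pipe.load_lora_weights(
    "markury/AndroWan-2.1-T2V-1.3B",
    weight_name="safetensors/AndroWan_v10-0092.safetensors",
    adapter_name="androwan",
)
pipe.set_adapters(["androwan"], adapter_weights=[0.7])  # 1.0 = full strength
```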