DATAGRID-research / DATAGRID-Local-Attention-DiT-v1.0.0-0.52B

PixArtSigmaPipeline


DATAGRID-research committed 20 days ago

Commit 0700705 · verified · 1 Parent(s): f00d2c7

Update README.md

Files changed (1)

README.md +6 -0


@@ -8,6 +8,8 @@ LocalDiT builds upon the architecture of [PixArt-α](https://huggingface.co/PixA
 - **Parameters**: 0.52B
 - **Resolution**: Supports generation up to 1024×1024 pixels
 - **Language Support**: English text prompts
+- **Text Encoder**: FLAN-T5-XXL (4.3B parameters)
+- **VAE**: SDXL VAE for high-quality latent encoding/decoding
 # Usage
 Details on code execution will be released at a later date.
@@ -33,6 +35,10 @@ image.save("generated_image.png")
    - Implemented window-based local attention in alternating transformer blocks
    - Reduced parameter count through efficient attention design
    - Optimized for both quality and computational efficiency
+- **Components**:
+   - Diffusion Backbone: Custom LocalDiT architecture (0.52B parameters)
+   - Text Encoder: FLAN-T5-XXL (4.3B parameters) for rich text embedding
+   - VAE: SDXL's Variational Autoencoder for high-fidelity latent space encoding/decoding
 # Results
 LocalDiT achieves comparable image quality to PixArt-α while offering:
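
Note on the diff above: the README mentions "window-based local attention in alternating transformer blocks" but does not show how it works. The following is a minimal NumPy sketch of the general technique, not the LocalDiT implementation. Everything in it is an assumption made for illustration: the function names, the non-overlapping square windows, the single-head attention without learned projections, and the choice that even-indexed blocks are local while odd-indexed blocks are global.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention over the token axis (second to last).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def window_partition(x, window):
    # (H, W, C) -> (num_windows, window * window, C), non-overlapping windows.
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0, "grid must divide into windows"
    x = x.reshape(H // window, window, W // window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)


def window_merge(x, window, H, W):
    # Inverse of window_partition: (num_windows, window * window, C) -> (H, W, C).
    C = x.shape[-1]
    x = x.reshape(H // window, W // window, window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def local_attention(x, window):
    # Each token attends only to the tokens inside its own window.
    H, W, _ = x.shape
    tokens = window_partition(x, window)
    return window_merge(attention(tokens, tokens, tokens), window, H, W)


def global_attention(x):
    # Every token attends to every other token in the grid.
    H, W, C = x.shape
    tokens = x.reshape(1, H * W, C)
    return attention(tokens, tokens, tokens).reshape(H, W, C)


def block_stack(x, depth, window):
    # Hypothetical alternation: even blocks local, odd blocks global.
    for i in range(depth):
        attn = local_attention(x, window) if i % 2 == 0 else global_attention(x)
        x = x + attn  # residual connection
    return x
```

For an H×W token grid and window size w, the score matrices in a local block hold H·W·w² entries in total instead of (H·W)², which is where the computational saving of windowed attention comes from. A call such as `block_stack(np.random.default_rng(0).normal(size=(8, 8, 4)), depth=4, window=4)` runs two local and two global blocks under the assumptions stated above.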