DATAGRID-research committed on
Commit 0700705 · verified · 1 Parent(s): f00d2c7

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -8,6 +8,8 @@ LocalDiT builds upon the architecture of [PixArt-α](https://huggingface.co/PixA
  - **Parameters**: 0.52B
  - **Resolution**: Supports generation up to 1024×1024 pixels
  - **Language Support**: English text prompts
+ - **Text Encoder**: FLAN-T5-XXL (4.3B parameters)
+ - **VAE**: SDXL VAE for high-quality latent encoding/decoding

  # Usage
  Details on code execution will be released at a later date.
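The model card still defers official usage code, so here is a minimal sketch of how the two components documented in this commit, the FLAN-T5-XXL text encoder and the SDXL VAE, are typically loaded with the Transformers and Diffusers libraries. Only `google/flan-t5-xxl` and `stabilityai/sdxl-vae` are established checkpoint IDs; the prompt length, the latent shape, and the stand-in for the unreleased LocalDiT backbone are illustrative assumptions, not the project's actual pipeline.

```python
import torch
from transformers import T5EncoderModel, T5Tokenizer
from diffusers import AutoencoderKL

# Text encoder and VAE named in the model card; both are public checkpoints.
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
text_encoder = T5EncoderModel.from_pretrained("google/flan-t5-xxl")  # ~4.3B parameters
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")

prompt = "a watercolor painting of a lighthouse at dusk"
tokens = tokenizer(prompt, return_tensors="pt", padding="max_length",
                   max_length=120, truncation=True)  # max_length is an assumed value
with torch.no_grad():
    # Conditioning sequence that the DiT backbone would cross-attend to.
    prompt_embeds = text_encoder(tokens.input_ids,
                                 attention_mask=tokens.attention_mask).last_hidden_state

# The LocalDiT backbone is not released yet, so random latents stand in for the
# denoising loop; a 1024x1024 image corresponds to a 4x128x128 SDXL latent.
latents = torch.randn(1, 4, 128, 128)

with torch.no_grad():
    # Decode latents to an image tensor in [-1, 1]; scaling_factor comes from the VAE's own config.
    image = vae.decode(latents / vae.config.scaling_factor).sample
```

Once the backbone and usage code are published, the random `latents` above would be replaced by the latents the LocalDiT transformer denoises under the guidance of `prompt_embeds`.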
@@ -33,6 +35,10 @@ image.save("generated_image.png")
  - Implemented window-based local attention in alternating transformer blocks
  - Reduced parameter count through efficient attention design
  - Optimized for both quality and computational efficiency
+ - **Components**:
+   - Diffusion Backbone: Custom LocalDiT architecture (0.52B parameters)
+   - Text Encoder: FLAN-T5-XXL (4.3B parameters) for rich text embedding
+   - VAE: SDXL's Variational Autoencoder for high-fidelity latent space encoding/decoding

  # Results
  LocalDiT achieves comparable image quality to PixArt-α while offering:
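The efficiency notes above describe window-based local attention in alternating transformer blocks only in prose. The sketch below shows one common way to realize that pattern in PyTorch: self-attention restricted to non-overlapping token windows in every other block, with ordinary global attention in between. The window size, model width, depth, head count, and alternation scheme are assumptions for illustration, not the released LocalDiT architecture.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows of tokens."""
    def __init__(self, dim: int, num_heads: int, window_size: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.window_size = window_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        w = self.window_size
        assert n % w == 0, "sequence length must be divisible by the window size"
        xw = x.reshape(b * (n // w), w, d)  # fold windows into the batch dimension
        out, _ = self.attn(xw, xw, xw)      # attention never crosses a window boundary
        return out.reshape(b, n, d)

class Block(nn.Module):
    """Pre-norm transformer block whose attention is either windowed (local) or global."""
    def __init__(self, dim: int, num_heads: int, window_size: int, local: bool):
        super().__init__()
        self.local = local
        self.norm1 = nn.LayerNorm(dim)
        self.attn = (WindowAttention(dim, num_heads, window_size) if local
                     else nn.MultiheadAttention(dim, num_heads, batch_first=True))
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        h = self.attn(h) if self.local else self.attn(h, h, h)[0]
        x = x + h
        return x + self.mlp(self.norm2(x))

# Alternate local (even-indexed) and global (odd-indexed) blocks across the depth.
dim, heads, depth = 768, 12, 12
blocks = nn.ModuleList([Block(dim, heads, window_size=64, local=(i % 2 == 0))
                        for i in range(depth)])
tokens = torch.randn(1, 1024, dim)  # e.g. a 32x32 grid of latent patches, flattened
for blk in blocks:
    tokens = blk(tokens)
print(tokens.shape)  # torch.Size([1, 1024, 768])
```

Because the windowed blocks attend over `window_size` tokens instead of the full sequence, their attention cost grows linearly rather than quadratically with the number of patches; the parameter reduction mentioned above comes from the overall efficient design rather than from windowing itself.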
 