Hi! This is a side project, so it's not the best.
AURORA-Tiny
Adaptive Unified Reasoning and Organized Reasoning Architecture - Tiny
An ultra-lightweight text diffusion model that generates coherent text through iterative denoising. AURORA-Tiny combines a transformer architecture with a diffusion process in a compact, efficient design suited to local training and experimentation.
The model has roughly 6M parameters.
Features
- Ultra-Compact Design: Optimized for local training with minimal hardware requirements
- Transformer-based Architecture: Multi-head attention with time conditioning in a tiny footprint
- Diffusion Process: Iterative denoising for high-quality text generation
- Flexible Training: Works with any plain text dataset from Hugging Face
- Efficient Training: Train on CPU or modest GPUs in minutes, not hours
- Prompt-based Generation: Support for both conditional and unconditional generation
Quick Start
Installation
pip install torch torchvision torchaudio
pip install datasets matplotlib tqdm numpy
Basic Usage
from aurora import (DiffusionTrainer, TextTokenizer, DiffusionTransformer,
                    DiffusionSchedule, load_hf_dataset)  # adjust the import if load_hf_dataset lives elsewhere

# Load your dataset (or use the built-in loader)
texts = load_hf_dataset("rotten_tomatoes", max_samples=3000)

# Build tokenizer
tokenizer = TextTokenizer(vocab_size=2000)
tokenizer.fit(texts)

# Initialize model
model = DiffusionTransformer(
    vocab_size=len(tokenizer.word_to_id),
    d_model=256,
    n_heads=8,
    n_layers=6
)

# Noise schedule (constructor arguments may differ; timesteps should match the model config)
schedule = DiffusionSchedule(timesteps=100)

# Train (train_loader / val_loader are DataLoaders built from the tokenized texts)
trainer = DiffusionTrainer(model, tokenizer, schedule, device='cuda')
trainer.train(train_loader, val_loader, epochs=15)

# Generate text
generated_text = trainer.generate("This movie is", max_length=30)
print(generated_text)
Architecture
AURORA-Tiny uses a combination of (a minimal sketch of the time conditioning follows the component list below):
- Time-Conditioned Transformers: Each transformer block receives timestep embeddings
- Sinusoidal Time Embeddings: Continuous time representation for the diffusion process
- Linear Noise Schedule: Gradual noise addition during forward diffusion
- DDIM-style Sampling: Deterministic sampling for consistent generation
Model Components
- Token Embedding: Maps discrete tokens to continuous space
- Position Encoding: Learnable positional embeddings
- Time Conditioning: Sinusoidal embeddings injected into each layer
- Multi-Head Attention: Standard transformer attention with time modulation
- Output Projection: Maps back to vocabulary space
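The time conditioning is the part that differs most from a plain transformer. Below is a minimal sketch of what the sinusoidal timestep embedding and per-block injection could look like; the class names (TimeEmbedding, TinyBlock) and the exact injection point are illustrative assumptions, not the code in this repository.

import math
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Sinusoidal embedding of the diffusion timestep t (illustrative sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):                                   # t: (batch,) integer timesteps
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000) * torch.arange(half, device=t.device) / half)
        args = t.float()[:, None] * freqs[None, :]
        return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)   # (batch, dim)

class TinyBlock(nn.Module):
    """Transformer block whose input is shifted by a projection of the time embedding."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.time_proj = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, t_emb):                            # x: (batch, seq, d_model)
        x = x + self.time_proj(t_emb)[:, None, :]           # inject the timestep into every position
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))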
Tested on an RTX 3060 with batch_size=16 for 15 epochs (model size in that run: ~2.4M parameters).
Configuration
Model Hyperparameters
model_config = {
    'vocab_size': 2000,    # Vocabulary size
    'd_model': 256,        # Hidden dimension
    'n_heads': 8,          # Attention heads
    'n_layers': 6,         # Transformer layers
    'max_seq_len': 64,     # Maximum sequence length
    'timesteps': 100       # Diffusion timesteps
}
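Assuming every key in model_config matches a DiffusionTransformer constructor argument (worth verifying against the actual signature), the dict can be splatted in directly:

model = DiffusionTransformer(**model_config)  # assumes all keys are accepted by the constructor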
Training Parameters
training_config = {
    'batch_size': 16,        # Batch size
    'learning_rate': 1e-4,   # Learning rate
    'weight_decay': 0.01,    # L2 regularization
    'epochs': 15,            # Training epochs
    'grad_clip': 1.0         # Gradient clipping
}
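These values map onto a standard PyTorch training step. A hedged sketch of how a trainer might apply them (the actual DiffusionTrainer internals may differ, and training_loss is a hypothetical helper):

import torch

# Sketch only: the real DiffusionTrainer may construct its optimizer differently.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=training_config['learning_rate'],
    weight_decay=training_config['weight_decay'],
)

for batch in train_loader:                                   # one epoch, simplified
    loss = model.training_loss(batch)                        # hypothetical loss helper
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), training_config['grad_clip'])
    optimizer.step()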
Supported Datasets
AURORA-Tiny works with any text dataset from Hugging Face. Pre-configured datasets include (a loader sketch follows the list):
- rotten_tomatoes - Movie reviews (8.5k samples)
- imdb - Movie reviews (50k samples)
- ag_news - News articles (120k samples)
- poem_sentiment - Poetry (890 samples)
- yelp_review_full - Restaurant reviews (650k samples)
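A minimal loader in the spirit of the built-in load_hf_dataset helper might look like the following; the function name load_plain_texts and the assumptions that the split is "train" and the text lives in a "text" column are illustrative (poem_sentiment, for example, uses a different column):

from datasets import load_dataset

def load_plain_texts(name, max_samples=3000, split="train", text_column="text"):
    """Pull up to max_samples raw strings from a Hugging Face dataset."""
    ds = load_dataset(name, split=split)
    ds = ds.select(range(min(max_samples, len(ds))))
    return [row[text_column] for row in ds]

texts = load_plain_texts("rotten_tomatoes", max_samples=3000)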
Generation Strategies
Conditional Generation
# Generate from a prompt
text = trainer.generate("The movie was", max_length=50, num_steps=20)
Unconditional Generation
# Generate from scratch
text = trainer.generate("", max_length=50, num_steps=20)
Fine-tuned Sampling
# Control generation quality vs speed
text = trainer.generate(
    prompt="Breaking news",
    max_length=100,
    num_steps=50,   # More steps = higher quality
)
Technical Details
Diffusion Process
AURORA-Tiny uses a forward diffusion process that gradually adds Gaussian noise to text embeddings:
q(x_t | x_{t-1}) = N(x_t; √(1-β_t) x_{t-1}, β_t I)
The reverse process is learned by the neural network:
p_θ(x_{t-1} | x_t, t) = N(x_{t-1}; μ_θ(x_t, t), Σ_θ(x_t, t))
Training Objective
The model is trained with the simplified denoising objective (a reweighted form of the variational lower bound), regressing the noise added to x_0:
L = E_{t, x_0, ε} [ ||ε - ε_θ(√(ᾱ_t) x_0 + √(1-ᾱ_t) ε, t)||² ]
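Concretely, one training step samples a timestep, forms x_t from x_0 in closed form, and regresses the added noise. A sketch assuming a linear β schedule and a model that takes (x_t, t) and predicts ε; names are illustrative, not the repository's actual helpers:

import torch
import torch.nn.functional as F

T = 100
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule β_t
alpha_bar = torch.cumprod(1.0 - betas, dim=0)      # ᾱ_t

def diffusion_loss(model, x0, t):
    """x0: clean token embeddings (batch, seq, d_model); t: (batch,) integer timesteps."""
    eps = torch.randn_like(x0)
    a = alpha_bar.to(x0.device)[t].view(-1, 1, 1)
    x_t = torch.sqrt(a) * x0 + torch.sqrt(1.0 - a) * eps    # sample from q(x_t | x_0)
    eps_pred = model(x_t, t)                                 # model predicts the noise
    return F.mse_loss(eps_pred, eps)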
Monitoring
Training progress is automatically tracked and visualized (a minimal plotting sketch follows the list):
- Loss Curves: Training and validation loss over epochs
- Vocabulary Stats: Word frequency distributions
- Generation Samples: Example outputs during training
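The loss curves are plain matplotlib plots; a minimal version, assuming the trainer collects per-epoch loss lists, could be:

import matplotlib.pyplot as plt

def plot_losses(train_losses, val_losses, path="loss_curves.png"):
    """Plot per-epoch training/validation loss (lists of floats)."""
    plt.figure()
    plt.plot(train_losses, label="train")
    plt.plot(val_losses, label="validation")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(path)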
Customization
Custom Tokenizer
class CustomTokenizer(TextTokenizer):
    def __init__(self, vocab_size=5000):
        super().__init__(vocab_size)
        # Add custom setup here

    def preprocess(self, text):
        # Custom text preprocessing
        return text.lower().strip()
Custom Architecture
model = DiffusionTransformer(
    vocab_size=vocab_size,
    d_model=512,       # Larger model
    n_heads=16,        # More attention heads
    n_layers=12,       # Deeper network
    timesteps=1000     # More diffusion steps
)
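When scaling up like this, it helps to check the parameter count before committing to a training run:

n_params = sum(p.numel() for p in model.parameters())
print(f"Model size: {n_params / 1e6:.1f}M parameters")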
Creative Applications
AURORA-Tiny is well suited for:
- Story Continuation: Complete narrative fragments
- Style Transfer: Generate text in specific styles
- Creative Writing: Poetry, fiction, and experimental text
- Data Augmentation: Generate synthetic training data
- Content Variation: Create multiple versions of text
Contributing
Contributions welcome! Areas for improvement:
- Better noise schedules (cosine, learned schedules); a cosine-schedule sketch follows this list
- Advanced sampling methods (DPM-Solver, PLMS)
- Larger model architectures
- Multi-modal extensions
- Evaluation benchmarks
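As a starting point for the noise-schedule item above, a cosine β schedule in the style of Nichol & Dhariwal (2021) could look like this (a sketch; it would still need to be wired into DiffusionSchedule):

import math
import torch

def cosine_betas(timesteps, s=0.008):
    """Cosine noise schedule: β_t derived from ᾱ_t = f(t)/f(0), clipped at 0.999."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    f = torch.cos(((steps / timesteps) + s) / (1 + s) * math.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return betas.clamp(max=0.999).float()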
AURORA - Where text generation meets the dawn of diffusion