---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

> [!NOTE]  
> Hii!!! This is a side project, so it's not the best.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/dGyEQuQNl80XhlXvGrGGF.png)

# AURORA-Tiny 🌅✨
*Adaptive Unified Reasoning and Organized Reasoning Architecture - Tiny*

An ultra-lightweight text diffusion model that generates coherent text through iterative denoising. AURORA-Tiny combines a transformer architecture with a diffusion process in a compact, efficient design suited to local training and experimentation.

> [!NOTE]
> The model has ~6M parameters.

## ✨ Features

- **Ultra-Compact Design**: Optimized for local training with minimal hardware requirements
- **Transformer-based Architecture**: Multi-head attention with time conditioning in a tiny footprint
- **Diffusion Process**: Iterative denoising for high-quality text generation  
- **Flexible Training**: Works with any plain text dataset from Hugging Face
- **Efficient Training**: Train on CPU or modest GPUs in minutes, not hours
- **Prompt-based Generation**: Support for both conditional and unconditional generation

## 🚀 Quick Start

### Installation

```bash
pip install torch torchvision torchaudio
pip install datasets matplotlib tqdm numpy
```

### Basic Usage

```python
from aurora import DiffusionTrainer, TextTokenizer, DiffusionTransformer, DiffusionSchedule, load_hf_dataset

# Load your dataset (or use the built-in loader)
texts = load_hf_dataset("rotten_tomatoes", max_samples=3000)

# Build tokenizer
tokenizer = TextTokenizer(vocab_size=2000)
tokenizer.fit(texts)

# Initialize model
model = DiffusionTransformer(
    vocab_size=len(tokenizer.word_to_id),
    d_model=256,
    n_heads=8,
    n_layers=6
)

# Noise schedule for the diffusion process
schedule = DiffusionSchedule(timesteps=100)

# Train (train_loader / val_loader are PyTorch DataLoaders built from the tokenized texts)
trainer = DiffusionTrainer(model, tokenizer, schedule, device='cuda')
trainer.train(train_loader, val_loader, epochs=15)

# Generate text
generated_text = trainer.generate("This movie is", max_length=30)
print(generated_text)
```

## 🏗️ Architecture

AURORA-Tiny uses a novel combination of:

1. **Time-Conditioned Transformers**: Each transformer block receives timestep embeddings (see the sketch after this list)
2. **Sinusoidal Time Embeddings**: Continuous time representation for the diffusion process  
3. **Linear Noise Schedule**: Gradual noise addition during forward diffusion
4. **DDIM-style Sampling**: Deterministic sampling for consistent generation
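
Below is a minimal PyTorch sketch of the first two ideas: a sinusoidal timestep embedding and a transformer block that consumes it. It is illustrative only; names like `sinusoidal_time_embedding` and `TimeConditionedBlock` are not the actual AURORA-Tiny source.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Map integer timesteps t (shape [B]) to continuous embeddings of shape [B, dim]."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class TimeConditionedBlock(nn.Module):
    """Self-attention block whose input is shifted by a projected time embedding."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.time_proj = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # Inject the time conditioning as an additive shift before attention
        h = x + self.time_proj(t_emb)[:, None, :]
        h = h + self.attn(self.norm1(h), self.norm1(h), self.norm1(h))[0]
        return h + self.ff(self.norm2(h))
```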

### Model Components

- **Token Embedding**: Maps discrete tokens to continuous space
- **Position Encoding**: Learnable positional embeddings
- **Time Conditioning**: Sinusoidal embeddings injected into each layer
- **Multi-Head Attention**: Standard transformer attention with time modulation
- **Output Projection**: Maps back to vocabulary space (a forward-pass sketch tying these components together follows below)

*Tested on RTX 3060, batch_size=16, 15 epochs. Model size: ~2.4M parameters*
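
Reusing `TimeConditionedBlock` and `sinusoidal_time_embedding` from the sketch above, the components roughly compose into a forward pass like this (again an illustrative sketch, not the shipped implementation):

```python
class TinyDiffusionLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 256, max_seq_len: int = 64,
                 n_heads: int = 8, n_layers: int = 6):
        super().__init__()
        # Token embedding: used by the trainer to embed clean tokens before noising
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Learnable positional embeddings
        self.pos_emb = nn.Parameter(torch.zeros(1, max_seq_len, d_model))
        self.blocks = nn.ModuleList(
            [TimeConditionedBlock(d_model, n_heads) for _ in range(n_layers)]
        )
        # Output projection back to vocabulary space
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, noisy_emb: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # noisy_emb: [B, T, d_model] noised token embeddings; t: [B] timesteps
        t_emb = sinusoidal_time_embedding(t, noisy_emb.size(-1))
        h = noisy_emb + self.pos_emb[:, : noisy_emb.size(1)]
        for block in self.blocks:
            h = block(h, t_emb)
        return self.out_proj(h)  # logits over the vocabulary
```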

## 🎛️ Configuration

### Model Hyperparameters

```python
model_config = {
    'vocab_size': 2000,      # Vocabulary size
    'd_model': 256,          # Hidden dimension
    'n_heads': 8,            # Attention heads
    'n_layers': 6,           # Transformer layers
    'max_seq_len': 64,       # Maximum sequence length
    'timesteps': 100         # Diffusion timesteps
}
```
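
If the constructors accept these keys as keyword arguments (as the other examples on this card suggest), the dict can be unpacked directly; this is a hypothetical convenience, not a documented API:

```python
# Assumes every key in model_config is a valid keyword argument
model = DiffusionTransformer(**model_config)
schedule = DiffusionSchedule(timesteps=model_config['timesteps'])
```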

### Training Parameters

```python
training_config = {
    'batch_size': 16,        # Batch size
    'learning_rate': 1e-4,   # Learning rate
    'weight_decay': 0.01,    # L2 regularization
    'epochs': 15,            # Training epochs
    'grad_clip': 1.0         # Gradient clipping
}
```
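
A minimal sketch of how these settings map onto a plain PyTorch loop; the bundled `DiffusionTrainer` handles this internally, and `compute_loss` is a hypothetical stand-in for the diffusion objective described under Technical Details:

```python
import torch

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=training_config['learning_rate'],
    weight_decay=training_config['weight_decay'],  # L2 regularization
)

for epoch in range(training_config['epochs']):
    for batch in train_loader:                      # batch_size is set when building the loader
        loss = compute_loss(model, batch)           # hypothetical diffusion loss helper
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), training_config['grad_clip'])
        optimizer.step()
```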

## 📚 Supported Datasets

AURORA-Tiny works with any plain-text dataset from Hugging Face (a minimal loading sketch follows the list below). Pre-configured datasets include:

- **rotten_tomatoes** - Movie reviews (8.5k samples)
- **imdb** - Movie reviews (50k samples) 
- **ag_news** - News articles (120k samples)
- **poem_sentiment** - Poetry (890 samples)
- **yelp_review_full** - Restaurant reviews (650k samples)
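
For reference, pulling raw text from any of these with the `datasets` library looks roughly like this (the built-in `load_hf_dataset` helper does something similar):

```python
from datasets import load_dataset

ds = load_dataset("rotten_tomatoes", split="train")
texts = ds["text"][:3000]  # cap the number of samples for quick experiments
```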

## 🎯 Generation Strategies

### Conditional Generation
```python
# Generate from a prompt
text = trainer.generate("The movie was", max_length=50, num_steps=20)
```

### Unconditional Generation
```python
# Generate from scratch
text = trainer.generate("", max_length=50, num_steps=20)
```

### Fine-tuned Sampling
```python
# Control generation quality vs speed
text = trainer.generate(
    prompt="Breaking news",
    max_length=100,
    num_steps=50,  # More steps = higher quality
)
```

## 🔬 Technical Details

### Diffusion Process

AURORA-Tiny uses a forward diffusion process that gradually adds Gaussian noise to text embeddings:

```
q(x_t | x_{t-1}) = N(x_t; √(1-β_t)x_{t-1}, β_t I)
```

The reverse process is learned by the neural network:

```
p_θ(x_{t-1} | x_t, t) = N(x_{t-1}; μ_θ(x_t, t), Σ_θ(x_t, t))
```

### Training Objective

The model is trained to minimize the variational lower bound:

```
L = E_t,x_0,ε [||ε - ε_θ(√(ᾱ_t)x_0 + √(1-ᾱ_t)ε, t)||²]
```
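
In code, one training step of this objective looks roughly like the following sketch, assuming a linear β schedule and a network `model(x_t, t)` trained to predict the injected noise ε (how AURORA-Tiny wires its vocabulary-space output into this loss is handled inside the trainer):

```python
import torch
import torch.nn.functional as F

timesteps = 100
betas = torch.linspace(1e-4, 0.02, timesteps)      # linear noise schedule β_t
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # ᾱ_t = ∏ (1 - β_s)

def diffusion_loss(model, x0):
    """x0: clean token embeddings of shape [B, T, D]."""
    t = torch.randint(0, timesteps, (x0.size(0),))        # random timestep per sample
    eps = torch.randn_like(x0)                             # ε ~ N(0, I)
    a_bar = alphas_bar[t].view(-1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward process q(x_t | x_0)
    eps_pred = model(x_t, t)                               # ε_θ(x_t, t)
    return F.mse_loss(eps_pred, eps)
```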

## 📈 Monitoring

Training progress is automatically tracked and visualized (a plotting sketch follows the list below):

- **Loss Curves**: Training and validation loss over epochs
- **Vocabulary Stats**: Word frequency distributions  
- **Generation Samples**: Example outputs during training
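
A minimal sketch of plotting the tracked losses with matplotlib; `train_losses` and `val_losses` are hypothetical per-epoch lists recorded by the trainer:

```python
import matplotlib.pyplot as plt

plt.plot(train_losses, label="train")
plt.plot(val_losses, label="validation")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")
```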

## 🛠️ Customization

### Custom Tokenizer
```python
class CustomTokenizer(TextTokenizer):
    def __init__(self, vocab_size=5000):
        super().__init__(vocab_size)
        # Add custom preprocessing
        
    def preprocess(self, text):
        # Custom text preprocessing
        return text.lower().strip()
```

### Custom Architecture
```python
model = DiffusionTransformer(
    vocab_size=vocab_size,
    d_model=512,       # Larger model
    n_heads=16,        # More attention heads  
    n_layers=12,       # Deeper network
    timesteps=1000     # More diffusion steps
)
```

## 🎨 Creative Applications

AURORA-Tiny excels at:

- **Story Continuation**: Complete narrative fragments
- **Style Transfer**: Generate text in specific styles  
- **Creative Writing**: Poetry, fiction, and experimental text
- **Data Augmentation**: Generate synthetic training data
- **Content Variation**: Create multiple versions of text

## 🤝 Contributing

Contributions welcome! Areas for improvement:

- Better noise schedules (cosine, learned schedules)
- Advanced sampling methods (DPM-Solver, PLMS)
- Larger model architectures
- Multi-modal extensions
- Evaluation benchmarks

---

*AURORA - Where text generation meets the dawn of diffusion* 🌅