Hi! This is a side project, so it's not the most polished.

AURORA-Tiny 🌅✨

Adaptive Unified Reasoning and Organized Reasoning Architecture - Tiny

An ultra-lightweight text diffusion model that generates coherent text through iterative denoising. AURORA-Tiny combines a transformer architecture with a diffusion process in a compact, efficient design well suited to local training and experimentation.

The model has roughly 6M parameters.

✨ Features

  • Ultra-Compact Design: Optimized for local training with minimal hardware requirements
  • Transformer-based Architecture: Multi-head attention with time conditioning in a tiny footprint
  • Diffusion Process: Iterative denoising for high-quality text generation
  • Flexible Training: Works with any plain text dataset from Hugging Face
  • Efficient Training: Train on CPU or modest GPUs in minutes, not hours
  • Prompt-based Generation: Support for both conditional and unconditional generation

🚀 Quick Start

Installation

pip install torch torchvision torchaudio
pip install datasets matplotlib tqdm numpy

Basic Usage

from aurora import DiffusionTrainer, TextTokenizer, DiffusionTransformer, DiffusionSchedule
from aurora import load_hf_dataset  # built-in dataset loader; adjust the import if it lives elsewhere in the repo

# Load your dataset (or use the built-in loader)
texts = load_hf_dataset("rotten_tomatoes", max_samples=3000)

# Build tokenizer
tokenizer = TextTokenizer(vocab_size=2000)
tokenizer.fit(texts)

# Initialize model
model = DiffusionTransformer(
    vocab_size=len(tokenizer.word_to_id),
    d_model=256,
    n_heads=8,
    n_layers=6
)

# Noise schedule (100 timesteps matches the default model config)
schedule = DiffusionSchedule(timesteps=100)

# Train (train_loader / val_loader are PyTorch DataLoaders built from the tokenized texts)
trainer = DiffusionTrainer(model, tokenizer, schedule, device='cuda')
trainer.train(train_loader, val_loader, epochs=15)

# Generate text
generated_text = trainer.generate("This movie is", max_length=30)
print(generated_text)

πŸ—οΈ Architecture

AURORA-Tiny uses a novel combination of:

  1. Time-Conditioned Transformers: Each transformer block receives timestep embeddings
  2. Sinusoidal Time Embeddings: Continuous time representation for the diffusion process
  3. Linear Noise Schedule: Gradual noise addition during forward diffusion
  4. DDIM-style Sampling: Deterministic sampling for consistent generation
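
For the linear noise schedule in item 3, the forward-process constants can be precomputed as in the sketch below. This is illustrative code; the exact defaults and attribute names inside DiffusionSchedule may differ.

import torch

def linear_noise_schedule(timesteps=100, beta_start=1e-4, beta_end=0.02):
    """Precompute forward-diffusion constants (illustrative defaults, not the repo's exact values)."""
    betas = torch.linspace(beta_start, beta_end, timesteps)  # β_t, linearly spaced
    alphas = 1.0 - betas                                     # α_t = 1 - β_t
    alpha_bars = torch.cumprod(alphas, dim=0)                # ᾱ_t = Π α_s
    return betas, alphas, alpha_bars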

Model Components

  • Token Embedding: Maps discrete tokens to continuous space
  • Position Encoding: Learnable positional embeddings
  • Time Conditioning: Sinusoidal embeddings injected into each layer
  • Multi-Head Attention: Standard transformer attention with time modulation
  • Output Projection: Maps back to vocabulary space
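
To make the time conditioning concrete, here is a minimal sketch of a sinusoidal time embedding and a transformer block that consumes it. This is illustrative code under assumed shapes and names, not the repo's exact implementation.

import math
import torch
import torch.nn as nn

def sinusoidal_time_embedding(t, dim):
    """Map integer timesteps t of shape [batch] to continuous embeddings [batch, dim]."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    angles = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

class TimeConditionedBlock(nn.Module):
    """Transformer block whose input is shifted by a projection of the time embedding."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.time_proj = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, t_emb):
        x = x + self.time_proj(t_emb)[:, None, :]   # inject time conditioning into every position
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        return x + self.ff(self.norm2(x))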

Reference run: RTX 3060, batch_size=16, 15 epochs; model size for that configuration: ~2.4M parameters.

🎛️ Configuration

Model Hyperparameters

model_config = {
    'vocab_size': 2000,      # Vocabulary size
    'd_model': 256,          # Hidden dimension
    'n_heads': 8,            # Attention heads
    'n_layers': 6,           # Transformer layers
    'max_seq_len': 64,       # Maximum sequence length
    'timesteps': 100         # Diffusion timesteps
}

Training Parameters

training_config = {
    'batch_size': 16,        # Batch size
    'learning_rate': 1e-4,   # Learning rate
    'weight_decay': 0.01,    # L2 regularization
    'epochs': 15,            # Training epochs
    'grad_clip': 1.0         # Gradient clipping
}
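
If you prefer to wire these values up yourself instead of relying on DiffusionTrainer, a typical PyTorch optimization loop would look roughly like this; trainer.training_step is a hypothetical stand-in for your loss computation.

import torch

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=training_config['learning_rate'],
    weight_decay=training_config['weight_decay'],
)

for batch in train_loader:                       # one epoch sketched here
    loss = trainer.training_step(batch)          # hypothetical; substitute your diffusion loss
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), training_config['grad_clip'])
    optimizer.step()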

📚 Supported Datasets

AURORA-Tiny works with any text dataset from Hugging Face. Pre-configured datasets include:

  • rotten_tomatoes - Movie reviews (8.5k samples)
  • imdb - Movie reviews (50k samples)
  • ag_news - News articles (120k samples)
  • poem_sentiment - Poetry (890 samples)
  • yelp_review_full - Restaurant reviews (650k samples)
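
Under the hood, pulling plain text from one of these datasets with the datasets library looks roughly like the sketch below. The "text" column name is an assumption that holds for rotten_tomatoes, imdb, ag_news, and yelp_review_full; poem_sentiment uses "verse_text" instead.

from datasets import load_dataset

def load_plain_text(name, split="train", text_column="text", max_samples=3000):
    """Return a list of raw strings from a Hugging Face dataset."""
    ds = load_dataset(name, split=split)
    subset = ds.select(range(min(max_samples, len(ds))))
    return [row[text_column] for row in subset]

texts = load_plain_text("rotten_tomatoes", max_samples=3000)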

🎯 Generation Strategies

Conditional Generation

# Generate from a prompt
text = trainer.generate("The movie was", max_length=50, num_steps=20)

Unconditional Generation

# Generate from scratch
text = trainer.generate("", max_length=50, num_steps=20)

Tuning Sampling Quality vs. Speed

# Control generation quality vs speed
text = trainer.generate(
    prompt="Breaking news",
    max_length=100,
    num_steps=50,  # More steps = higher quality
)

🔬 Technical Details

Diffusion Process

AURORA-Tiny uses a forward diffusion process that gradually adds Gaussian noise to text embeddings:

q(x_t | x_{t-1}) = N(x_t; √(1-β_t)x_{t-1}, β_t I)

The reverse process is learned by the neural network:

p_θ(x_{t-1} | x_t, t) = N(x_{t-1}; μ_θ(x_t, t), Σ_θ(x_t, t))
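
A single deterministic (DDIM-style) reverse step can be sketched as follows, assuming the network predicts the noise ε and that ᾱ values come from the noise schedule; the model call signature here is an assumption.

import torch

@torch.no_grad()
def ddim_step(model, x_t, t, t_prev, alpha_bars):
    """One deterministic DDIM update from timestep t to t_prev (sketch)."""
    a_t, a_prev = alpha_bars[t], alpha_bars[t_prev]
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device)
    eps = model(x_t, t_batch)                                             # predicted noise
    x0_hat = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)          # estimate of x_0
    return torch.sqrt(a_prev) * x0_hat + torch.sqrt(1 - a_prev) * eps     # deterministic update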

Training Objective

The model is trained to minimize the variational lower bound:

L = E_{t, x_0, ε} [ ||ε - ε_θ(√(ᾱ_t) x_0 + √(1-ᾱ_t) ε, t)||² ]
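
One training step for this objective could look like the sketch below: sample a timestep, noise the clean embeddings with the forward process, and regress the predicted noise against the true noise. The tensor shapes and model call signature are assumptions.

import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, alpha_bars):
    """ε-prediction MSE loss for a batch of clean embeddings x0 of shape [batch, seq, d_model]."""
    batch = x0.shape[0]
    t = torch.randint(0, len(alpha_bars), (batch,), device=x0.device)   # random timesteps
    a_bar = alpha_bars[t].view(batch, 1, 1)
    eps = torch.randn_like(x0)                                          # true noise
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1 - a_bar) * eps          # forward process q(x_t | x_0)
    eps_pred = model(x_t, t)                                            # predicted noise
    return F.mse_loss(eps_pred, eps)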

📈 Monitoring

Training progress is automatically tracked and visualized:

  • Loss Curves: Training and validation loss over epochs
  • Vocabulary Stats: Word frequency distributions
  • Generation Samples: Example outputs during training
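
To reproduce the loss-curve plot manually, a minimal matplotlib snippet is shown below; train_losses and val_losses are hypothetical per-epoch lists collected during training.

import matplotlib.pyplot as plt

# train_losses / val_losses: hypothetical per-epoch lists gathered while training
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="validation")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")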

🛠️ Customization

Custom Tokenizer

class CustomTokenizer(TextTokenizer):
    def __init__(self, vocab_size=5000):
        super().__init__(vocab_size)
        # Add custom preprocessing
        
    def preprocess(self, text):
        # Custom text preprocessing
        return text.lower().strip()

Custom Architecture

model = DiffusionTransformer(
    vocab_size=vocab_size,
    d_model=512,       # Larger model
    n_heads=16,        # More attention heads  
    n_layers=12,       # Deeper network
    timesteps=1000     # More diffusion steps
)

🎨 Creative Applications

AURORA-Tiny excels at:

  • Story Continuation: Complete narrative fragments
  • Style Transfer: Generate text in specific styles
  • Creative Writing: Poetry, fiction, and experimental text
  • Data Augmentation: Generate synthetic training data
  • Content Variation: Create multiple versions of text

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Better noise schedules (cosine, learned schedules)
  • Advanced sampling methods (DPM-Solver, PLMS)
  • Larger model architectures
  • Multi-modal extensions
  • Evaluation benchmarks
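
As a starting point for the first item, the cosine schedule from Nichol & Dhariwal (2021) can be dropped in roughly as follows (a sketch; wire it into DiffusionSchedule however best fits the codebase).

import math
import torch

def cosine_noise_schedule(timesteps=100, s=0.008):
    """Cosine ᾱ_t schedule (Nichol & Dhariwal, 2021), returned as per-step betas."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    alpha_bars = torch.cos(((steps / timesteps) + s) / (1 + s) * math.pi / 2) ** 2
    alpha_bars = alpha_bars / alpha_bars[0]
    betas = 1 - (alpha_bars[1:] / alpha_bars[:-1])
    return betas.clamp(max=0.999).float()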

AURORA - Where text generation meets the dawn of diffusion πŸŒ…
