Linformer-based Language Model

Efficient language modeling for long sequences using the Linformer architecture. The model reduces the memory and compute cost of self-attention, which makes it well suited to text generation tasks.

Introduction
Architecture
Installation
Training Progress
Quick Start
Inference Parameters
Hyperparameters
Acknowledgements
Sponsorship
License

Introduction

The Linformer-based Language Model uses the Linformer architecture to handle long sequences efficiently in text generation and other language tasks. Linformer approximates self-attention with low-rank projections, which lowers memory and compute requirements relative to standard attention with the aim of preserving output quality. This makes the model suitable for applications such as text completion and generation.

Architecture

Built upon the Linformer Transformer, the model incorporates several key innovations:

Efficient Attention: Reduces self-attention complexity from quadratic to linear in sequence length by projecting the keys and values along the sequence dimension to a fixed length k.
Low-Rank Linear Projections: Utilizes LowRankLinear layers to decrease dimensionality without compromising expressiveness.
Self-Attention Mechanism: Implements multi-head self-attention with full expressivity by avoiding low-rank projections in this module.
Factorized Feed-Forward Layers: Uses factorized LowRankLinear layers in the Feed-Forward Neural Network to maintain performance with fewer parameters.
PreNorm with LayerNorm and LayerScale: Applies Layer Normalization before attention and feed-forward layers, enhanced with LayerScale for better gradient flow and stability.
Dropout & Residual Connections: Incorporates dropout for regularization and residual connections to aid in gradient flow and prevent vanishing gradients.
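
The efficient attention idea can be illustrated with a small, dependency-free sketch. It is not the lumenspark implementation: it handles a single head, uses plain Python lists instead of tensors, and the names linformer_attention, e_proj, f_proj, and proj_k are hypothetical. It only shows that projecting keys and values along the sequence dimension makes the score matrix seq_len x proj_k instead of seq_len x seq_len.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over a single row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def linformer_attention(q, keys, values, e_proj, f_proj):
    """Single-head attention with Linformer-style sequence projections.

    q, keys, values: (seq_len, dim); e_proj, f_proj: (proj_k, seq_len).
    """
    dim = len(q[0])
    keys_low = matmul(e_proj, keys)      # (proj_k, dim)
    values_low = matmul(f_proj, values)  # (proj_k, dim)
    keys_low_t = [list(col) for col in zip(*keys_low)]  # (dim, proj_k)
    scores = matmul(q, keys_low_t)       # (seq_len, proj_k), not (seq_len, seq_len)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, values_low), weights


def random_matrix(rows, cols, rng):
    """Fill a rows x cols matrix with standard normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for readability; they are not the model's real dimensions.
rng = random.Random(0)
seq_len, dim, proj_k = 6, 4, 3
q = random_matrix(seq_len, dim, rng)
keys = random_matrix(seq_len, dim, rng)
values = random_matrix(seq_len, dim, rng)
e_proj = random_matrix(proj_k, seq_len, rng)
f_proj = random_matrix(proj_k, seq_len, rng)

out, weights = linformer_attention(q, keys, values, e_proj, f_proj)
print(len(out), len(out[0]))          # output shape: seq_len x dim
print(len(weights), len(weights[0]))  # score shape: seq_len x proj_k
```

Because proj_k is fixed, the cost of computing the scores grows linearly with the sequence length rather than quadratically.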

Installation

Install the lumenspark package via pip:

pip install lumenspark

This command installs the Linformer-based language model along with all necessary dependencies.

Training Progress

Training loss over the course of model training:

[Figure: training loss plot]

Quick Start

Load the pre-trained model from Hugging Face and generate text from a prompt:

from lumenspark import LumensparkModel
import torch

# 1. Set up the device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 2. Load the model and move it to the device
model = LumensparkModel.from_pretrained("anto18671/lumenspark").to(device)

# 3. Example input text
input_text = "Once upon a time"

# 4. Generate text
output_text = model.generate(
    input_text,
    max_length=100,        # Maximum length of the generated sequence
    temperature=0.7,       # Lower values make sampling more deterministic
    top_k=50,              # Sample only from the 50 most probable tokens
    top_p=0.9,             # Nucleus sampling: keep the smallest token set with cumulative probability >= 0.9
    repetition_penalty=1.2 # Values above 1.0 penalize tokens that were already generated
)

# 5. Print the generated text
print(output_text)

Inference Parameters

Customize text generation using the following parameters:

max_length: Maximum length of the generated sequence.
temperature: Controls randomness (lower = more deterministic).
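top_k: Limits sampling to the k most probable tokens.
top_p: Nucleus sampling; samples from the smallest set of tokens whose cumulative probability reaches p.
repetition_penalty: Penalizes tokens that have already been generated (values above 1.0 discourage repetition).
no_repeat_ngram_size: Prevents any n-gram of the specified size from appearing more than once.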
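
To make the interaction of these parameters concrete, here is a minimal sketch of temperature scaling, top-k filtering, nucleus (top-p) filtering, and a repetition penalty applied to a toy list of logits. It uses only the standard library, and the function names are hypothetical. The repetition penalty follows the common convention of dividing positive logits and multiplying negative ones; the lumenspark internals may order or implement these steps differently.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Make already generated tokens less likely (assumed convention)."""
    out = list(logits)
    for i in generated_ids:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def filter_and_sample(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random):
    """Return (sampled_token_id, renormalized probabilities of kept tokens)."""
    # 1. Temperature: dividing logits by a value below 1 sharpens the distribution.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Top-k: keep only the k most probable token ids.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 4. Renormalize over the surviving tokens and sample one of them.
    mass = sum(probs[i] for i in kept)
    kept_probs = {i: probs[i] / mass for i in kept}
    token = rng.choices(list(kept_probs), weights=list(kept_probs.values()), k=1)[0]
    return token, kept_probs


toy_logits = [3.0, 1.0, 0.5, -1.0, -3.0]
rng = random.Random(0)

# At temperature 1.0 two tokens survive the filters; at 0.7 only the top one does.
_, kept_warm = filter_and_sample(toy_logits, temperature=1.0, top_k=3, top_p=0.9, rng=rng)
token_cool, kept_cool = filter_and_sample(toy_logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng)
print(sorted(kept_warm), sorted(kept_cool), token_cool)

penalized = apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids={0, 1}, penalty=1.2)
print(penalized)
```

Lowering the temperature concentrates probability on the top token, so fewer candidates pass the top_p cutoff and the output becomes more deterministic.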

Hyperparameters

The model is configured with the following hyperparameters:

vocab_size: 50,257
embed_dim: 768
depth: 8 layers
heads: 8 attention heads
seq_length: 768 tokens
dropout: 1/17 (≈ 0.059)
k: 384 (projected sequence length in attention)
rank: 256 (low-rank projections)
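
As a rough illustration of what these values imply, the arithmetic below compares sizes under two simplifying assumptions that are not stated in this README: a LowRankLinear layer is treated as two bias-free matrices (in x rank and rank x out), and the example layer maps embed_dim to embed_dim. The actual layer shapes in the model may differ.

```python
# Values taken from the hyperparameter list above.
embed_dim = 768
heads = 8
seq_length = 768
k = 384
rank = 256

# Per-head width when embed_dim is split evenly across heads.
head_dim = embed_dim // heads

# Attention score entries per head for a full-length sequence.
full_scores = seq_length * seq_length   # standard attention
projected_scores = seq_length * k       # keys/values projected to length k

# Weights in a square embed_dim -> embed_dim layer (biases ignored, assumed shape).
dense_params = embed_dim * embed_dim
low_rank_params = embed_dim * rank + rank * embed_dim

print(head_dim)
print(full_scores, projected_scores, projected_scores / full_scores)
print(dense_params, low_rank_params, low_rank_params / dense_params)
```

Under these assumptions, the projected score matrix has half as many entries as the full one, and the factorized layer has two thirds of the weights of its dense counterpart.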

Acknowledgements

We would like to extend our gratitude to RunPod for their generous sponsorship, supporting the training and development of Lumenspark. Their contribution has been instrumental in pushing the project forward.

Sponsorship

Support the ongoing development of Lumenspark!

How to Sponsor

Visit GitHub Sponsors and choose a sponsorship tier that suits you. Thank you for your support!

License

This project is licensed under the MIT License.
