QuasarV4

QuasarV4 is a language model featuring an innovative token temperature mechanism that enhances focus on important tokens in a sequence.
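A minimal, hypothetical usage sketch is shown below. It assumes the checkpoint at silx-ai/QuasarV4-600M-Transformer loads through the standard transformers API; because the token temperature modules are a custom architecture, trust_remote_code=True is included, but the exact loading path is an assumption rather than a documented interface.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical loading sketch; assumes a standard transformers checkpoint
# and that the custom temperature modules require trust_remote_code=True.
model_id = "silx-ai/QuasarV4-600M-Transformer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # the published tensors are stored in F32
    trust_remote_code=True,
)

inputs = tokenizer("Token temperature lets the model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```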

Model Architecture

QuasarV4 builds on the Qwen3 transformer architecture with several key innovations:

Core Architecture

  • Base Size: 500M parameters
  • Hidden Size: 1024
  • Layers: 28
  • Attention Heads: 16
  • Key-Value Heads: 8
  • Head Dimension: 128
  • Intermediate Size: 3072
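
The hyperparameters above can be summarized in a small config sketch. The field names below are illustrative and are not guaranteed to match the keys in the model's config.json; note that, as in Qwen3, the head dimension is specified independently of the hidden size.

```python
from dataclasses import dataclass

# Illustrative summary of the hyperparameters listed above;
# field names are assumptions, not the model's actual config keys.
@dataclass
class QuasarV4Config:
    hidden_size: int = 1024
    num_hidden_layers: int = 28
    num_attention_heads: int = 16
    num_key_value_heads: int = 8   # grouped-query attention
    head_dim: int = 128            # decoupled from hidden_size, as in Qwen3
    intermediate_size: int = 3072

config = QuasarV4Config()
print(config)
```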

Token Temperature Mechanism

The signature feature of QuasarV4 is its token temperature mechanism, which dynamically adjusts the importance of each token based on context:

  1. Multi-dimensional Temperature Calculation

    • 4-layer temperature projection network
    • Position-dependent temperature scaling
    • Token importance calculation
    • Context-aware scaling
  2. Temperature Aggregation

    • 5-layer aggregation network
    • Global focus mechanism
    • Cross-token attention for temperature refinement
  3. Output Adaptation

    • Residual connections with adapted states
    • DenseNet-style connections from earlier layers

This mechanism enables QuasarV4 to focus on the most important tokens in a sequence, leading to more coherent and contextually appropriate text generation.
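
The sketch below illustrates how the three stages described above could fit together. It is a hypothetical reconstruction, not the model's actual implementation: layer widths, activations, and the exact wiring are assumptions, and the cross-token refinement is simplified to a global mean rather than full cross-token attention. Only the overall structure (a 4-layer temperature projection, position-dependent scaling, a 5-layer aggregation network with a global focus term, and residual adaptation) follows the description.

```python
import torch
import torch.nn as nn

class TokenTemperature(nn.Module):
    """Hypothetical sketch of the token temperature mechanism."""

    def __init__(self, hidden_size: int = 1024, max_positions: int = 4096):
        super().__init__()
        # 1. Multi-dimensional temperature calculation: 4-layer projection network
        self.temp_proj = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, 1),
        )
        # Position-dependent temperature scaling (learned per-position scale)
        self.pos_scale = nn.Embedding(max_positions, 1)
        # 2. Temperature aggregation: 5-layer aggregation network
        self.aggregate = nn.Sequential(
            nn.Linear(hidden_size + 1, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        batch, seq_len, _ = hidden_states.shape
        positions = torch.arange(seq_len, device=hidden_states.device)

        # Per-token importance, scaled by position and squashed to (0, 1)
        temp = self.temp_proj(hidden_states)                  # (batch, seq_len, 1)
        temp = temp * self.pos_scale(positions).unsqueeze(0)  # position-dependent scaling
        temp = torch.sigmoid(temp)

        # Global focus: bias each token's temperature toward the sequence-wide mean
        # (stands in for the cross-token refinement described above)
        global_focus = temp.mean(dim=1, keepdim=True)
        temp = 0.5 * (temp + global_focus)

        # 3. Output adaptation: aggregate hidden state + temperature, then
        #    apply a temperature-weighted residual connection
        adapted = self.aggregate(torch.cat([hidden_states, temp], dim=-1))
        return hidden_states + temp * adapted
```

In this sketch, tokens with higher temperatures contribute more of their adapted representation to the residual stream, which is one plausible way the mechanism could emphasize important tokens.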

Development

QuasarV4 was developed with a focus on enhancing token-level adaptivity while maintaining the strong foundation of transformer-based language models. The token temperature mechanism represents a novel approach to dynamic token importance that can be applied to various language tasks.

Acknowledgments

We would like to thank the open-source community for their contributions to the field of natural language processing, which have made this work possible. Special thanks to the Qwen team.
