QuasarV4

QuasarV4 is a language model featuring an innovative token temperature mechanism that enhances focus on important tokens in a sequence.
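A minimal, hypothetical usage sketch is shown below. It assumes the checkpoint at silx-ai/QuasarV4-600M-Transformer loads through the standard transformers API; because the token temperature modules are a custom architecture, trust_remote_code=True is included, but the exact loading path is an assumption rather than a documented interface.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical loading sketch; assumes a standard transformers checkpoint
# and that the custom temperature modules require trust_remote_code=True.
model_id = "silx-ai/QuasarV4-600M-Transformer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # the published tensors are stored in F32
    trust_remote_code=True,
)

inputs = tokenizer("Token temperature lets the model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```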

Model Architecture

QuasarV4 builds on the Qwen3 transformer architecture with several key innovations:

Core Architecture

  • Base Size: 500M parameters
  • Hidden Size: 1024
  • Layers: 28
  • Attention Heads: 16
  • Key-Value Heads: 8
  • Head Dimension: 128
  • Intermediate Size: 3072
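
The hyperparameters above can be summarized in a small config sketch. The field names below are illustrative and are not guaranteed to match the keys in the model's config.json; note that, as in Qwen3, the head dimension is specified independently of the hidden size.

```python
from dataclasses import dataclass

# Illustrative summary of the hyperparameters listed above;
# field names are assumptions, not the model's actual config keys.
@dataclass
class QuasarV4Config:
    hidden_size: int = 1024
    num_hidden_layers: int = 28
    num_attention_heads: int = 16
    num_key_value_heads: int = 8   # grouped-query attention
    head_dim: int = 128            # decoupled from hidden_size, as in Qwen3
    intermediate_size: int = 3072

config = QuasarV4Config()
print(config)
```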

Token Temperature Mechanism

The signature feature of QuasarV4 is its token temperature mechanism, which dynamically adjusts the importance of each token based on context:

  1. Multi-dimensional Temperature Calculation

    • 4-layer temperature projection network
    • Position-dependent temperature scaling
    • Token importance calculation
    • Context-aware scaling
  2. Temperature Aggregation

    • 5-layer aggregation network
    • Global focus mechanism
    • Cross-token attention for temperature refinement
  3. Output Adaptation

    • Residual connections with adapted states
    • DenseNet-style connections from earlier layers

This mechanism enables QuasarV4 to focus on the most important tokens in a sequence, leading to more coherent and contextually appropriate text generation.
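
The sketch below illustrates how the three stages described above could fit together. It is a hypothetical reconstruction, not the model's actual implementation: layer widths, activations, and the exact wiring are assumptions, and the cross-token refinement is simplified to a global mean rather than full cross-token attention. Only the overall structure (a 4-layer temperature projection, position-dependent scaling, a 5-layer aggregation network with a global focus term, and residual adaptation) follows the description.

```python
import torch
import torch.nn as nn

class TokenTemperature(nn.Module):
    """Hypothetical sketch of the token temperature mechanism."""

    def __init__(self, hidden_size: int = 1024, max_positions: int = 4096):
        super().__init__()
        # 1. Multi-dimensional temperature calculation: 4-layer projection network
        self.temp_proj = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, 1),
        )
        # Position-dependent temperature scaling (learned per-position scale)
        self.pos_scale = nn.Embedding(max_positions, 1)
        # 2. Temperature aggregation: 5-layer aggregation network
        self.aggregate = nn.Sequential(
            nn.Linear(hidden_size + 1, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size), nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        batch, seq_len, _ = hidden_states.shape
        positions = torch.arange(seq_len, device=hidden_states.device)

        # Per-token importance, scaled by position and squashed to (0, 1)
        temp = self.temp_proj(hidden_states)                  # (batch, seq_len, 1)
        temp = temp * self.pos_scale(positions).unsqueeze(0)  # position-dependent scaling
        temp = torch.sigmoid(temp)

        # Global focus: bias each token's temperature toward the sequence-wide mean
        # (stands in for the cross-token refinement described above)
        global_focus = temp.mean(dim=1, keepdim=True)
        temp = 0.5 * (temp + global_focus)

        # 3. Output adaptation: aggregate hidden state + temperature, then
        #    apply a temperature-weighted residual connection
        adapted = self.aggregate(torch.cat([hidden_states, temp], dim=-1))
        return hidden_states + temp * adapted
```

In this sketch, tokens with higher temperatures contribute more of their adapted representation to the residual stream, which is one plausible way the mechanism could emphasize important tokens.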

Development

QuasarV4 was developed with a focus on enhancing token-level adaptivity while maintaining the strong foundation of transformer-based language models. The token temperature mechanism represents a novel approach to dynamic token importance that can be applied to various language tasks.

Acknowledgments

We would like to thank the open-source community for their contributions to the field of natural language processing, which have made this work possible. Special thanks to the Qwen team.
