QuasarV4
QuasarV4 is a language model featuring an innovative token temperature mechanism that enhances focus on important tokens in a sequence.
Model Architecture
QuasarV4 builds on the Qwen3-based transformer architecture with several key innovations:
Core Architecture
- Base Size: 500M parameters
- Hidden Size: 1024
- Layers: 28
- Attention Heads: 16
- Key-Value Heads: 8
- Head Dimension: 128
- Intermediate Size: 3072
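For orientation, the snippet below shows how these dimensions might map onto a Hugging Face Qwen3Config (transformers >= 4.51). This is a hypothetical mapping only: it configures a stock Qwen3 model and does not include QuasarV4's temperature modules.

```python
from transformers import Qwen3Config

# Hypothetical mapping of the spec list above onto Qwen3Config fields.
config = Qwen3Config(
    hidden_size=1024,          # Hidden Size
    num_hidden_layers=28,      # Layers
    num_attention_heads=16,    # Attention Heads
    num_key_value_heads=8,     # Key-Value Heads (grouped-query attention)
    head_dim=128,              # Head Dimension (decoupled from hidden_size in Qwen3)
    intermediate_size=3072,    # Intermediate (MLP) Size
)
print(config)
```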
Token Temperature Mechanism
The signature feature of QuasarV4 is its token temperature mechanism, which dynamically adjusts the importance of each token based on context. The hedged sketches after each sub-list below illustrate one possible implementation:
Multi-dimensional Temperature Calculation
- 4-layer temperature projection network
- Position-dependent temperature scaling
- Token importance calculation
- Context-aware scaling
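The sketch below is one plausible reading of this sub-list: a 4-layer projection network maps each token's hidden state to a scalar temperature, and a learned per-position embedding supplies the position-dependent scaling. The module name, layer widths, and activations are assumptions for illustration, not QuasarV4's actual code.

```python
import torch
import torch.nn as nn

class TemperatureProjection(nn.Module):
    """Hypothetical sketch of the 4-layer temperature projection."""

    def __init__(self, hidden_size: int = 1024, max_positions: int = 4096):
        super().__init__()
        # 4-layer projection network: hidden state -> scalar temperature
        self.proj = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size // 2),
            nn.GELU(),
            nn.Linear(hidden_size // 2, hidden_size // 4),
            nn.GELU(),
            nn.Linear(hidden_size // 4, 1),
        )
        # Position-dependent temperature scaling: one learned scale per position
        self.pos_scale = nn.Embedding(max_positions, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        _, seq_len, _ = hidden_states.shape
        raw = self.proj(hidden_states)                     # (batch, seq_len, 1)
        positions = torch.arange(seq_len, device=hidden_states.device)
        scaled = raw * self.pos_scale(positions)           # position-dependent scaling
        # Softplus keeps temperatures positive; a softmax over the sequence
        # would instead yield normalized token-importance weights.
        return nn.functional.softplus(scaled).squeeze(-1)  # (batch, seq_len)
```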
Temperature Aggregation
- 5-layer aggregation network
- Global focus mechanism
- Cross-token attention for temperature refinement
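Continuing the sketch, one way to realize this stage is to let each token attend to all others, fuse the attended state with the original hidden state, and refine it through a 5-layer MLP; a sequence-level sigmoid term stands in for the global focus mechanism. Every name and dimension here is a hypothetical choice, not QuasarV4's actual implementation.

```python
import torch
import torch.nn as nn

class TemperatureAggregation(nn.Module):
    """Hypothetical sketch of the 5-layer temperature aggregation."""

    def __init__(self, hidden_size: int = 1024, num_heads: int = 8):
        super().__init__()
        # Cross-token attention: each token's temperature can see the others
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # 5-layer aggregation MLP over [hidden state ; attended state]
        dims = [2 * hidden_size, hidden_size, hidden_size, hidden_size // 2, hidden_size // 4, 1]
        layers = []
        for i in range(5):
            layers.append(nn.Linear(dims[i], dims[i + 1]))
            if i < 4:
                layers.append(nn.GELU())
        self.aggregate = nn.Sequential(*layers)

    def forward(self, hidden_states: torch.Tensor, token_temps: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); token_temps: (batch, seq)
        attended, _ = self.cross_attn(hidden_states, hidden_states, hidden_states)
        fused = torch.cat([hidden_states, attended], dim=-1)
        refinement = self.aggregate(fused).squeeze(-1)           # (batch, seq)
        # Global focus: a sequence-level scalar modulating all temperatures
        global_focus = torch.sigmoid(refinement.mean(dim=1, keepdim=True))
        return token_temps * torch.sigmoid(refinement) * global_focus
```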
Output Adaptation
- Residual connections with adapted states
- DenseNet-style connections from earlier layers
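A minimal sketch of how these two connections could combine, assuming the aggregated temperatures scale the hidden states before the residual add and that DenseNet-style contributions from earlier layers are mixed in with learned weights; the mixing scheme is an assumption.

```python
import torch

def adapt_output(hidden_states, temperatures, earlier_layer_outputs, mix):
    """Hypothetical sketch of the output adaptation stage.

    hidden_states:         (batch, seq, hidden) current layer output
    temperatures:          (batch, seq) aggregated token temperatures
    earlier_layer_outputs: list of (batch, seq, hidden) tensors from earlier layers
    mix:                   per-layer mixing weights, one per earlier layer
    """
    # Scale each token's state by its temperature (the "adapted" state)
    adapted = hidden_states * temperatures.unsqueeze(-1)
    # Residual connection with the adapted states
    out = hidden_states + adapted
    # DenseNet-style connections: weighted contributions from earlier layers
    for weight, earlier in zip(mix, earlier_layer_outputs):
        out = out + weight * earlier
    return out
```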
This mechanism enables QuasarV4 to focus on the most important tokens in a sequence, leading to more coherent and contextually appropriate text generation.
Development
QuasarV4 was developed with a focus on enhancing token-level adaptivity while maintaining the strong foundation of transformer-based language models. The token temperature mechanism represents a novel approach to dynamic token importance that can be applied to various language tasks.
Acknowledgments
We would like to thank the open-source community for its contributions to the field of natural language processing, which have made this work possible. Special thanks to the Qwen team.