---
license: apache-2.0
tags:
- pretrained
- base-model
language:
- en
- ko
- ja
pipeline_tag: text-generation
library_name: transformers
extra_gated_fields:
  Full Name: text
  Email: text
  Organization: text
---

<p align="center">
<picture>
  <img src="https://raw.githubusercontent.com/trillion-labs/.github/main/Tri-7B.png" alt="Tri-7B-Base" style="width: 80%;">
</picture>
</p>

# Tri-7B-Base

## Introduction

We present **Tri-7B-Base**, a foundation language model that serves as the pre-trained base for our Tri-7B model family. This model represents our commitment to efficient training while establishing a strong foundation for downstream fine-tuning and adaptation.

### Key Features
* **Foundation Architecture**: State-of-the-art transformer architecture optimized for efficiency
* **Multi-lingual Foundation**: Pre-trained on diverse data in Korean, English, and Japanese
* **Efficient Training**: Optimized training methodology for computational efficiency

### Model Specifications

#### Tri-7B-Base
- Type: Causal Language Model
- Training Stage: Pre-training
- Architecture: Transformer Decoder with RoPE, SwiGLU, RMSNorm
- Number of Parameters: 7.76B
- Number of Layers: 32
- Number of Attention Heads: 32
- Context Length: 4,096
- Vocab Size: 128,128
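
Below is a minimal usage sketch with the `transformers` library for plain text completion. The repository id `trillion-labs/Tri-7B-Base` is an assumption for illustration; substitute the actual model id on the Hugging Face Hub.

```python
# Minimal sketch: load the base model and run text completion.
# NOTE: the repo id below is an assumption, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trillion-labs/Tri-7B-Base"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 7.76B parameters at roughly 15.5 GB
    device_map="auto",
)

# A base model does completion, not chat: give it a prompt to continue.
prompt = "The three most widely spoken languages in East Asia are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```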

## Use Cases

As a base model, Tri-7B-Base is designed to serve as a foundation for various downstream applications:

- **Fine-tuning**: Adapt to specific domains or tasks (see the sketch after this list)
- **Instruction Tuning**: Create chat or assistant models
- **Domain Specialization**: Customize for specific industries or use cases
- **Research**: Explore model behaviors and capabilities
- **Language Generation**: General text completion and generation tasks
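
For the fine-tuning use case above, the following is a hedged sketch of continued training on top of the base model with the standard `transformers` Trainer API. The repo id, dataset, and hyperparameters are illustrative placeholders, not a recommended recipe.

```python
# Sketch: supervised fine-tuning / continued pre-training of Tri-7B-Base.
# The repo id, dataset, and hyperparameters below are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "trillion-labs/Tri-7B-Base"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding during batching
model = AutoModelForCausalLM.from_pretrained(model_id)

# Any plain-text corpus works for domain adaptation; wikitext is just an example.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    # Truncate to the model's 4,096-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tri-7b-base-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=1e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Causal LM objective: labels are the inputs shifted by one (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

For instruction tuning or chat-style adaptation, the same pattern applies with an instruction-formatted dataset in place of the raw-text corpus.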

## Limitations

- **Base Model Nature**: This is a pre-trained base model without instruction tuning or alignment. For chat or assistant capabilities, consider fine-tuned variants.
- **Language Support**: The model is optimized for English, Korean, and Japanese. Usage with other languages may result in degraded performance.
- **Knowledge Cutoff**: The model's information is limited to data available up to February 2025.
- **Generation Quality**: As a base model, outputs may require post-processing or filtering for production use cases.

## License
This model is licensed under the Apache License 2.0.

## Contact
For inquiries, please contact: [email protected]