---
license: apache-2.0
tags:
- pretrained
- base-model
language:
- en
- ko
- ja
pipeline_tag: text-generation
library_name: transformers
extra_gated_fields:
  Full Name: text
  Email: text
  Organization: text
---
|
|
|
<p align="center">
  <picture>
    <img src="https://raw.githubusercontent.com/trillion-labs/.github/main/Tri-7B.png" alt="Tri-7B-Base" style="width: 80%;">
  </picture>
</p>
|
|
|
# Tri-7B-Base

## Introduction

We present **Tri-7B-Base**, a foundation language model that serves as the pre-trained base for our Tri-7B model family. This model represents our commitment to efficient training while establishing a strong foundation for downstream fine-tuning and adaptation.
|
|
|
### Key Features

* **Foundation Architecture**: Transformer decoder architecture with RoPE, SwiGLU, and RMSNorm (see Model Specifications below)
* **Multilingual Foundation**: Pre-trained on diverse data in Korean, English, and Japanese
* **Efficient Training**: Training methodology designed for computational efficiency
|
|
|
### Model Specifications

#### Tri-7B-Base

- Type: Causal Language Model
- Training Stage: Pre-training
- Architecture: Transformer Decoder with RoPE, SwiGLU, RMSNorm
- Number of Parameters: 7.76B
- Number of Layers: 32
- Number of Attention Heads: 32
- Context Length: 4,096
- Vocab Size: 128,128
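
For a quick way to try the base model for plain text completion with the `transformers` library, a minimal sketch is shown below. The repository ID `trillionlabs/Tri-7B-Base` and the use of `device_map="auto"` (which requires `accelerate`) are assumptions, not details confirmed by this card.

```python
# Minimal text-completion sketch; the repo ID below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trillionlabs/Tri-7B-Base"  # hypothetical repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base model: plain completion, no chat template is applied.
prompt = "The three most widely spoken languages in East Asia are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```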
|
|
|
## Use Cases

As a base model, Tri-7B-Base is designed to serve as a foundation for various downstream applications:
|
|
|
- **Fine-tuning**: Adapt the model to specific domains or tasks (a minimal adapter sketch follows this list)
- **Instruction Tuning**: Create chat or assistant models
- **Domain Specialization**: Customize for specific industries or use cases
- **Research**: Explore model behaviors and capabilities
- **Language Generation**: General text completion and generation tasks
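
As a minimal sketch of the adapter-based fine-tuning mentioned above, the snippet below attaches a LoRA adapter with the `peft` library. The repository ID and the attention projection module names are assumptions based on common decoder architectures, not details taken from this card.

```python
# Hedged sketch: attaching a LoRA adapter for downstream fine-tuning.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("trillionlabs/Tri-7B-Base")  # hypothetical repo ID
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```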
|
|
|
## Limitations

- **Base Model Nature**: This is a pre-trained base model without instruction tuning or alignment. For chat or assistant capabilities, consider fine-tuned variants.
- **Language Support**: The model is optimized for English, Korean, and Japanese. Usage with other languages may result in degraded performance.
- **Knowledge Cutoff**: The model's information is limited to data available up to February 2025.
- **Generation Quality**: As a base model, outputs may require post-processing or filtering for production use cases.
|
|
|
## License

This model is licensed under the Apache License 2.0.

## Contact

For inquiries, please contact: [email protected]