NovaAI-0.1
Chinese QA GPT Model
Model Description
This is a Chinese-language, GPT-style Transformer model trained on question-answer pairs. It is designed to generate helpful, conversational responses to questions asked in Chinese. Like GPT, it uses a decoder-only architecture with causal self-attention, and it tokenizes text with a SentencePiece BPE model built for Chinese.
Model Details
- Architecture: Decoder-only Transformer (GPT-like)
- Parameters: ~124M parameters (configurable)
- Vocabulary Size: 32,000 (SentencePiece BPE)
- Context Length: 1,024 tokens
- Language: Chinese (Simplified)
- Task: Question Answering / Conversational AI
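As a rough cross-check of the parameter figure, the sketch below counts the weights of a GPT-2-style decoder. The layout is an assumption and was not read from train.py: biased linear layers, a 4× MLP expansion, learned position embeddings, and an output head tied to the token embedding. Under those assumptions the listed configuration comes to about 110M parameters. The familiar 124M figure is what the same formula gives for GPT-2 small's 50,257-token vocabulary, so the exact count for this model depends on how train.py builds it.

```python
def gpt_param_count(vocab_size, n_layer, n_embd, block_size, tied=True):
    """Parameter count for an assumed GPT-2-style decoder layout."""
    d = n_embd
    # Two LayerNorms per block, each with a weight and a bias vector
    ln = 2 * (2 * d)
    # Fused QKV projection (d -> 3d) plus output projection (d -> d), with biases
    attn = (d * 3 * d + 3 * d) + (d * d + d)
    # MLP: d -> 4d -> d, with biases
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    per_layer = ln + attn + mlp
    # Token embeddings plus learned position embeddings
    embeddings = vocab_size * d + block_size * d
    final_ln = 2 * d
    # An untied LM head (no bias) adds a second vocab_size x d matrix
    head = 0 if tied else vocab_size * d
    return embeddings + n_layer * per_layer + final_ln + head


print(f"{gpt_param_count(32000, 12, 768, 1024):,}")  # 110,418,432
print(f"{gpt_param_count(50257, 12, 768, 1024):,}")  # 124,439,808 (GPT-2 small)
```

If train.py does not tie the output head to the token embedding, pass `tied=False`; the count then rises by another 32,000 × 768 weights.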
Model Architecture
The model consists of:
- 12 Transformer layers with causal self-attention
- 12 attention heads per layer
- 768-dimensional embeddings
- SentencePiece tokenizer with BPE encoding for Chinese text
- GELU activation functions
- Layer normalization and residual connections
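Causal self-attention means each position may attend only to itself and to earlier positions. The single-head, pure-Python sketch below illustrates the masking and the scaled dot product. In the configuration listed above, the 768-dimensional embedding would be split across 12 heads of 768 / 12 = 64 dimensions each, and each head would use learned query, key, and value projections; this illustration omits those projections and takes `q`, `k`, and `v` as given.

```python
import math


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats); q and k share one dimension.
    Output position t is a weighted average of v[0..t] only.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores against the current and earlier positions only (the causal mask)
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out


q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = causal_attention(q, k, v)
print(out[0])  # [1.0, 2.0]: the first token can only attend to itself
```

Because of the mask, changing a later value vector never changes the outputs at earlier positions, which is what allows the model to be trained on every next-token prediction in a sequence at once.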
Training Data
The model was trained on a dataset of Chinese question-answer pairs covering topics such as:
- Gaming and entertainment
- Technology and gadgets
- Health and lifestyle
- Travel and local recommendations
- Relationships and social advice
- General knowledge questions
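The training command reads `all.jsonl`, but this card does not document the record schema. The sketch below assumes each line is a JSON object with hypothetical `question` and `answer` fields, and joins them using the `<s>` and `<sep>` markers that appear in the Python API example. The `</s>` end marker is also an assumption.

```python
import json

# '<s>' and '<sep>' follow the Python API example in this card; the '</s>'
# end marker and the "question"/"answer" field names are hypothetical.
BOS, SEP, EOS = "<s>", "<sep>", "</s>"


def format_example(record):
    """Join one question-answer record into a single training string."""
    return BOS + record["question"] + SEP + record["answer"] + EOS


def load_examples(lines):
    """Parse JSONL lines (one JSON object per line), skipping blank lines."""
    return [format_example(json.loads(line)) for line in lines if line.strip()]


# "你好" = "Hello"; "你好!有什么可以帮你?" = "Hello! How can I help you?"
sample = ['{"question": "你好", "answer": "你好!有什么可以帮你?"}', ""]
print(load_examples(sample))
```

To read the actual file, pass an open handle, e.g. `with open("all.jsonl", encoding="utf-8") as f: examples = load_examples(f)`.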
Training Configuration
- Training Method: Causal language modeling (next-token prediction)
- Batch Size: 4
- Learning Rate: 3e-4 (AdamW optimizer)
- Epochs: 3
- Dropout: 0.1
- Gradient Clipping: 1.0
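Two of these settings can be illustrated without PyTorch. For next-token prediction, the inputs and targets are the same token sequence offset by one position. For gradient clipping, the sketch assumes that 1.0 is a maximum global L2 norm, which is how `torch.nn.utils.clip_grad_norm_` interprets its `max_norm` argument; this card does not say which clipping function train.py calls.

```python
import math


def next_token_pairs(ids):
    """Causal LM: inputs are ids[:-1] and targets are ids[1:]."""
    return ids[:-1], ids[1:]


def clip_grad_norm(grads, max_norm=1.0):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    grads: list of gradient vectors (lists of floats), one per parameter.
    Returns (clipped_grads, total_norm_before_clipping).
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = max_norm / (total + 1e-6)  # small epsilon, as in PyTorch
    if scale >= 1.0:
        return grads, total  # already within the limit: leave unchanged
    return [[g * scale for g in vec] for vec in grads], total


print(next_token_pairs([5, 6, 7, 8]))  # ([5, 6, 7], [6, 7, 8])
clipped, norm = clip_grad_norm([[3.0], [4.0]], max_norm=1.0)
print(norm)  # 5.0
```

In PyTorch training code, these settings typically correspond to `torch.optim.AdamW(model.parameters(), lr=3e-4)` and a call to `torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)` between `loss.backward()` and `optimizer.step()`.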
Usage
Installation
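```bash
pip install torch sentencepiece tqdm
```
Training
```bash
python train.py --data_path all.jsonl --spm_model spm.model
```
Inference
```bash
# "你的问题" means "your question"; replace it with your own question in Chinese
python infer.py --checkpoint checkpoints/checkpoint_epoch3.pt --spm_model spm.model --prompt "你的问题"
```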
Python API
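```python
import torch
import sentencepiece as spm
from train import GPT, GPTConfig  # defined in this repository's train.py

# Load the tokenizer
sp = spm.SentencePieceProcessor()
sp.Load('spm.model')

# map_location lets a checkpoint saved on GPU load on a CPU-only machine
checkpoint = torch.load('checkpoints/checkpoint_epoch3.pt', map_location='cpu')

# The configuration must match the one used during training
config = GPTConfig(
    vocab_size=32000,
    n_layer=12,
    n_head=12,
    n_embd=768,
    block_size=1024,
)
model = GPT(config)
model.load_state_dict(checkpoint['model_state'])
model.eval()  # disable dropout for inference

# Generate response ("Hello, please introduce yourself")
prompt = "你好,请介绍一下你自己"
ids = sp.EncodeAsIds('<s>' + prompt + '<sep>')
# ... generation logic
```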
Model Performance
The model demonstrates strong performance on:
- Chinese language understanding
- Contextual question answering
- Conversational response generation
- Maintaining coherence over multi-turn conversations
Limitations
- Language: Only supports Chinese (Simplified)
- Context Window: Limited to 1,024 tokens
- Knowledge Cutoff: Based on training data timeframe
- Factual Accuracy: May occasionally produce inaccurate information
- Bias: May reflect biases present in training data
Ethical Considerations
This model is designed for educational and research purposes. Users should be aware that:
- The model may generate responses that seem authoritative but could be factually incorrect
- The model's training data may contain biases
- Generated content should be fact-checked before use in critical applications
Technical Requirements
- Python: 3.6+
- PyTorch: Latest stable version
- CUDA: Optional, for GPU acceleration
- Memory: ~2GB GPU memory for inference
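The ~2GB figure can be put in context with a back-of-the-envelope estimate. The sketch assumes 32-bit weights and, hypothetically, that inference keeps a key/value cache for the full 1,024-token window; neither is stated in this card. It ignores activations, the CUDA context, and allocator overhead, which add to the total.

```python
def inference_memory_bytes(n_params, n_layer, n_embd, context_len,
                           bytes_per_value=4, batch=1):
    """Rough estimate: model weights plus an (assumed) key/value cache."""
    weights = n_params * bytes_per_value
    # Every layer stores one key and one value vector of size n_embd per token
    kv_cache = 2 * n_layer * context_len * n_embd * bytes_per_value * batch
    return weights + kv_cache


est = inference_memory_bytes(124_000_000, n_layer=12, n_embd=768, context_len=1024)
print(f"{est / 1e9:.2f} GB")  # 0.57 GB
```

Loading the weights in 16-bit precision (`bytes_per_value=2`) would roughly halve this estimate.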
License
This model is released under the MIT License.