NovaAI-0.1

Chinese QA GPT Model

Model Description

This is a Chinese language GPT-like transformer model trained on question-answering pairs. The model is designed to generate helpful, conversational responses to user questions in Chinese. It uses a decoder-only architecture similar to GPT with causal self-attention and is optimized for Chinese language understanding and generation.

Model Details

  • Architecture: Decoder-only Transformer (GPT-like)
  • Parameters: ~124M parameters (configurable)
  • Vocabulary Size: 32,000 (SentencePiece BPE)
  • Context Length: 1,024 tokens
  • Language: Chinese (Simplified)
  • Task: Question Answering / Conversational AI

Model Architecture

The model consists of:

  • 12 Transformer layers with causal self-attention
  • 12 attention heads per layer
  • 768-dimensional embeddings
  • SentencePiece tokenizer with BPE encoding for Chinese text
  • GELU activation functions
  • Layer normalization and residual connections
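The parameter count follows from the dimensions above. The sketch below assumes a GPT-2-style block (biases on every linear layer and LayerNorm, a 4x MLP, learned position embeddings). It also assumes the output projection either shares weights with the token embedding or is a separate bias-free matrix; the card does not say which one train.py uses. Under these assumptions the count does not depend on `n_head`. The ~124M figure matches GPT-2 small, which has a 50,257-token vocabulary. With the 32,000-token vocabulary listed here, the estimate comes out lower, so the exact number depends on the vocabulary and on weight tying.

```python
def gpt_param_count(vocab_size=32000, n_layer=12, n_embd=768,
                    block_size=1024, tied_embeddings=True):
    """Estimate parameters of a GPT-2-style decoder (assumed layout, see lead-in)."""
    d = n_embd
    layer_norm = 2 * d                            # weight + bias
    attn = (d * 3 * d + 3 * d)                    # fused q/k/v projection
    attn += (d * d + d)                           # attention output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up- and down-projection
    per_layer = 2 * layer_norm + attn + mlp
    embeddings = vocab_size * d + block_size * d  # token + learned position embeddings
    final_norm = layer_norm
    lm_head = 0 if tied_embeddings else vocab_size * d  # separate output matrix, no bias
    return n_layer * per_layer + embeddings + final_norm + lm_head

print(gpt_param_count())                       # 110418432
print(gpt_param_count(tied_embeddings=False))  # 134994432
print(gpt_param_count(vocab_size=50257))       # 124439808 (GPT-2 small vocabulary)
```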

Training Data

The model was trained on Chinese question-answering pairs spanning topics such as:

  • Gaming and entertainment
  • Technology and gadgets
  • Health and lifestyle
  • Travel and local recommendations
  • Relationships and social advice
  • General knowledge questions

Training Configuration

  • Training Method: Causal Language Modeling (next token prediction)
  • Batch Size: 4
  • Learning Rate: 3e-4 (AdamW optimizer)
  • Epochs: 3
  • Dropout: 0.1
  • Gradient Clipping: 1.0
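Causal language modeling trains the model at each position t to predict token t + 1, so the targets are the inputs shifted by one. Gradient clipping at 1.0 rescales all gradients together whenever their global L2 norm exceeds 1.0. The pure-Python sketch below illustrates both ideas. It is not taken from train.py: the card does not say whether sequences are packed into fixed windows as done here, padded per QA pair, or masked on the question tokens. In PyTorch, the clipping step corresponds to `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)`, and the optimizer to `torch.optim.AdamW(model.parameters(), lr=3e-4)`.

```python
import math

def next_token_windows(ids, block_size):
    """Split a token-id stream into (input, target) pairs for next-token prediction.

    Each target sequence is the input shifted left by one position, so the
    model at position t is trained to predict token t + 1.
    """
    pairs = []
    # start <= len(ids) - 2, so every chunk holds at least one input and one target
    for start in range(0, len(ids) - 1, block_size):
        chunk = ids[start:start + block_size + 1]  # block_size inputs + 1 extra target
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradients so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = max_norm / (total_norm + 1e-6)  # small epsilon avoids division by zero
    if scale < 1.0:
        grads = [g * scale for g in grads]
    return grads, total_norm

print(next_token_windows([10, 11, 12, 13, 14, 15, 16], block_size=3))
# [([10, 11, 12], [11, 12, 13]), ([13, 14, 15], [14, 15, 16])]
print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```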

Usage

Installation

pip install torch sentencepiece tqdm

Training

python train.py --data_path all.jsonl --spm_model spm.model
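The card does not document the schema of all.jsonl. The sketch below assumes one JSON object per line and uses the hypothetical field names `question` and `answer`. It joins each pair with the `<s>` and `<sep>` markers that appear in the Python API example. How train.py terminates an answer is not documented, so no end marker is added here.

```python
import json

def iter_training_texts(lines, question_key="question", answer_key="answer"):
    """Turn JSONL question-answer records into '<s>question<sep>answer' strings.

    The field names are hypothetical defaults; pass the real keys used in all.jsonl.
    """
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        yield "<s>" + record[question_key] + "<sep>" + record[answer_key]

# Illustrative record: "Hello" / "Hello!"
sample = ['{"question": "你好", "answer": "你好!"}', ""]
for text in iter_training_texts(sample):
    print(text)

# With a real file:
# with open("all.jsonl", encoding="utf-8") as f:
#     texts = list(iter_training_texts(f))
```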

Inference

Replace the "你的问题" ("your question") placeholder with your own question:

python infer.py --checkpoint checkpoints/checkpoint_epoch3.pt --spm_model spm.model --prompt "你的问题"

Python API

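import torch
import sentencepiece as spm
from train import GPT, GPTConfig

# Load the tokenizer
sp = spm.SentencePieceProcessor()
sp.Load('spm.model')

# Load the checkpoint; map_location='cpu' lets it load on machines without a GPU
checkpoint = torch.load('checkpoints/checkpoint_epoch3.pt', map_location='cpu')
config = GPTConfig(
    vocab_size=32000,
    n_layer=12,
    n_head=12,
    n_embd=768,
    block_size=1024
)
model = GPT(config)
model.load_state_dict(checkpoint['model_state'])
model.eval()  # disable dropout for inference

# Generate response
prompt = "你好,请介绍一下你自己"  # "Hello, please introduce yourself"
# Assumes <s> and <sep> are defined as symbols in spm.model, matching the training format
ids = sp.EncodeAsIds('<s>' + prompt + '<sep>')
# ... generation logic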
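The generation logic is left out above. The following framework-agnostic sketch shows standard autoregressive decoding with temperature and top-k sampling. The `logits_fn` callable is hypothetical. With the real model, it would run `model` under `torch.no_grad()` on the last 1,024 ids and return the final position's logits as a list. How to extract those logits depends on what `GPT.forward` in train.py returns, which the card does not show. The stop id `eos_id` is also an assumption, because the card does not name an end-of-answer token.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

def generate(
    logits_fn: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    block_size: int = 1024,
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    eos_id: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Autoregressively sample up to max_new_tokens ids after prompt_ids."""
    rng = rng or random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]        # crop to the model's context window
        logits = list(logits_fn(context))  # scores for the next token
        if temperature <= 0:
            # greedy decoding: pick the highest-scoring token
            next_id = max(range(len(logits)), key=logits.__getitem__)
        else:
            scaled = [x / temperature for x in logits]
            if top_k is not None:
                # keep the top_k highest scores (ties at the cutoff are kept too)
                cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
                scaled = [x if x >= cutoff else -math.inf for x in scaled]
            peak = max(scaled)
            weights = [math.exp(x - peak) for x in scaled]  # unnormalised softmax
            next_id = rng.choices(range(len(weights)), weights=weights)[0]
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids[len(prompt_ids):]

if __name__ == "__main__":
    # Toy stand-in for the model: always favours the token after the last one
    def toy_logits(context):
        return [5.0 if i == (context[-1] + 1) % 4 else 0.0 for i in range(4)]
    print(generate(toy_logits, [0], max_new_tokens=3, temperature=0))  # [1, 2, 3]
```

The decoded ids can then be turned back into text with `sp.DecodeIds(...)`.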

Model Performance

No benchmark results are reported here. The model is intended to handle:

  • Chinese language understanding
  • Contextual question answering
  • Conversational response generation
  • Maintaining coherence over multi-turn conversations

Limitations

  • Language: Only supports Chinese (Simplified)
  • Context Window: Limited to 1,024 tokens
  • Knowledge Cutoff: Limited to the content and timeframe of the training data, which are not documented here
  • Factual Accuracy: May occasionally produce inaccurate information
  • Bias: May reflect biases present in training data

Ethical Considerations

This model is designed for educational and research purposes. Users should be aware that:

  • The model may generate responses that seem authoritative but could be factually incorrect
  • The model's training data may contain biases
  • Generated content should be fact-checked before use in critical applications

Technical Requirements

  • Python: 3.6+
  • PyTorch: Latest stable version
  • CUDA: Optional, for GPU acceleration
  • Memory: ~2GB GPU memory for inference
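A quick estimate shows how much of the ~2GB budget the weights themselves take. The calculation uses the ~124M figure from Model Details and covers only the stored parameters. It ignores the CUDA context, activations, and allocator overhead, which presumably account for most of the remaining memory.

```python
def weight_memory_mib(n_params, bytes_per_param):
    """Memory needed just to hold the model weights, in MiB."""
    return n_params * bytes_per_param / 2**20

n_params = 124_000_000  # the ~124M figure from Model Details
for dtype, nbytes in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{dtype}: {weight_memory_mib(n_params, nbytes):.0f} MiB")
# float32: 473 MiB
# float16/bfloat16: 237 MiB
```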

License

This model is released under the MIT License.

