SSLLM

SSLLM is a 218M-parameter decoder-only transformer language model created for testing and experimentation. The model has been converted to HuggingFace format.

Model Details

Model Description

  • Developed by: Chang Sau Sheong
  • Model type: Causal Language Model (Decoder-only Transformer)
  • Language(s): English
  • License: MIT
  • Parameters: 217.92M
  • Architecture: Custom SSLLM Transformer with HuggingFace compatibility

Model Architecture

- Model Dimension (d_model): 768
- Attention Heads: 12
- Transformer Layers: 10
- Feed-Forward Dimension: 2560
- Vocabulary Size: 100,277 (cl100k_base)
- Maximum Sequence Length: 1024 tokens
- Position Embeddings: Learned absolute positional embeddings
- Attention: Multi-head self-attention with causal masking
- Activation: GELU
- Layer Normalization: Pre-layer norm architecture
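
The 217.92M parameter figure can be reproduced from the dimensions listed above. The sketch below is one accounting that lands on that number. It assumes a bias on every linear layer, LayerNorm with both weight and bias, a final LayerNorm, and an output projection that is not tied to the token embedding and has its own bias. None of these assumptions is confirmed by this model card, and the function name is illustrative only.

```python
def count_parameters(d_model=768, n_layers=10, d_ff=2560,
                     vocab_size=100_277, max_seq_len=1024):
    """Estimate the parameter count under the assumptions stated above."""
    token_emb = vocab_size * d_model
    pos_emb = max_seq_len * d_model  # learned absolute positions
    # Q, K, V and output projections, each with a bias (assumed)
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers, d_model -> d_ff -> d_model, each with a bias (assumed)
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with weight and bias (assumed)
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + feed_forward + layer_norms
    final_norm = 2 * d_model
    # Untied output projection with a bias (assumed)
    lm_head = d_model * vocab_size + vocab_size
    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the total is 217,922,997. The number of attention heads does not change the count, because the heads only partition the 768-dimensional projections. With a tied output projection the same arithmetic gives roughly 140.8M, so the stated size suggests, but does not prove, an untied head.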

Training Details

  • Training Data: Cosmopedia2 dataset (processed)
  • Training Epochs: 20
  • Training Steps: 16,860
  • Tokens Seen: ~1.38B tokens
  • Final Loss: 4.276
  • Hardware: NVIDIA A100 80GB GPU
  • Precision: bfloat16 (bf16)
  • Optimizer: AdamW with linear warmup and decay
  • Batch Size: 40 (effective batch size: 80 with gradient accumulation)
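
The token count follows from the step count, the effective batch size, and the sequence length. The short check below assumes every training sequence is packed to the full 1024 tokens and that the reported loss is a mean cross-entropy in nats per token; neither is stated explicitly above.

```python
import math

steps = 16_860
effective_batch = 80   # sequences per optimizer step
seq_len = 1024         # assumes sequences are packed to full length
epochs = 20
final_loss = 4.276     # assumed: mean cross-entropy in nats per token

tokens_seen = steps * effective_batch * seq_len
steps_per_epoch = steps // epochs
tokens_per_epoch = tokens_seen // epochs
perplexity = math.exp(final_loss)

print(f"tokens seen:      {tokens_seen:,}")
print(f"steps per epoch:  {steps_per_epoch}")
print(f"tokens per epoch: {tokens_per_epoch:,}")
print(f"perplexity:       {perplexity:.1f}")
```

This gives 1,381,171,200 tokens in total, matching the ~1.38B figure, or about 69M tokens per epoch over 843 steps. Under the stated assumption about the loss, the final perplexity is roughly 72.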

Tokenizer

  • Type: tiktoken cl100k_base
  • Vocabulary Size: 100,277 tokens
  • Special Tokens:
    • EOS Token ID: 100257
    • PAD Token ID: 100257 (same as EOS)
    • BOS Token ID: 100256
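
Because PAD and EOS share ID 100257, a padded position cannot be told apart from a genuine end-of-sequence token by ID alone, so the attention mask has to carry that information. The sketch below left-pads a batch in plain Python. The helper name `pad_batch` and the example token IDs are illustrative and not part of the repository, and whether the SSLLM implementation honors the mask for padded batches is not stated here.

```python
PAD_TOKEN_ID = 100257  # same ID as EOS, per the list above


def pad_batch(sequences, pad_id=PAD_TOKEN_ID):
    """Left-pad lists of token IDs to equal length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding, 1 marks real tokens (including a real EOS)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Arbitrary example IDs; the second sequence ends with a genuine EOS token
ids, mask = pad_batch([[9906, 1917], [9906, 100257], [9906]])
print(ids)
print(mask)
```

In the second and third rows the ID 100257 appears once each, but only the mask shows that the first is a real EOS and the other is padding.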

Usage

Quick Start

from ssllm_hf import SSLLMForCausalLM, SSLLMConfig
import tiktoken
import torch
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

# Initialize model with config
config = SSLLMConfig.from_pretrained('sausheong/ssllm_hf')
model = SSLLMForCausalLM(config)

# Download and load model weights
model_path = hf_hub_download(repo_id='sausheong/ssllm_hf', filename='model.safetensors')
state_dict = load_file(model_path)
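# strict=False skips keys that do not match, so report any mismatch instead of hiding it
load_result = model.load_state_dict(state_dict, strict=False)
if load_result.missing_keys or load_result.unexpected_keys:
    print(f"Missing keys: {load_result.missing_keys}")
    print(f"Unexpected keys: {load_result.unexpected_keys}")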

# Setup device and eval mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device).eval()

# Initialize tokenizer
tokenizer = tiktoken.get_encoding('cl100k_base')

def generate_text(prompt, max_new_tokens=128, temperature=0.7, top_p=0.9, top_k=40):
    # Encode the prompt
    input_ids = torch.tensor([tokenizer.encode(prompt)], device=device)
    attention_mask = torch.ones_like(input_ids)
    
    # Generate with the model
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            pad_token_id=100257,
            eos_token_id=100257,
        )
    
    # Decode only the new tokens
    new_tokens = outputs[0][input_ids.shape[1]:].tolist()
    generated = tokenizer.decode(new_tokens)
    
    print(f"{prompt}{generated}")
    print(f"\nTokens generated: {len(new_tokens)}")

if __name__ == "__main__":
    prompt = "In a small village nestled between mountains,"
    print(f"PROMPT: {prompt}\n--")
    generate_text(prompt)
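
The `temperature`, `top_k`, and `top_p` arguments passed to `generate` above control how each next token is drawn. The following is a minimal pure-Python sketch of that idea for a single step. It is not the Transformers implementation, and the function name `sample_next_token` is hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.9, rng=random):
    """Sample one token index from raw logits using temperature, top-k and top-p."""
    # Temperature: divide logits before the softmax; values below 1 sharpen the distribution
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring candidates, best first
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability)
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for idx, p in zip(ranked, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    indices, kept_probs = zip(*kept)
    # random.choices normalizes the weights, so no explicit renormalization is needed
    return rng.choices(indices, weights=kept_probs, k=1)[0]


# Toy logits over a four-token vocabulary
print(sample_next_token([2.0, 1.0, 0.1, -1.0], temperature=1.0, top_k=3, top_p=0.9))
```

With these toy logits, top-k drops the last token and top-p then drops the third, so the printed index is always 0 or 1.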

Example Outputs

Prompt: "In a small village nestled between mountains,"

Output: "In a small village nestled between mountains, lived two curious friends named Sam and Alex. They were always curious and loved learning new things. One day, while exploring the woods near the riverbank, they stumbled upon a mysterious object. It was a tiny, glowing object with a glowing light.

Sam explained that it had a special kind of light that could change how the light behaves. He told them that the light was made up of different colors and patterns, making it an even better way to see clearly. This made Sam and Alex curious."

Limitations

  • Domain: Primarily trained on English text
  • Context Length: Limited to 1024 tokens
  • Scale: At 218M parameters, output quality is well below that of larger language models; generations can be repetitive or loosely coherent, as the example output shows
  • Tokenizer: Requires tiktoken library (not standard HuggingFace tokenizer)
  • Special Tokens: Limited special token vocabulary (EOS, PAD shared with EOS, and BOS, as listed under Tokenizer)
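
The 1024-token limit covers the prompt and the generated tokens together, since learned absolute positional embeddings have no entry past the last trained position. One way to stay inside the limit is to trim the oldest prompt tokens before generating. The helper below is a sketch with a hypothetical name, `fit_prompt`, and is not part of the repository.

```python
MAX_CONTEXT = 1024  # maximum sequence length from the architecture section


def fit_prompt(token_ids, max_new_tokens=128, max_context=MAX_CONTEXT):
    """Drop the oldest prompt tokens so prompt plus generation fits the context."""
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    # Keep the most recent tokens; earlier context is discarded
    return token_ids[-budget:]


prompt_ids = list(range(2000))    # stand-in for tokenizer.encode(long_text)
trimmed = fit_prompt(prompt_ids)
print(len(trimmed))               # 896
```

Keeping the tail of the prompt suits continuation tasks. A task whose instructions sit at the start of the prompt would need a different trimming strategy.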

Considerations

  • Model outputs should be reviewed for potential biases
  • Not intended for generating harmful or inappropriate content
  • Intended for research and educational purposes
  • Users should implement appropriate content filtering for production use

Technical Specifications

Model Files

  • model.safetensors: Model weights (832 MB)
  • config.json: Model configuration
  • tokenizer_config.json: Tokenizer metadata
  • generation_config.json: Default generation parameters

Compatibility

  • Framework: PyTorch
  • HuggingFace Transformers: Compatible with generation utilities
  • vLLM: Not supported (requires conversion to GPT-2 format)
  • ONNX: Not currently supported
  • TensorFlow: Not supported

Model Card Authors

Chang Sau Sheong

Model Card Contact

For questions about this model, please open an issue in the repository or contact the development team.


Last updated: June 2025

Safetensors · Model size: 218M params · Tensor type: F32