SSLLM

SSLLM is a 218M-parameter decoder-only transformer language model built for testing and experimentation. It has been converted to HuggingFace format.

Model Details

Model Description

Developed by: Chang Sau Sheong
Model type: Causal Language Model (Decoder-only Transformer)
Language(s): English
License: MIT
Parameters: 217.92M
Architecture: Custom SSLLM Transformer with HuggingFace compatibility

Model Architecture

- Model Dimension (d_model): 768
- Attention Heads: 12
- Transformer Layers: 10
- Feed-Forward Dimension: 2560
- Vocabulary Size: 100,277 (cl100k_base)
- Maximum Sequence Length: 1024 tokens
- Position Embeddings: Learned absolute positional embeddings
- Attention: Multi-head self-attention with causal masking
- Activation: GELU
- Layer Normalization: Pre-layer norm architecture
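
As a rough cross-check, the dimensions listed above can be turned into a parameter count. The sketch below is not taken from the SSLLM source. It assumes bias terms on every linear layer, two LayerNorms per block plus a final one, and an output head with its own weights and bias (not tied to the token embedding). Under those assumptions the total matches the 217.92M listed under Model Description. The number of attention heads does not affect the count.

```python
# Architecture values from the model card.
D_MODEL = 768
N_LAYERS = 10
D_FF = 2560
VOCAB_SIZE = 100_277
MAX_SEQ_LEN = 1024

def linear(n_in, n_out, bias=True):
    """Parameter count of a dense layer (the bias term is an assumption)."""
    return n_in * n_out + (n_out if bias else 0)

def count_parameters():
    # Token embeddings plus learned absolute position embeddings.
    embeddings = VOCAB_SIZE * D_MODEL + MAX_SEQ_LEN * D_MODEL
    # Query, key, value, and output projections.
    attention = 4 * linear(D_MODEL, D_MODEL)
    # Two-layer feed-forward network.
    feed_forward = linear(D_MODEL, D_FF) + linear(D_FF, D_MODEL)
    # Two LayerNorms per block, each with a scale and a shift vector (assumed).
    layer_norms = 2 * 2 * D_MODEL
    block = attention + feed_forward + layer_norms
    final_norm = 2 * D_MODEL
    # Assumption: untied output projection with a bias.
    lm_head = linear(D_MODEL, VOCAB_SIZE)
    return embeddings + N_LAYERS * block + final_norm + lm_head

total = count_parameters()
print(f"{total:,} parameters ({total / 1e6:.2f}M)")  # 217,922,997 parameters (217.92M)
```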

Training Details

Training Data: Cosmopedia2 dataset (processed)
Training Epochs: 20
Training Steps: 16,860
Tokens Seen: ~1.38B tokens
Final Loss: 4.276
Hardware: NVIDIA A100 80GB GPU
Precision: bfloat16 (bf16)
Optimizer: AdamW with a linear warmup and decay learning-rate schedule
Batch Size: 40 (effective batch size: 80 with gradient accumulation)
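
The figures above are consistent with one another, which the short calculation below illustrates. It assumes that a training step means one optimizer update on the effective batch and that every sequence is filled to the full 1024-token length. Neither is stated explicitly in this card.

```python
# Values from the Training Details list.
TRAINING_STEPS = 16_860
EPOCHS = 20
MICRO_BATCH_SIZE = 40
EFFECTIVE_BATCH_SIZE = 80
SEQ_LEN = 1024  # assumption: every sequence uses the full context length

accumulation_steps = EFFECTIVE_BATCH_SIZE // MICRO_BATCH_SIZE
steps_per_epoch = TRAINING_STEPS // EPOCHS
tokens_per_step = EFFECTIVE_BATCH_SIZE * SEQ_LEN
tokens_seen = TRAINING_STEPS * tokens_per_step
tokens_per_epoch = steps_per_epoch * tokens_per_step

print(accumulation_steps)       # 2
print(steps_per_epoch)          # 843
print(f"{tokens_seen:,}")       # 1,381,171,200
print(f"{tokens_per_epoch:,}")  # 69,058,560
```

Under these assumptions the total comes to about 1.38B tokens, that is, roughly 69M tokens repeated over 20 epochs.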

Tokenizer

Type: tiktoken cl100k_base
Vocabulary Size: 100,277 tokens
Special Tokens:
- EOS Token ID: 100257
- PAD Token ID: 100257 (same as EOS)
- BOS Token ID: 100256
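
Because PAD and EOS share an ID, an attention mask derived by comparing token IDs against the PAD ID would also hide genuine EOS tokens. A minimal sketch of one way around this is to build the mask from sequence lengths instead. `pad_batch` is a hypothetical helper, not part of `ssllm_hf`. It pads on the right, which suits loss or evaluation batches; batched generation with decoder-only models usually pads on the left.

```python
PAD_TOKEN_ID = 100257  # from the model card; identical to the EOS token ID

def pad_batch(sequences, pad_id=PAD_TOKEN_ID):
    """Right-pad token ID lists to equal length and build masks from lengths."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # The mask comes from the original length, so a genuine EOS token
        # inside the sequence stays visible even though it equals pad_id.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

ids, mask = pad_batch([[10, 11, 100257], [20]])
print(ids)   # [[10, 11, 100257], [20, 100257, 100257]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```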

Usage

Quick Start

from ssllm_hf import SSLLMForCausalLM, SSLLMConfig
import tiktoken
import torch
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

# Initialize model with config
config = SSLLMConfig.from_pretrained('sausheong/ssllm_hf')
model = SSLLMForCausalLM(config)

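# Download and load model weights
model_path = hf_hub_download(repo_id='sausheong/ssllm_hf', filename='model.safetensors')
state_dict = load_file(model_path)
# strict=False tolerates key mismatches, so report them rather than ignore them silently
load_result = model.load_state_dict(state_dict, strict=False)
if load_result.missing_keys or load_result.unexpected_keys:
    print(f"Missing keys: {load_result.missing_keys}")
    print(f"Unexpected keys: {load_result.unexpected_keys}")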

# Setup device and eval mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device).eval()

# Initialize tokenizer
tokenizer = tiktoken.get_encoding('cl100k_base')

def generate_text(prompt, max_new_tokens=128, temperature=0.7, top_p=0.9, top_k=40):
    # Encode the prompt
    input_ids = torch.tensor([tokenizer.encode(prompt)], device=device)
    attention_mask = torch.ones_like(input_ids)
    
    # Generate with the model
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            pad_token_id=100257,
            eos_token_id=100257,
        )
    
    # Decode only the new tokens
    new_tokens = outputs[0][input_ids.shape[1]:].tolist()
    generated = tokenizer.decode(new_tokens)
    
    print(f"{prompt}{generated}")
    print(f"\nTokens generated: {len(new_tokens)}")

if __name__ == "__main__":
    prompt = "In a small village nestled between mountains,"
    print(f"PROMPT: {prompt}\n--")
    generate_text(prompt)
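
The `temperature`, `top_k`, and `top_p` arguments above shape the distribution from which each next token is drawn. The pure-Python sketch below illustrates the idea on a toy logit vector. `sampling_distribution` is a hypothetical helper, not the Transformers implementation, which works on tensors and may differ in details such as tie handling.

```python
import math

def sampling_distribution(logits, temperature=0.7, top_k=40, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax

    # top-k: keep the k highest-scoring tokens, then renormalize.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    ranked = ranked[:top_k]
    mass = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / mass for i in ranked}

    # top-p: keep the shortest ranked prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = sampling_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print(sorted(dist))  # [0, 1]
```

In this toy case top-k drops the least likely token, and top-p then drops the third because the first two already cover more than 90% of the remaining probability mass.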

Example Outputs

Prompt: "In a small village nestled between mountains,"

Output: "In a small village nestled between mountains, lived two curious friends named Sam and Alex. They were always curious and loved learning new things. One day, while exploring the woods near the riverbank, they stumbled upon a mysterious object. It was a tiny, glowing object with a glowing light.

Sam explained that it had a special kind of light that could change how the light behaves. He told them that the light was made up of different colors and patterns, making it an even better way to see clearly. This made Sam and Alex curious."

Limitations

Language: Trained primarily on English text
Context Length: Limited to 1024 tokens
Scale: At 218M parameters, the model is much smaller than current large language models, so expect weaker coherence and factual accuracy
Tokenizer: Requires the tiktoken library rather than a standard HuggingFace tokenizer
Special Tokens: Only EOS, PAD (shared with EOS), and BOS are defined
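
With learned absolute position embeddings, the prompt and the generated tokens presumably have to fit within the 1024-token limit together. One way to stay inside it is to trim long prompts from the left before generating. `fit_to_context` is a hypothetical helper sketched under that assumption, not part of `ssllm_hf`.

```python
MAX_SEQ_LEN = 1024  # maximum sequence length from the model card

def fit_to_context(token_ids, max_new_tokens, max_seq_len=MAX_SEQ_LEN):
    """Keep only the most recent prompt tokens that leave room for generation."""
    budget = max_seq_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Slicing from the end keeps the text closest to the continuation point;
    # prompts shorter than the budget are returned unchanged.
    return token_ids[-budget:]

long_prompt = list(range(2000))
trimmed = fit_to_context(long_prompt, max_new_tokens=128)
print(len(trimmed))  # 896
```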

Considerations

Model outputs should be reviewed for potential biases
Do not use the model to generate harmful or inappropriate content
Intended for research and educational purposes
Users should implement appropriate content filtering for production use

Technical Specifications

Model Files

model.safetensors: Model weights (832 MB)
config.json: Model configuration
tokenizer_config.json: Tokenizer metadata
generation_config.json: Default generation parameters
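
The size of model.safetensors is in line with weights stored at 4 bytes per parameter. That is an inference from the numbers in this card, not something it states, and float32 storage is an assumption here.

```python
PARAMETERS = 217.92e6  # from the model card
BYTES_PER_PARAM = 4    # assumption: float32 storage

size_bytes = PARAMETERS * BYTES_PER_PARAM
print(f"{size_bytes / 1e6:.1f} MB")     # 871.7 MB
print(f"{size_bytes / 2**20:.1f} MiB")  # 831.3 MiB
```

Read as MiB, the estimate is close to the listed 832 MB. The file header and rounding may account for the small difference.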

Compatibility

Framework: PyTorch
HuggingFace Transformers: Compatible with generation utilities
vLLM: Not supported (requires conversion to GPT-2 format)
ONNX: Not currently supported
TensorFlow: Not supported

Model Card Authors

Chang Sau Sheong

Model Card Contact

For questions about this model, please open an issue in the repository or contact the development team.

Last updated: June 2025