# SSLLM

SSLLM is a 218M-parameter, decoder-only transformer language model created for testing and experimental purposes. The model has been converted to the HuggingFace format.
## Model Details

### Model Description

- Developed by: Chang Sau Sheong
- Model type: Causal Language Model (Decoder-only Transformer)
- Language(s): English
- License: MIT
- Parameters: 217.92M
- Architecture: Custom SSLLM Transformer with HuggingFace compatibility

### Model Architecture

- Model Dimension (d_model): 768
- Attention Heads: 12
- Transformer Layers: 10
- Feed-Forward Dimension: 2560
- Vocabulary Size: 100,277 (cl100k_base)
- Maximum Sequence Length: 1024 tokens
- Position Embeddings: Learned absolute positional embeddings
- Attention: Multi-head self-attention with causal masking
- Activation: GELU
- Layer Normalization: Pre-layer norm architecture
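
For reference, the 217.92M figure can be reproduced from these dimensions. The sketch below is a back-of-the-envelope count assuming an untied LM head with bias and biased linear layers throughout; those internals are assumptions, not read from the config:

```python
# Back-of-the-envelope parameter count for SSLLM.
# Assumptions (not confirmed by the config): untied LM head with bias,
# biases on all linear layers, standard pre-LN GPT-style blocks.
d_model, n_layers = 768, 10
d_ff, vocab, max_len = 2560, 100_277, 1024

tok_emb = vocab * d_model                       # token embeddings
pos_emb = max_len * d_model                     # learned absolute positions
attn = 4 * (d_model * d_model + d_model)        # Q, K, V, output projections
ffn = 2 * (d_model * d_ff) + d_ff + d_model     # two linear layers + biases
norms = 2 * 2 * d_model                         # two pre-LN layer norms
block = attn + ffn + norms
final_norm = 2 * d_model
lm_head = vocab * d_model + vocab               # untied head with bias

total = tok_emb + pos_emb + n_layers * block + final_norm + lm_head
print(f"{total / 1e6:.2f}M parameters")         # -> 217.92M
```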

## Training Details

- Training Data: Cosmopedia2 dataset (processed)
- Training Epochs: 20
- Training Steps: 16,860
- Tokens Seen: ~1.38B
- Final Loss: 4.276
- Hardware: NVIDIA A100 80GB GPU
- Precision: bfloat16 (bf16)
- Optimizer: AdamW with linear warmup and decay
- Batch Size: 40 (effective batch size: 80 with gradient accumulation)
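
The optimizer setup is described only at this high level; the loop below is a minimal sketch of the stated recipe (AdamW, linear warmup then linear decay, bf16 autocast, micro-batches of 40 accumulated over 2 steps for an effective batch of 80). The learning rate, warmup length, and weight decay are placeholder assumptions, and `model`/`loader` stand in for the actual model and data pipeline:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS = 16_860   # from the card
WARMUP_STEPS = 500     # assumption: not published
PEAK_LR = 3e-4         # assumption: not published
ACCUM_STEPS = 2        # micro-batch 40 x 2 = effective batch 80

model = ...            # the SSLLM model (placeholder)
loader = ...           # batches of 40 Cosmopedia2 sequences (placeholder)

optimizer = AdamW(model.parameters(), lr=PEAK_LR, weight_decay=0.1)  # decay assumed

def linear_warmup_decay(step):
    # Linear ramp to the peak LR, then linear decay to zero.
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

scheduler = LambdaLR(optimizer, linear_warmup_decay)

for micro_step, batch in enumerate(loader):
    with torch.autocast("cuda", dtype=torch.bfloat16):  # bf16 training
        loss = model(**batch).loss / ACCUM_STEPS        # assumes HF-style .loss
    loss.backward()
    if (micro_step + 1) % ACCUM_STEPS == 0:             # step every 2 micro-batches
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```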

## Tokenizer

- Type: tiktoken cl100k_base
- Vocabulary Size: 100,277 tokens
- Special Tokens:
  - EOS Token ID: 100257
  - PAD Token ID: 100257 (same as EOS)
  - BOS Token ID: 100256
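
A quick sanity check of the tokenizer with tiktoken is shown below. Note that ID 100256 is not a named special token in stock cl100k_base, so using it as BOS appears to be specific to this model; prepending it manually is shown as an assumption:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

print(enc.n_vocab)     # 100277 -- matches the model's vocabulary size
print(enc.eot_token)   # 100257 -- <|endoftext|>, used here as both EOS and PAD

# Round-trip a string through the tokenizer.
ids = enc.encode("In a small village nestled between mountains,")
print(enc.decode(ids))

# BOS (100256) is an otherwise unused ID in stock cl100k_base;
# SSLLM repurposes it, so prepend it by hand if needed (assumption).
bos_id = 100256
ids_with_bos = [bos_id] + ids
```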

## Usage

### Quick Start

```python
from ssllm_hf import SSLLMForCausalLM, SSLLMConfig
import tiktoken
import torch
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

# Initialize model with config
config = SSLLMConfig.from_pretrained('sausheong/ssllm_hf')
model = SSLLMForCausalLM(config)

# Download and load model weights
model_path = hf_hub_download(repo_id='sausheong/ssllm_hf', filename='model.safetensors')
state_dict = load_file(model_path)
model.load_state_dict(state_dict, strict=False)

# Setup device and eval mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device).eval()

# Initialize tokenizer
tokenizer = tiktoken.get_encoding('cl100k_base')

def generate_text(prompt, max_new_tokens=128, temperature=0.7, top_p=0.9, top_k=40):
    # Encode the prompt
    input_ids = torch.tensor([tokenizer.encode(prompt)], device=device)
    attention_mask = torch.ones_like(input_ids)

    # Generate with the model
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            pad_token_id=100257,
            eos_token_id=100257,
        )

    # Decode only the new tokens
    new_tokens = outputs[0][input_ids.shape[1]:].tolist()
    generated = tokenizer.decode(new_tokens)
    print(f"{prompt}{generated}")
    print(f"\nTokens generated: {len(new_tokens)}")

if __name__ == "__main__":
    prompt = "In a small village nestled between mountains,"
    print(f"PROMPT: {prompt}\n--")
    generate_text(prompt)
```
### Example Outputs

**Prompt:** "In a small village nestled between mountains,"

**Output:**

> In a small village nestled between mountains, lived two curious friends named Sam and Alex. They were always curious and loved learning new things. One day, while exploring the woods near the riverbank, they stumbled upon a mysterious object. It was a tiny, glowing object with a glowing light.
>
> Sam explained that it had a special kind of light that could change how the light behaves. He told them that the light was made up of different colors and patterns, making it an even better way to see clearly. This made Sam and Alex curious.
## Limitations

- Domain: Trained primarily on English text
- Context Length: Limited to 1024 tokens; longer prompts must be truncated (see the sketch after this list)
- Scale: At ~218M parameters, the model is small and may lag larger language models in coherence and factual accuracy
- Tokenizer: Requires the tiktoken library (not a standard HuggingFace tokenizer)
- Special Tokens: Limited special-token vocabulary
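
Because positions are learned only up to 1024, a prompt plus its generated tokens must fit inside that window. A minimal guard, using the tiktoken encoder from the Quick Start (the helper name is hypothetical, not part of the repository):

```python
import tiktoken

MAX_CTX = 1024  # model's maximum sequence length

def fit_prompt(prompt, max_new_tokens=128):
    # Left-truncate the prompt so prompt + generation fits in the window.
    enc = tiktoken.get_encoding("cl100k_base")
    budget = MAX_CTX - max_new_tokens
    return enc.encode(prompt)[-budget:]
```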

## Considerations

- Model outputs should be reviewed for potential biases
- Not suitable for generating harmful or inappropriate content
- Intended for research and educational purposes
- Users should implement appropriate content filtering for production use
## Technical Specifications

### Model Files

- `model.safetensors`: Model weights (832 MB)
- `config.json`: Model configuration
- `tokenizer_config.json`: Tokenizer metadata
- `generation_config.json`: Default generation parameters
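
To fetch all of these files in one call, `snapshot_download` from the same `huggingface_hub` library used in the Quick Start mirrors the whole repository into the local cache:

```python
from huggingface_hub import snapshot_download

# Download the full repository (weights, config, tokenizer metadata)
# into the local HuggingFace cache and return its path.
local_dir = snapshot_download(repo_id="sausheong/ssllm_hf")
print(local_dir)
```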
### Compatibility

- Framework: PyTorch
- HuggingFace Transformers: Compatible with the `generate()` utilities
- vLLM: Not supported (would require conversion to a GPT-2-style checkpoint)
- ONNX: Not currently supported
- TensorFlow: Not supported
## Model Card Authors

Chang Sau Sheong

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the development team.
Last updated: June 2025