RSCaLM-138M-core

RSCaLM (Research Scale Causal Language Model), Core Edition, is an experimental 138M-parameter decoder-only transformer trained for 20,000 steps. Unlike the LLaMA variant, this model is implemented entirely with a custom minimal GPT architecture (standalone_transformer_lm.GPT) and SentencePiece tokenization; there is no Hugging Face Transformers dependency.


📌 Experiment Summary

  • Architecture: Custom GPT-style causal decoder

    • Implemented in standalone_transformer_lm.py
    • Learned positional embeddings (absolute)
    • Multi-head self-attention with KV caching (see the attention sketch after this list)
    • GELU feed-forward layers
    • LayerNorm
  • Parameter Count: ~138M

  • Context Length: 2048 tokens

  • Tokenizer: SentencePiece (tokenizer.model)

  • Training Framework: Pure PyTorch (no Transformers)

  • Optimizer: AdamW (β1=0.9, β2=0.95, weight decay=0.1)

  • Scheduler: Cosine decay with warmup (see the optimizer sketch after this list)

  • Precision: Mixed FP16/BF16 training

  • Steps Completed: 20,000 (~32% of planned total)
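
A minimal sketch of how KV-cached attention works during incremental decoding, to make the attention bullet above concrete. This is illustrative only: the function name, tensor shapes, and cache layout are assumptions, not the actual standalone_transformer_lm.py implementation.

import math
import torch
import torch.nn.functional as F

def attend_with_cache(q, k_new, v_new, cache=None):
    # q, k_new, v_new: (batch, 1, head_dim) projections for the newest token.
    # cache holds the stacked keys/values of all previously seen tokens.
    if cache is not None:
        k = torch.cat([cache[0], k_new], dim=1)  # append new key
        v = torch.cat([cache[1], v_new], dim=1)  # append new value
    else:
        k, v = k_new, v_new
    # Decoding one token at a time, the new query may attend to every
    # cached position, so no explicit causal mask is required here.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    out = F.softmax(scores, dim=-1) @ v
    return out, (k, v)  # the updated cache is passed back in at the next step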
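
And a sketch of the optimizer and scheduler setup named above, in pure PyTorch. The betas, weight decay, and step counts come from this card; the peak learning rate, warmup length, and the stand-in model are placeholder assumptions.

import math
import torch

model = torch.nn.Linear(8, 8)  # stand-in; the real run uses the GPT model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,                   # assumed peak learning rate (not stated)
    betas=(0.9, 0.95),         # from this card
    weight_decay=0.1,          # from this card
)

warmup_steps = 1_000           # assumed warmup length
total_steps = 62_500           # 20,000 completed steps is ~32% of this plan

def lr_lambda(step):
    # Linear warmup, then cosine decay toward zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)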


📉 Validation Loss Progress

Step      Val Loss
 1,000    5.6011
 2,000    4.8598
 5,000    4.2239
10,000    3.9756
15,000    3.8608
20,000    3.7984
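
Assuming these are per-token cross-entropy losses in nats, the final value of 3.7984 corresponds to a validation perplexity of exp(3.7984) ≈ 44.6.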

⚠️ Notes

  • Prototype only: repetition loops expected in longer generations.
  • Requires standalone_transformer_lm.py and SentencePiece to run.
  • Does not load with transformers.AutoModelForCausalLM.

🔧 Example Usage

import torch, sentencepiece as spm
from standalone_transformer_lm import GPT, GPTConfig

# Load checkpoint & config
ckpt = torch.load("ckpt_best.pt", map_location="cpu")
cfg  = GPTConfig(**ckpt["config"])

# Init model & load weights
model = GPT(cfg).eval()
model.load_state_dict(ckpt["model"])

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

# Encode prompt
ids = torch.tensor([sp.encode("Dubai is", out_type=int)])

# Generate text (no gradients needed at inference time)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=40)
print(sp.decode(out[0].tolist()))

🔧 Example Usage (with repetition control)

import torch, sentencepiece as spm
from standalone_transformer_lm import GPT, GPTConfig

ckpt = torch.load("ckpt_best.pt", map_location="cpu")
cfg  = GPTConfig(**ckpt["config"])
model = GPT(cfg).eval()
model.load_state_dict(ckpt["model"])

sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

prompt = "when a man goes to fishing"
ids = torch.tensor([sp.encode(prompt, out_type=int)])

# Manual repetition control (no gradients needed at inference time)
with torch.no_grad():
    out = model.generate(
        ids,
        max_new_tokens=100,
        temperature=0.7,         # Lower temperature = more focused sampling
        top_k=50,                # Top-k sampling
        top_p=0.9,               # Nucleus sampling
        repetition_penalty=1.2,  # Penalize already-generated tokens
        no_repeat_ngram_size=3,  # Block repeating trigrams
    )
print(sp.decode(out[0].tolist()))

💡 Tips to Reduce Loops

  • Increase repetition_penalty to 1.2–1.5
  • Use no_repeat_ngram_size=3 or higher (the sketch after this list shows how both controls work)
  • Combine top_k and top_p for better sampling variety
  • Lower temperature for more deterministic completions
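
For intuition, here is a minimal, hypothetical sketch of how the two repetition controls above are commonly implemented at each decoding step; the actual logic inside model.generate may differ.

import torch

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # logits: 1-D tensor over the vocabulary for the next token.
    # Scale down the logits of tokens already present in the output so
    # they are less likely to be sampled again.
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty
        else:
            logits[token_id] *= penalty
    return logits

def banned_ngram_tokens(generated_ids, n=3):
    # Return the token ids that would complete an n-gram already present
    # in the output (this is what no_repeat_ngram_size=3 blocks).
    if len(generated_ids) < n - 1:
        return set()
    prefix = tuple(generated_ids[-(n - 1):])
    banned = set()
    for i in range(len(generated_ids) - n + 1):
        if tuple(generated_ids[i:i + n - 1]) == prefix:
            banned.add(generated_ids[i + n - 1])
    return banned

A decoding loop would apply the penalty to the logits, set the banned tokens' logits to -inf, and only then perform top-k/top-p sampling.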

📜 License

Apache-2.0

