# RSCaLM-138M-core

RSCaLM (Research Scale Causal Language Model), Core Edition, is an experimental 138M-parameter decoder-only transformer trained for 20,000 steps.
Unlike the LLaMA variant, this model is implemented entirely with a custom minimal GPT architecture (`standalone_transformer_lm.GPT`) and SentencePiece tokenization, with no Hugging Face Transformers dependency.
## Experiment Summary

- Architecture: Custom GPT-style causal decoder
  - Implemented in `standalone_transformer_lm.py`
  - Learned positional embeddings (absolute)
  - Multi-head self-attention with KV caching
  - GELU feed-forward layers
  - LayerNorm
- Parameter Count: ~138M
- Context Length: 2048 tokens
- Tokenizer: SentencePiece (`tokenizer.model`)
- Training Framework: Pure PyTorch (no Transformers)
- Optimizer: AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- Scheduler: Cosine decay with warmup (see the sketch after this list)
- Precision: Mixed FP16/BF16 training
- Steps Completed: 20,000 (~32% of planned total)
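
For reference, the optimizer and schedule described above map onto roughly the following pure-PyTorch setup. This is a minimal sketch rather than the actual training script: the `warmup_steps`, `total_steps`, and `peak_lr` values are placeholder assumptions (the total is only back-calculated from 20,000 steps being ~32% of the plan).

```python
import math

import torch

# Placeholder values; the real run used its own schedule lengths and peak LR.
warmup_steps = 2_000
total_steps = 62_500   # ~20,000 / 0.32, assuming "32% of planned total"
peak_lr = 3e-4

model = torch.nn.Linear(8, 8)  # stand-in for GPT(cfg); see the usage examples below

# AdamW with the betas / weight decay listed above
optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_lambda(step: int) -> float:
    # Linear warmup to the peak LR, then cosine decay toward zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```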
## Validation Loss Progress

| Step | Val Loss |
|---|---|
| 1,000 | 5.6011 |
| 2,000 | 4.8598 |
| 5,000 | 4.2239 |
| 10,000 | 3.9756 |
| 15,000 | 3.8608 |
| 20,000 | 3.7984 |
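
Assuming these are mean per-token cross-entropy values in nats (the usual convention), they convert to perplexity via `exp(loss)`, e.g. roughly 44.6 at step 20,000:

```python
import math

# Reported validation losses (step -> mean per-token cross-entropy, assumed in nats)
val_losses = {
    1_000: 5.6011,
    2_000: 4.8598,
    5_000: 4.2239,
    10_000: 3.9756,
    15_000: 3.8608,
    20_000: 3.7984,
}

for step, loss in val_losses.items():
    # Perplexity = exp(mean per-token cross-entropy)
    print(f"step {step:>6,}: loss {loss:.4f} -> ppl {math.exp(loss):.1f}")
```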
## Notes

- Prototype only: repetition loops are expected in longer generations.
- Requires `standalone_transformer_lm.py` and SentencePiece to run.
- Does not load with `transformers.AutoModelForCausalLM`.
## Example Usage

```python
import torch
import sentencepiece as spm

from standalone_transformer_lm import GPT, GPTConfig

# Load checkpoint & config
ckpt = torch.load("ckpt_best.pt", map_location="cpu")
cfg = GPTConfig(**ckpt["config"])

# Init model & load weights
model = GPT(cfg).eval()
model.load_state_dict(ckpt["model"])

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

# Encode prompt
ids = torch.tensor([sp.encode("Dubai is", out_type=int)])

# Generate text
out = model.generate(ids, max_new_tokens=40)
print(sp.decode(out[0].tolist()))
```
## Example Usage (with repetition control)

```python
import torch
import sentencepiece as spm

from standalone_transformer_lm import GPT, GPTConfig

# Load checkpoint, config, and weights
ckpt = torch.load("ckpt_best.pt", map_location="cpu")
cfg = GPTConfig(**ckpt["config"])
model = GPT(cfg).eval()
model.load_state_dict(ckpt["model"])

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

# Encode prompt
prompt = "when a man goes to fishing"
ids = torch.tensor([sp.encode(prompt, out_type=int)])

# Generate with manual repetition control
out = model.generate(
    ids,
    max_new_tokens=100,
    temperature=0.7,          # Lower temperature = more focused
    top_k=50,                 # Top-k sampling
    top_p=0.9,                # Nucleus sampling
    repetition_penalty=1.2,   # Penalize repeated tokens
    no_repeat_ngram_size=3,   # Block repeating trigrams
)
print(sp.decode(out[0].tolist()))
```
## Tips to Reduce Loops

- Increase `repetition_penalty` to 1.2–1.5
- Use `no_repeat_ngram_size=3` or higher (see the sketch after this list)
- Combine `top_k` and `top_p` for better sampling variety
- Lower `temperature` for more deterministic completions
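
For intuition, `no_repeat_ngram_size` blocking works by forbidding any next token that would complete an n-gram already present in the generated sequence. The sketch below illustrates the idea on raw next-token logits; it is not the model's actual `generate` implementation:

```python
import torch

def block_repeated_ngrams(ids: torch.Tensor, logits: torch.Tensor, n: int = 3) -> torch.Tensor:
    """Illustrative sketch: mask tokens that would repeat an n-gram already in `ids`.

    ids    -- 1D tensor of token ids generated so far
    logits -- 1D tensor of next-token logits
    """
    tokens = ids.tolist()
    if len(tokens) < n:
        return logits
    prefix = tokens[-(n - 1):]  # last n-1 tokens; prefix of the candidate n-gram
    banned = {
        tokens[i + n - 1]
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n - 1] == prefix
    }
    out = logits.clone()
    if banned:
        out[list(banned)] = float("-inf")  # banned tokens can never be sampled
    return out
```

At each decoding step this would be applied to the logits before temperature scaling and sampling.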
## License
Apache-2.0