Part of the NewComputeBench-CLM-Digital collection.
A 600M-parameter bitflip-aware language model trained on 22 × 600M (≈13.2B) tokens from the FineWeb-Edu dataset.
bitflip-aixsim-600M is a transformer-based language model with approximately 600 million parameters (excluding embedding-layer parameters). It uses RMSNorm for normalization and was trained on the FineWeb-Edu dataset.
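For reference, RMSNorm normalizes each hidden vector by its root mean square instead of mean-centering and variance-scaling as LayerNorm does. A minimal PyTorch sketch of the technique follows; the epsilon value is illustrative and not necessarily the model's actual configuration:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (Zhang & Sennrich, 2019)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root mean square of the features
        # (no mean subtraction, unlike LayerNorm), then apply the gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```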
The experiment setup and training logs are available in the associated wandb run.
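The model can presumably be loaded with the standard Hugging Face `transformers` API, as sketched below. The repo id is a placeholder (the actual namespace is not stated here), and a custom architecture may additionally require `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hugging Face namespace.
repo_id = "your-org/bitflip-aixsim-600M"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Greedy generation from a short prompt as a smoke test.
prompt = "The key idea behind fault-tolerant computing is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```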