StepLaw/StepLaw-N_59M-D_7.0B-LR3.906e-03-BS524288

@@ -23,17 +23,17 @@ This model is part of the [StepLaw-N_59M-D_7.0B](https://huggingface.co/collecti
 - **Feed-forward network size (FFN)**: 5016
 - **Attention heads**: 9
 - **Layers**: 6
-- **Parameter count**: 59MM
 ### Training Parameters
 - **Learning rate (lr)**: 3.906e-03
-- **Batch size (bs)**: 256
 - **Training iterations**: 15258
 - **Training tokens (D)**: 8.0B
 ## Model Description
-StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 3.906e-03 and batch size 256 for 15258 iterations, using a total of 8.0B training tokens.
 ## Usage Example
@@ -48,7 +48,4 @@ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
 inputs = tokenizer("A long time ago in a galaxy far, far away", return_tensors="pt")
 outputs = model.generate(**inputs, max_length=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```## Part of StepLaw Project
-StepLaw is an initiative to provide thousands of models for optimal hyperparameter research.
-Visit [StepLaw Project](https://step-law.github.io/) for more information.

 - **Feed-forward network size (FFN)**: 5016
 - **Attention heads**: 9
 - **Layers**: 6
+- **Parameter count**: 59M
 ### Training Parameters
 - **Learning rate (lr)**: 3.906e-03
+- **Batch size (bs)**: 524288
 - **Training iterations**: 15258
 - **Training tokens (D)**: 8.0B
 ## Model Description
+StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 3.906e-03 and batch size 524288 for 15258 iterations, using a total of 8.0B training tokens.
 ## Usage Example
 inputs = tokenizer("A long time ago in a galaxy far, far away", return_tensors="pt")
 outputs = model.generate(**inputs, max_length=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
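
A quick arithmetic check suggests why this change corrects the batch size from 256 to 524288. The sketch below assumes the batch size is counted in tokens per iteration, which the model card does not state explicitly. The sequence length of 2048 is likewise an assumption, used only to show how a value of 256 sequences could correspond to 524288 tokens.

```python
# Values taken from the model card in the diff above.
batch_size_new = 524_288   # corrected batch size (assumed to be in tokens)
batch_size_old = 256       # previous value (possibly a count of sequences)
iterations = 15_258

# Assumption, not stated in the source: 2048 tokens per sequence.
assumed_seq_len = 2048

# Total tokens implied by each batch size, if it is read as tokens per step.
total_tokens_new = batch_size_new * iterations
total_tokens_old = batch_size_old * iterations

print(f"new: {total_tokens_new:,} tokens (~{total_tokens_new / 1e9:.1f}B)")
print(f"old: {total_tokens_old:,} tokens (~{total_tokens_old / 1e9:.4f}B)")
print(f"256 sequences x {assumed_seq_len} tokens = {batch_size_old * assumed_seq_len:,}")
```

Under these assumptions, only the corrected value reproduces the 8.0B training tokens listed in the card. A value of 256 read as tokens per step gives roughly 3.9M tokens.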