StepLaw/StepLaw-N_268M-D_79.0B-LR4.883e-04-BS720896

@@ -23,17 +23,17 @@ This model is part of the [StepLaw-N_268M-D_79.0B](https://huggingface.co/collec
 - **Feed-forward network size (FFN)**: 9552
 - **Attention heads**: 16
 - **Layers**: 8
-- **Parameter count**: 268MM
 ### Training Parameters
 - **Learning rate (lr)**: 4.883e-04
-- **Batch size (bs)**: 352
 - **Training iterations**: 110973
 - **Training tokens (D)**: 80.0B
 ## Model Description
-StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 4.883e-04 and batch size 352 for 110973 iterations, using a total of 80.0B training tokens.
 ## Usage Example
@@ -48,7 +48,4 @@ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
 inputs = tokenizer("A long time ago in a galaxy far, far away", return_tensors="pt")
 outputs = model.generate(**inputs, max_length=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```## Part of StepLaw Project
-StepLaw is an initiative to provide thousands of models for optimal hyperparameter research.
-Visit [StepLaw Project](https://step-law.github.io/) for more information.

 - **Feed-forward network size (FFN)**: 9552
 - **Attention heads**: 16
 - **Layers**: 8
+- **Parameter count**: 268M
 ### Training Parameters
 - **Learning rate (lr)**: 4.883e-04
+- **Batch size (bs)**: 720896
 - **Training iterations**: 110973
 - **Training tokens (D)**: 80.0B
 ## Model Description
+StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 4.883e-04 and batch size 720896 for 110973 iterations, using a total of 80.0B training tokens.
 ## Usage Example
 inputs = tokenizer("A long time ago in a galaxy far, far away", return_tensors="pt")
 outputs = model.generate(**inputs, max_length=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
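
The batch size change in this diff (352 to 720896) looks like a change of units rather than a change of value. The sketch below checks that reading with the numbers from the card. It assumes, without confirmation from the card, that 352 counts sequences per step and 720896 counts tokens per step; the variable names are illustrative only.

```python
# Values copied from the model card in this diff.
batch_size_old = 352        # removed value; assumed to be sequences per step
batch_size_new = 720_896    # added value; assumed to be tokens per step
iterations = 110_973        # training iterations

# Assumption: if the two values describe the same batch, their ratio is the
# sequence length. The card does not state a sequence length.
assert batch_size_new % batch_size_old == 0
tokens_per_sequence = batch_size_new // batch_size_old

# Total training tokens implied by the token-denominated batch size.
total_tokens = batch_size_new * iterations

print(tokens_per_sequence)            # 2048
print(total_tokens)                   # 79999991808
print(f"{total_tokens / 1e9:.1f}B")   # 80.0B
```

Under that assumption the product of batch size and iterations matches the 80.0B training tokens listed on the card, which is why the token-denominated value is consistent with the other fields.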