arshiaafshani committed
Commit 47f671e · verified · 1 Parent(s): 70f56e4

End of training

Files changed (2)
  1. README.md +53 -8
  2. generation_config.json +1 -1
README.md CHANGED
@@ -1,12 +1,57 @@
 ---
 library_name: transformers
 license: mit
+base_model: arshiaafshani/Arsh-llm
+tags:
+- generated_from_trainer
+model-index:
+- name: Arsh-llm
+  results: []
 ---
-This model is a Llama-architecture model with 500M parameters, created to generate code, text, and stories. It was pretrained for a total of about 35 hours on relatively small datasets using a T4 GPU.
-After that, I spent about 5 hours training the model on ShareGPT-structured chat data.
-Training loss ended up between 1.2 and 1.9 and could be lowered with further training. This model has great potential to compete with similar models (if it gets trained further).
-This model shouldn't be used as-is; it should first be trained on larger datasets and then post-trained on conversational datasets.
-**I will do that soon!**
-
-# License
-This model is licensed under MIT.
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# Arsh-llm
+
+This model is a fine-tuned version of [arshiaafshani/Arsh-llm](https://huggingface.co/arshiaafshani/Arsh-llm) on an unknown dataset.
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 3e-05
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 12
+- total_train_batch_size: 48
+- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 2000
+- num_epochs: 1
+- mixed_precision_training: Native AMP
+
+### Training results
+
+### Framework versions
+
+- Transformers 4.52.2
+- PyTorch 2.6.0+cu124
+- Datasets 3.6.0
+- Tokenizers 0.21.1
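
For reference, here is a minimal sketch of how the hyperparameters listed in the card above map onto `transformers` `TrainingArguments`; the `output_dir` is hypothetical, and the dataset loading and `Trainer` wiring are omitted.

```python
# Minimal sketch: TrainingArguments matching the hyperparameters in the card above.
# output_dir is hypothetical; dataset loading and Trainer setup are not shown.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="arsh-llm-finetune",   # hypothetical path
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=12,   # 4 * 12 = 48 total train batch size
    optim="adamw_torch",              # AdamW defaults: betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=1,
    fp16=True,                        # "Native AMP" mixed precision
)
```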
generation_config.json CHANGED
@@ -3,5 +3,5 @@
   "bos_token_id": 0,
   "eos_token_id": 2,
   "pad_token_id": 1,
-  "transformers_version": "4.51.3"
+  "transformers_version": "4.52.2"
 }
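
Since this commit only bumps `transformers_version`, generation behavior is unchanged. A minimal usage sketch follows (the prompt is arbitrary); `generate()` picks up the `bos`/`eos`/`pad` token ids from this `generation_config.json` automatically.

```python
# Minimal sketch: load the model and generate text; generate() reads the
# bos/eos/pad token ids from the repo's generation_config.json.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arshiaafshani/Arsh-llm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```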