Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ license: mit
 ---
 This model is a Llama-architecture model with about 500M parameters, created to generate code, text, and stories. It was pretrained for roughly 35 hours on fairly small datasets using a T4 GPU.
 After that, I spent about 5 hours training the model on a ShareGPT-structured chat template.
-I've got 1.
+I got a training loss between 1.2 and 1.9, which could go lower with more training. This model has great potential compared to similar models (if it gets trained further).
 This model shouldn't be used as a project by itself; it should first be trained on larger datasets, then post-trained on conversational datasets.
 **I will do it soon!**
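The card quotes roughly 500M parameters. As a back-of-the-envelope sanity check, a Llama-style parameter count can be sketched as below; the specific hyperparameters (vocabulary size, hidden size, layer count, MLP width) are illustrative assumptions, not values taken from the card.

```python
# Rough parameter count for a Llama-style decoder-only model.
# All hyperparameters below are assumed for illustration; the model
# card only states ~500M total parameters.
vocab_size = 32_000
hidden = 1_280
layers = 24
intermediate = 3_520  # ~2.75x hidden, typical for Llama-style MLPs

embeddings = vocab_size * hidden           # token embeddings (tied lm_head)
attn_per_layer = 4 * hidden * hidden       # q, k, v, o projections
mlp_per_layer = 3 * hidden * intermediate  # gate, up, down projections
norms_per_layer = 2 * hidden               # two RMSNorm weight vectors

per_layer = attn_per_layer + mlp_per_layer + norms_per_layer
total = embeddings + layers * per_layer + hidden  # + final RMSNorm

print(f"{total / 1e6:.0f}M parameters")  # ~523M, in the quoted 500M ballpark
```

With these assumed settings the estimate lands near the stated half-billion mark; the real model's config may of course use different shapes to reach the same total.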