How did you manage to train this on a T4 alone?
I was shocked to find that you had managed to train all of this on a single T4 GPU. I tried fine-tuning on Colab with a T4 before, but the process took 7 hours and I only had 2 hours and 30 minutes of compute left. How did your fine-tuning run for 22 days without all of your progress being deleted? I am interested.
I created it 22 days ago; it didn't take 22 days to train. It took around 30 minutes on Colab.
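The thread doesn't say which fine-tuning recipe was used, but a rough memory estimate shows why a 30-minute run on a single 16 GB T4 is plausible with parameter-efficient methods (e.g. QLoRA-style 4-bit training, which is an assumption here, not something the author confirmed) and implausible with full fine-tuning:

```python
# Back-of-envelope VRAM estimate for fine-tuning an 8B-parameter model
# on a 16 GB T4. These are rough rules of thumb, not measurements, and
# they ignore activations, which add several more GB in practice.

PARAMS = 8e9          # Llama 3.1 8B parameter count
GB = 1024 ** 3

# Full fine-tuning: fp16 weights (2 B) + fp16 gradients (2 B) + Adam
# optimizer state kept in fp32 (~8 B per parameter).
full_ft_gb = PARAMS * (2 + 2 + 8) / GB

# QLoRA-style training: the frozen base model is quantized to 4 bits
# (~0.5 B per parameter); the trainable LoRA adapters are a tiny
# fraction (<1%) of the parameters, so they barely register.
qlora_base_gb = PARAMS * 0.5 / GB

print(f"full fp16 fine-tune: ~{full_ft_gb:.0f} GB")   # far beyond a T4
print(f"4-bit base weights:  ~{qlora_base_gb:.0f} GB")  # fits in 16 GB
```

So full fine-tuning needs on the order of 90 GB just for weights and optimizer state, while a 4-bit base plus small adapters leaves comfortable headroom on a T4, which is why Colab runs like this finish in well under the free-tier compute window.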
Just tested the model on Spaces. I can say that the output is much clearer and easier to understand than the base Llama 3.1 8B. Here are some screenshots:
Your model:
Base
As you can see, the base model just spits out the SQL schema without explaining the logic and functionality behind it, whereas Artificium explains how each field works.
Note: I chatted with the base model and Artificium for a while and found that Artificium is more step-by-step than the base model. Maybe it is different for others, but that was my experience.