There is a significant gap between the training time mentioned in the paper and the figures in the repository's model card.
#77 opened by wangzl
I summarized the training throughput according to the model cards as follows:
| model | training tokens | training time | # of A100s | throughput |
|---|---|---|---|---|
| phi-1 | 54B | 6 days | 8 | ≈13,020 tokens/s per A100 |
| phi-1.5 | 150B | 8 days | 32 | ≈6,781 tokens/s per A100 |
| phi-2 | 1.4T | 14 days | 96 | ≈12,056 tokens/s per A100 |
However, the phi-1.5 paper reports that training on 150B tokens took 1.5K A100 GPU-hours, which implies a throughput of about 27,777 tokens/s per A100.
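Here is a minimal sketch of the arithmetic behind the numbers above, assuming throughput = tokens / (number of A100s × wall-clock seconds); the function and variable names are just for illustration:

```python
# Throughput sanity check: tokens / (number of A100s * wall-clock seconds).
# Token counts, GPU counts, and durations are taken from the model cards and the phi-1.5 paper.

def tokens_per_sec_per_gpu(tokens: float, gpus: int, seconds: float) -> float:
    return tokens / (gpus * seconds)

DAY = 24 * 3600  # seconds per day

# From the model cards: (training tokens, number of A100s, training time in seconds)
runs = {
    "phi-1":   (54e9,   8,  6 * DAY),
    "phi-1.5": (150e9,  32, 8 * DAY),
    "phi-2":   (1.4e12, 96, 14 * DAY),
}
for name, (tokens, gpus, secs) in runs.items():
    print(f"{name}: {tokens_per_sec_per_gpu(tokens, gpus, secs):,.0f} tokens/s per A100")

# From the phi-1.5 paper: 150B tokens in about 1.5K A100 GPU-hours
print(f"phi-1.5 (paper): {150e9 / (1500 * 3600):,.0f} tokens/s per A100")
```

For phi-1.5 the two sources differ by roughly a factor of four (≈6,781 vs ≈27,777 tokens/s per A100).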
Comparing the two, I suspect there may be errors in the data presented in the paper, though there could be other explanations as well. I welcome any input.
I am facing this error with my fine-tuned model: "The repository for microsoft/phi-1_5 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/phi-1_5. Please pass the argument trust_remote_code=True to allow custom code to be run." Can anyone suggest a solution?
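If it helps, the usual fix is to pass trust_remote_code=True when loading the model and tokenizer. A minimal sketch, where "path/to/your-finetuned-phi-1_5" is a placeholder for your own checkpoint directory or Hub repo id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with your fine-tuned checkpoint path or Hub repo id.
model_path = "path/to/your-finetuned-phi-1_5"

# trust_remote_code=True allows the custom modeling code shipped with the repo to run.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
```

Newer versions of transformers also include native Phi support, so upgrading transformers may remove the need for the custom code path, though I have not verified that for this particular checkpoint.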