Update README.md
README.md
CHANGED
@@ -204,20 +204,20 @@ NPROC_PER_NODE=4 xtuner train ./pretrain.py --deepspeed deepspeed_zero2
The checkpoint and TensorBoard logs are saved in ./work_dirs/ by default. I train for only 1 epoch, the same as the original LLaVA paper. Some studies also report that training for multiple epochs makes the model overfit the training dataset and perform worse on other domains.
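
To watch these runs while they train, you can point TensorBoard at the log directory. This is only a minimal sketch: the exact run subdirectory that xtuner creates under ./work_dirs/ depends on the config name, so adjust the path to match what you actually see there.

```
# Minimal sketch: assumes TensorBoard is installed (pip install tensorboard)
# and that the event files live somewhere under ./work_dirs/.
tensorboard --logdir ./work_dirs --port 6006
```
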
This is my loss curve for llava-siglip-internlm2-1_8b-pretrain-v1:
![pretraining loss curve]()

And the learning rate curve:
![pretraining learning rate curve]()

2. Instruction-following fine-tuning
```
NPROC_PER_NODE=4 xtuner train ./finetune.py --deepspeed deepspeed_zero2
```
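
As far as I understand it, NPROC_PER_NODE simply tells the xtuner launcher how many processes (one per GPU) to spawn on this node, so the same command can be adapted to a different GPU count; the value below is only an illustration, not the setting I used.

```
# Example only: launch the same fine-tuning job on 2 GPUs instead of 4.
NPROC_PER_NODE=2 xtuner train ./finetune.py --deepspeed deepspeed_zero2
```
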
Here is my loss curve (it fluctuates strongly because the batch size is small and I only record the batch loss rather than the epoch loss):
![fine-tuning loss curve]()

And the learning rate curve:
![fine-tuning learning rate curve]()

## Convert the checkpoints to the Hugging Face safetensors format
```