Downloading weights without duplicates

#52
by vadimkantorov - opened

Downloading with regular git/LFS creates duplicates in .git/lfs/objects, which is quite costly for ~680 GB of weight files:

sudo apt-get install git-lfs
git lfs install

git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
du -sh DeepSeek-V3-0324
# 1.3T    DeepSeek-V3-0324/
du -sh DeepSeek-V3-0324/.git/lfs
# 642G    DeepSeek-V3-0324/.git/lfs
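The du checks above can be reproduced from Python if you want to script the disk-usage comparison; a minimal sketch (the helper names dir_size_bytes and human are mine, not part of any library):

```python
import os

def dir_size_bytes(path):
    """Sum file sizes under `path`, like `du -s` (apparent size)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):  # skip symlinks, count real files once
                total += os.path.getsize(fp)
    return total

def human(n):
    """Render a byte count roughly like `du -h` (powers of 1024)."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}P"
```

For example, `human(dir_size_bytes("DeepSeek-V3-0324/.git/lfs"))` would report the size of the LFS blob store.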

How do I download the weights files without any duplication?

Would huggingface_hub.snapshot_download avoid producing duplicates or any extra cache? (I'm worried about the cache described at https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache)

pip install "huggingface_hub[hf_transfer]"

HF_HUB_ENABLE_HF_TRANSFER=1 python -c '
import huggingface_hub
huggingface_hub.snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3-0324",
    local_dir="deepseek-ai/DeepSeek-V3-0324",
    allow_patterns=["*.safetensors"],
)'

Just using huggingface-cli download deepseek-ai/DeepSeek-V3-0324 should be fine.
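After either download route finishes, you can sanity-check that no git-LFS blob store was left behind (the point of avoiding plain git clone); a minimal sketch, with has_lfs_duplicates being a hypothetical helper of mine:

```python
import os

def has_lfs_duplicates(repo_dir):
    """True if a git-LFS object store exists under `repo_dir`,
    meaning each weight file is stored twice on disk."""
    return os.path.isdir(os.path.join(repo_dir, ".git", "lfs", "objects"))
```

A directory produced by huggingface-cli download or snapshot_download with local_dir should return False here, while a plain git clone of an LFS repo returns True.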

Thanks! Maybe adding a note about this directly in the README would be very helpful for novices.

As DeepSeek is one of the really big open models, a warning that git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 duplicates the 642 GB of weights would be useful, along with a command for a fast, non-duplicating download.
