
GreenBitAI/DeepSeek-R1-Distill-Llama-70B-layer-mix-bpw-4.0-mlx
Command for reproducing this run:
CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=true python -m sft.finetune --model GreenBitAI/Llama-3-8B-layer-mix-bpw-2.2 --tune-qweight-only --galore --galore-rank 64 --optimizer adamw8bit --batch-size 1 --seqlen 96
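For scripted runs, the same invocation can be assembled in Python. This is a minimal sketch, not part of the original tooling: the environment variables and flags are copied verbatim from the command above, while the helper names `build_command`, `build_env` and `run` are hypothetical. It assumes the `sft` package is importable from the working directory.

```python
import os
import subprocess
import sys

# Environment variables copied from the command above.
ENV_OVERRIDES = {
    "CUDA_VISIBLE_DEVICES": "0",  # expose only GPU 0 to the process
    "WANDB_DISABLED": "true",     # same value as in the original command
}

# Arguments copied verbatim from the command above; nothing is added.
FINETUNE_ARGS = [
    "-m", "sft.finetune",
    "--model", "GreenBitAI/Llama-3-8B-layer-mix-bpw-2.2",
    "--tune-qweight-only",
    "--galore",
    "--galore-rank", "64",
    "--optimizer", "adamw8bit",
    "--batch-size", "1",
    "--seqlen", "96",
]


def build_command(python=sys.executable):
    """Return the argv list for the fine-tuning run (hypothetical helper)."""
    return [python, *FINETUNE_ARGS]


def build_env(base=None):
    """Return a copy of the environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(ENV_OVERRIDES)
    return env


def run():
    """Launch the run; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(), env=build_env(), check=True).returncode
```

Calling `run()` starts the job; `build_command()` and `build_env()` can be inspected first to confirm that the arguments match the shell command.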