catid committed
Commit 722a847
Parent: df2e5b3

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -1,3 +1,5 @@
+AI Model Name: Llama 3 70B "Built with Meta Llama 3" https://llama.meta.com/llama3/license/
+
 How to quantize 70B model so it will fit on 2x4090 GPUs:
 
 I tried EXL2, AutoAWQ, and SqueezeLLM and they all failed for different reasons (issues opened).