Michael Goin

mgoin

AI & ML interests

LLM inference optimization, compression, quantization, pruning, distillation

mgoin's activity

New activity in neuralmagic/pixtral-12b-FP8-dynamic 9 days ago

Update model card

#1 opened 9 days ago by nm-research

Oom with 24g vram

3
#1 opened about 1 month ago by Klopez
New activity in neuralmagic/Phi-3.5-mini-instruct-FP8-KV about 1 month ago
New activity in meta-llama/Llama-3.1-405B-Instruct 3 months ago

8-kv-heads

4
#17 opened 3 months ago by ArthurZ
New activity in meta-llama/Llama-3.1-405B 3 months ago

8-kv-heads

3
#21 opened 3 months ago by ArthurZ

run with vllm

8
#4 opened 3 months ago by kuliev-vitaly
New activity in neuralmagic/gemma-2-9b-it-FP8 3 months ago
New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 3 months ago

How to fast inference with FP8

1
#2 opened 3 months ago by CCRss