Michael Goin (mgoin)

AI & ML interests: LLM inference optimization, compression, quantization, pruning, distillation

mgoin's activity
- Is this the standard GPTQ quantization? · 1 · #5 opened 8 days ago by molereddy
- Model weights are not loaded · 4 · #3 opened 2 months ago by MarvelousMouse
- Update model card · #1 opened 9 days ago by nm-research
- Add chat_template to tokenizer_config.json · #1 opened 9 days ago by nm-research
- Why is the Pixtral activation function "gelu" when the reference code uses "silu"? · 2 · #10 opened 25 days ago by mgoin
- Update tokenizer_config.json with chat_template · 2 · #11 opened 25 days ago by mgoin
- Any chance your team is working on a 4-bit Llama-3.2-90B-Vision-Instruct-quantized.w4a16 version? · 1 · #1 opened about 1 month ago by mrhendrey
- OOM with 24 GB VRAM · 3 · #1 opened about 1 month ago by Klopez
- Latest vLLM Docker image (v0.6.2) fails to load · 2 · #1 opened about 1 month ago by choronz333
- Issue with loading model · 1 · #1 opened 2 months ago by xSumukhax
- Can it run on A100/A800 with vLLM? · 3 · #1 opened 3 months ago by Parkerlambert123
- Weights do not exist when trying to deploy in a SageMaker endpoint · 1 · #1 opened 3 months ago by LorenzoCevolaniAXA
- 8-kv-heads · 4 · #17 opened 3 months ago by ArthurZ
- 8-kv-heads · 3 · #21 opened 3 months ago by ArthurZ
- Run with vLLM · 8 · #4 opened 3 months ago by kuliev-vitaly
- Not able to run model using vLLM · 1 · #3 opened 3 months ago by Pchaudhary
- Getting an issue while loading the model · 1 · #1 opened 3 months ago by Abhinav6310
- How to run fast inference with FP8 · 1 · #2 opened 3 months ago by CCRss
- Unable to load model onto multiple GPUs · 2 · #2 opened 3 months ago by bprice9