Michael Goin (mgoin) PRO
AI & ML interests: LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
- Updated a model 2 days ago: mgoin/Llama-3.2-1B-Instruct-FP8-ATTN
- Updated a model 2 days ago: mgoin/Llama-3.2-1B-Instruct-FP8-dynamic-ATTN
- Updated a model 6 days ago: neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic
mgoin's activity
- Model does not run with vLLM (2 replies), #3 opened 9 days ago by aswad546
- Nice model, any info on scripts used to quantize? (1 reply), #1 opened 12 days ago by RonanMcGovern
- Add config_format and load_format to vLLM args, #5 opened about 1 month ago by mgoin
- Update config.json to use null for sliding_window, #4 opened about 1 month ago by mgoin
- Adding `safetensors` variant of this model, #1 opened about 1 month ago by SFconvertbot
- Is this the standard GPTQ quantization? (1 reply), #5 opened about 2 months ago by molereddy
- Model weights are not loaded (4 replies), #3 opened 4 months ago by MarvelousMouse
- Update model card, #1 opened about 2 months ago by nm-research
- Add chat_template to tokenizer_config.json, #1 opened about 2 months ago by nm-research
- 7900xtx: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ (1 reply), #3 opened about 2 months ago by aaaaaaaaaasdf
- Why is the Pixtral activation function "gelu" when the reference code uses "silu"? (2 replies), #10 opened 2 months ago by mgoin
- Update tokenizer_config.json with chat_template (3 replies), #11 opened 2 months ago by mgoin
- Any chance your team is working on a 4-bit Llama-3.2-90B-Vision-Instruct-quantized.w4a16 version? (1 reply), #1 opened 3 months ago by mrhendrey
- OOM with 24 GB VRAM (3 replies), #1 opened 3 months ago by Klopez
- Latest vLLM Docker (v0.6.2) fails to load (2 replies), #1 opened 3 months ago by choronz333
- Issue with loading model (1 reply), #1 opened 4 months ago by xSumukhax
- Can it run on A100/A800 with vLLM? (3 replies), #1 opened 5 months ago by Parkerlambert123
- Weights do not exist when trying to deploy in a SageMaker endpoint (1 reply), #1 opened 4 months ago by LorenzoCevolaniAXA
- 8-kv-heads (4 replies), #17 opened 5 months ago by ArthurZ