Independent evaluation results
#30 opened 3 months ago by yaronr
Getting the error: "triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 166912. Reducing block sizes or `num_stages` may help."
2
#27 opened 6 months ago by Pranav0511
Why is inference so slow compared with Qwen at the same 7B parameter count?
#26 opened 6 months ago by lucasjin
Upload triton_flash_blocksparse_attn.py
#25 opened 6 months ago by barcelosallan
Phi-3-small doesn't load with TGI
1
#24 opened 6 months ago by aveer30
Multi-GPU training fails when using device_map = "auto"
2
#23 opened 6 months ago by aveer30
Shared memory error
9
#15 opened 7 months ago by marktenenholtz