pswitala/pllum-8B-instruct-Q5_k_m-gguf

This model was converted to GGUF format from CYFRAGOVPL/Llama-PLLuM-8B-instruct using llama.cpp. Refer to the original model card for more details on the model.

Fast on an RTX 4080

The model runs smoothly on an RTX 4080 at roughly 70 tokens/sec.
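To try it locally, here is a minimal sketch using the llama-cpp-python bindings together with huggingface_hub. The exact `.gguf` filename inside this repo is an assumption (check the repo's file list), and full GPU offload requires a CUDA-enabled build of llama-cpp-python.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized GGUF file from this repo.
# NOTE: the .gguf filename below is an assumption; verify it in the repo's file list.
model_path = hf_hub_download(
    repo_id="pswitala/pllum-8B-instruct-Q5_k_m-gguf",
    filename="pllum-8b-instruct-q5_k_m.gguf",
)

# Load the model; n_gpu_layers=-1 offloads all layers to the GPU
# (requires llama-cpp-python built with CUDA support).
llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096)

# Simple chat-style generation.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Napisz krótki wiersz o Wiśle."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```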

Format: GGUF
Model size: 8.03B params
Architecture: llama
Quantization: 5-bit (Q5_K_M)

