
Llama-3.1-8B-Instruct-NVFP4A16

This model is an FP4-quantized version of Llama-3.1-8B-Instruct, produced with the NVFP4A16 scheme (4-bit weights, 16-bit activations).

Quantization Details

  • Method: FP4 (NVFP4A16)
  • Framework: llmcompressor
  • Quantized layers: All Linear layers except lm_head
  • Original model: Llama-3.1-8B-Instruct
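A quantization like the one described above can be expressed as an llmcompressor one-shot recipe. The sketch below is an assumption about how this checkpoint could be reproduced, not the exact recipe used; the `NVFP4A16` scheme string and import paths vary across llmcompressor releases.

```python
# Sketch of an llmcompressor one-shot NVFP4A16 recipe (assumed API; scheme
# name and import paths depend on your llmcompressor version).
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",    # quantize all Linear layers...
    scheme="NVFP4A16",   # ...to FP4 weights with 16-bit activations
    ignore=["lm_head"],  # ...except the output head
)

# Apply the recipe and write a compressed checkpoint.
oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",
    recipe=recipe,
    output_dir="Llama-3.1-8B-Instruct-NVFP4A16",
)
```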

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "parasail-ai/Llama-3.1-8B-Instruct-NVFP4A16",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("parasail-ai/Llama-3.1-8B-Instruct-NVFP4A16")

# Use the model as usual
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

License

Please refer to the original model's license.

Safetensors

  • Model size: 4.98B params
  • Tensor types: BF16, F32, F8_E4M3, U8
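The U8 tensor type reflects how FP4 weights are typically stored: two 4-bit E2M1 values packed into each uint8 byte, with the F8_E4M3 tensors holding per-block scales. The following is a purely illustrative decode sketch, not the model's actual storage code; the low-nibble-first packing order is an assumption.

```python
# Illustrative sketch: unpacking FP4 (E2M1) values from a uint8 byte.
# Not taken from the model's storage format; packing order is assumed.

# The 8 non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(nibble: int) -> float:
    """Decode one 4-bit E2M1 code (bit 3 = sign, bits 0-2 = magnitude index)."""
    sign = -1.0 if nibble & 0b1000 else 1.0
    return sign * E2M1_MAGNITUDES[nibble & 0b0111]

def unpack_byte(byte: int) -> tuple[float, float]:
    """Split one uint8 into two FP4 values (low nibble first, by assumption)."""
    return decode_fp4(byte & 0x0F), decode_fp4(byte >> 4)

# Example: 0x2F packs the codes 0xF (-6.0, low nibble) and 0x2 (1.0, high nibble).
print(unpack_byte(0x2F))  # (-6.0, 1.0)
```

Each decoded value is then multiplied by its block's FP8 scale to recover an approximation of the original weight, which is why only 15 distinct nonzero levels per block need to be stored in 4 bits each.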