# Llama-3.1-8B-Instruct-NVFP4A16

This is a quantized version of Llama-3.1-8B-Instruct, compressed with the NVFP4A16 scheme (FP4 weights, 16-bit activations).
## Quantization Details
- Method: FP4 (NVFP4A16)
- Framework: llmcompressor
- Quantized layers: All Linear layers except lm_head
- Original model: Llama-3.1-8B-Instruct
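For intuition on what FP4 quantization does to the weights: the E2M1 format used by NVFP4 can represent only the values ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}, so each weight is scaled into that range and snapped to the nearest grid point. The sketch below is an illustrative toy, not the llmcompressor implementation (real NVFP4 uses small per-block scaling factors; here one scale per row stands in for that):

```python
# Toy round-to-nearest FP4 (E2M1) quantization, the numeric format
# underlying NVFP4A16 weight quantization. Illustrative only -- the
# actual scheme uses per-block scales rather than one scale per row.

# The representable E2M1 magnitudes, mirrored to negative values.
E2M1_GRID = sorted({s * m for s in (-1.0, 1.0)
                    for m in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)})

def quantize_fp4(row):
    """Scale a row of weights so its largest entry maps to +/-6,
    then snap each scaled weight to the nearest E2M1 grid value."""
    amax = max(abs(w) for w in row) or 1.0
    scale = amax / 6.0
    codes = [min(E2M1_GRID, key=lambda g: abs(w / scale - g)) for w in row]
    return codes, scale

def dequantize_fp4(codes, scale):
    """Reconstruct approximate weights from grid values and the scale."""
    return [g * scale for g in codes]

row = [0.02, -0.31, 0.17, 0.60]
codes, scale = quantize_fp4(row)
print(codes)                        # which grid points were chosen
print(dequantize_fp4(codes, scale)) # reconstructed (lossy) weights
```

Because every weight is forced onto a 15-point grid, small weights collapse to zero and nearby weights merge, which is the accuracy/size trade-off the scheme makes; keeping `lm_head` unquantized (as this model does) avoids that loss at the output projection.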
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("parasail-ai/Llama-3.1-8B-Instruct-NVFP4A16")
tokenizer = AutoTokenizer.from_pretrained("parasail-ai/Llama-3.1-8B-Instruct-NVFP4A16")

# Use the model as usual
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```
## License
Please refer to the original model's license.