Uploaded model

Developed by: harithapliyal
License: apache-2.0
Finetuned from model : unsloth/llama-3-8b-bnb-4bit

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

from google.colab import userdata HF_KEY = userdata.get('HF_KEY')

from unsloth import FastLanguageModel import torch

Load model directly

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

Configure the quantization

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

Load the model with quantization

model1 = AutoModelForCausalLM.from_pretrained(
    "harithapliyal/llama-3-8b-bnb-4bit-finetuned-SentAnalysis", 
    quantization_config=bnb_config
)



FastLanguageModel.for_inference(model1) # Enable native 2x faster inference
inputs = tokenizer(
[
    fine_tuned_prompt.format(
        "Classify the sentiment of the following text.", # instruction
        "I like play yoga under the rain", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

harithapliyal
/

llama-3-8b-bnb-4bit-finetuned-SentAnalysis

Uploaded model

Load model directly

Configure the quantization

Load the model with quantization

Model tree for harithapliyal/llama-3-8b-bnb-4bit-finetuned-SentAnalysis