
Meta-Llama-3-8B-Instruct_bitsandbytes_4bit fine-tuned on Salesforce/xlam-function-calling-60k

Function-Calling Agent

LoRA Adapter Head

Parameter-Efficient Fine-Tuning (PEFT) of a 4-bit quantized Meta-Llama-3-8B-Instruct on the Salesforce/xlam-function-calling-60k dataset.

Intended uses & limitations

Demonstrates the efficacy of quantization and PEFT. Implemented as a personal project.

How to use

Install Required Libraries

!pip install transformers accelerate "bitsandbytes>0.37.0"
!pip install peft

Setup Adapter with Base Model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the 4-bit quantized base model, then attach the LoRA adapter head.
base_model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit", device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# device_map="auto" already places the weights on the GPU; no extra .to("cuda") is needed.
model.eval()
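Alternatively, the base model and adapter can be loaded in one step with AutoPeftModelForCausalLM, which resolves the base model from the adapter's config. A minimal sketch using the standard PEFT API:

from peft import AutoPeftModelForCausalLM

# One-step load: the base model is resolved from the adapter repository's config.
model = AutoPeftModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama3-8B-Chat-Adapter",
    device_map="auto",
)
model.eval()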

Setup Template and Infer

x1 = {"role": "system", "content": """You are a APIGen Function Calling Tool. You will br provided with a user query and associated tools for answering the query.
    query (string): The query or problem statement.
    tools (array): An array of available tools that can be used to solve the query.
    Each tool is represented as an object with the following properties:
        name (string): The name of the tool.
        description (string): A brief description of what the tool does.
        parameters (object): An object representing the parameters required by the tool.
            Each parameter is represented as a key-value pair, where the key is the parameter name and the value is an object with the following properties:
                type (string): The data type of the parameter (e.g., "int", "float", "list").
                description (string): A brief description of the parameter.
                required (boolean): Indicates whether the parameter is required or optional.
    You will provide the Answers array.
        The Answers array specifies the tool and the arguments used to generate each answer."""}
x2 = {"role": "user", "content": None}
x3 = {"role": "assistant", "content": None}  # placeholder for the assistant turn; unused at inference
user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'  # response slot; unused at inference
Q = "Where can I find live giveaways for beta access and games?"
T = """[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]"""


x2['content'] = user_template.format(Q=Q, T=T)
prompts = [x1, x2]
input_ids = tokenizer.apply_chat_template(
    prompts,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators
)

response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
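The system prompt asks the model for an Answers array naming each tool and its arguments. Assuming the generation is valid JSON in the xlam answers format (not guaranteed), it can be parsed like this sketch:

import json

decoded = tokenizer.decode(response, skip_special_tokens=True)
try:
    # Expected shape: [{"name": ..., "arguments": {...}}, ...]
    for call in json.loads(decoded):
        print(call["name"], call["arguments"])
except json.JSONDecodeError:
    print("Output was not valid JSON:", decoded)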

Size Comparison

The table below compares the VRAM required to load and train the FP16 base model against the 4-bit bitsandbytes-quantized model with PEFT. The base model figures are taken from Hugging Face's Model Memory Calculator.

Model                    Total Size   Training Using Adam
Base Model               28.21 GB     56.42 GB
4-bit Quantized + PEFT   5.21 GB      13 GB
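The 4-bit footprint comes from bitsandbytes quantization of the base model. The exact settings behind the published 4-bit checkpoint are not stated here; a typical NF4 configuration would look like this sketch:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed quantization settings; the card does not specify them.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)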

Training Details

Training Data

Dataset: Salesforce/xlam-function-calling-60k

Trained on the instruction column of 20,00 randomly shuffled examples.
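The exact preprocessing is not documented in this card. The public dataset exposes query, tools, and answers fields, so a plausible sketch of building a prompt/response pair from one row, reusing the templates shown above, is:

from datasets import load_dataset

# Hypothetical preprocessing sketch; the dataset is gated, so authentication may be required.
ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train").shuffle(seed=42)

user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'

row = ds[0]
prompt = user_template.format(Q=row["query"], T=row["tools"])
target = response_template.format(A=row["answers"])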

Training Procedure

A custom training loop built with Hugging Face Accelerate; a sketch follows the hyperparameters below.

Training Hyperparameters

  • Optimizer: AdamW
  • lr: 2e-5
  • decay: linear
  • batch_size: 1
  • gradient_accumulation_steps: 2
  • fp16: True
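Putting the listed hyperparameters together, a minimal sketch of such an Accelerate training loop (train_loader and num_training_steps are placeholders; everything beyond the listed values is an assumption):

from accelerate import Accelerator
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# fp16 and gradient accumulation as listed above.
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=2)
optimizer = AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, 0, num_training_steps)  # linear decay

model, optimizer, train_loader, scheduler = accelerator.prepare(
    model, optimizer, train_loader, scheduler
)

model.train()
for batch in train_loader:
    with accelerator.accumulate(model):
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()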

LoraConfig

  • r: 8
  • lora_alpha: 32
  • task_type: TaskType.CAUSAL_LM
  • lora_dropout: 0.1
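The values above map directly to a LoraConfig; a sketch (target_modules is not listed in the card, so PEFT's defaults for Llama are assumed):

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    task_type=TaskType.CAUSAL_LM,
    lora_dropout=0.1,
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # shows the small trainable fraction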

Hardware

  • GPU: P100

Acknowledgment

Model Card Authors

Swastik Maiti
