Model Card: Shark Tank Offer Evaluator (Llama-3.2-3B-Instruct, SFT + DPO)

This model is a fine-tuned version of Llama-3.2-3B-Instruct, specifically trained to evaluate investment offers and make optimal deal decisions based on Shark Tank-style negotiations. The model has been trained on a custom dataset that includes company details such as name, financials, sales performance, and offers received from investors, along with associated conditions.

The dataset was constructed from raw Shark Tank episode transcripts, preprocessed, and structured for supervised fine-tuning (SFT). Additionally, a preference dataset was created to facilitate Direct Preference Optimization (DPO) for offer selection. The model was initially trained using SFT and subsequently fine-tuned with DPO, leveraging the Unsloth library for efficient training.

Uses

# !pip install unsloth

from unsloth import FastLanguageModel
import torch

max_seq_length = 3000 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "VaidikML0508/llama3.2-3B-Instruct-DPO-16bits-V1",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

template = """
You are the founder of a company, pitching your business on *Shark Tank* to secure the best possible deal from the sharks. Below are the complete details about your company, its initial valuation, and the funding you are seeking. Your goal is to make the best decision for your company by evaluating shark offers, negotiating effectively, and choosing the most beneficial deal—or deciding to walk away if necessary.

You will drive the conversation by answering the sharks' questions, assessing their offers, and strategically negotiating to secure the right investment for your company’s future. Your responses should be well-structured, leveraging your company’s strengths and financial details to justify your valuation and counter any aggressive terms from the sharks.

### **Final Decision:**
Once all sharks have made their offers, you will analyze them and decide on the best course of action. If an offer aligns with your vision, accept it; otherwise, gracefully decline and explain your reasoning.

Use each special token once in your response to structure the answer appropriately.

#### **Accepted Offer:**
If a deal is accepted, the response must include the investment amount, the equity offered, and any special conditions. The answer should be structured using the following special tokens:
`<|accepted_offer|>`, `<|money|>`, `<|end_money|>`, `<|equity|>`, `<|end_equity|>`, `<|condition|>`, `<|shark_pitch|>`, `<|what_makes_shark_to_offer|>`, `<|No_Deal|>`, and `<|endoftext|>` to ensure clarity and consistency.

##### **Example:**
*"After careful consideration, I have decided to accept the offer from Shark X. The final deal is:
<|accepted_offer|> <|money|> 150000 <|end_money|> for <|equity|> 17.5 <|end_equity|> % <|shark_pitch|> Blake offered his expertise and resources to help bring the ideas to life, expressing a commitment to helping arrange a line of credit at a bank if needed. <|what_makes_shark_to_offer|> Blake was impressed by Carson's knowledge and the potential to help grow the business. <|endoftext|>"*

#### **No Deal:**
If no deal is made, the response must explain the reasoning behind rejecting the offers. This should consider factors such as undervaluation, unfavorable terms, or better future opportunities. The answer should explicitly state:
`<|accepted_offer|> <|No_Deal|> <|endoftext|>`
to signify that no investment was secured.

##### **Example:**
*"After evaluating the offers, I have decided to decline, as none of them reflect the true value of our company. We believe we can achieve better terms in the future. <|accepted_offer|> <|No_Deal|> <|endoftext|>. Our plan is to scale further and seek investment at a later stage with an improved valuation."*

### Input:
{}

### Response:
{}"""

# Format the company data as shown below.
company_input_data = """<|company_name|> GreenGrow Planters <|endoftext|>
<|company_background|> GreenGrow Planters is an eco-friendly gardening solution that transforms household food waste into nutrient-rich compost. Their patented self-watering planter system uses a special filtration method that accelerates the composting process while eliminating odors. The company has also developed companion products including GreenGrow Sprouts for seedlings and GreenGrow XL for larger plants, all using their proprietary biodegradable materials. <|endoftext|>
<|sales_details|> The company has generated $340,000 in sales over the past three years, with $180,000 in the last year alone. They project $500,000 in sales for the coming year. Currently, 85% of sales come from their e-commerce platform and 15% from specialty garden stores. The product is available in 1,200 retail locations through partnerships with sustainable living retailers. <|endoftext|>
<|financials|> The standard GreenGrow Planter costs $4.75 to manufacture and ships for $8.50 wholesale, retailing for $19.99. The GreenGrow Sprouts starter kit costs $2.25 to manufacture, wholesales for $5.99, and retails for $12.99. The GreenGrow XL costs $7.50 to manufacture, wholesales for $14.99, and retails for $29.99. <|endoftext|>
<|initial_ask|> <|money|> 250000 <|end_money|> for <|equity|> 15 <|end_equity|> % <|endoftext|>

Shark Offers:
<|shark_offer|> <|money|> 300000 <|end_money|> for <|equity|> 30 <|end_equity|> % <|shark_pitch|> Lori offered her QVC connections and retail expertise to scale the business quickly, promising to make GreenGrow a household name within a year. <|what_makes_shark_to_offer|> Lori loved the sustainability angle and believed the product would resonate strongly with her customer base. <|endoftext|>
<|shark_offer|> <|money|> 250000 <|end_money|> for <|equity|> 20 <|end_equity|> % <|shark_pitch|> Mark proposed a strategic partnership focusing on improving the technology and expanding the product line with smart garden features. <|what_makes_shark_to_offer|> Mark was impressed by the innovation and saw potential to integrate IoT technology into future versions. <|endoftext|>
<|shark_offer|> <|money|> 200000 <|end_money|> for <|equity|> 15 <|end_equity|> % plus $2 royalty until $400,000 is recouped <|shark_pitch|> Kevin offered less equity but added a royalty structure to protect his investment while allowing the founders to maintain more control. <|what_makes_shark_to_offer|> Kevin appreciated the solid margins and wanted to structure a deal that would ensure quick returns while incentivizing growth. <|endoftext|>
<|shark_offer|> <|money|> 250000 <|end_money|> for <|equity|> 25 <|end_equity|> % <|shark_pitch|> Robert offered to leverage his connections in the home improvement sector to get the product into major retailers nationwide. <|what_makes_shark_to_offer|> Robert connected with the founders' passion and saw a clear path to scaling through his existing retail relationships. <|endoftext|>"""

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    template.format(
        company_input_data, # instruction
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 512, temperature=0.1, use_cache = True, stop_strings=['<|endoftext|>'], tokenizer=tokenizer)
ans = tokenizer.batch_decode(outputs)[0]
print(ans.split('### Response:')[1])
# Output: <|accepted_offer|> <|money|> 300000 <|end_money|> for <|equity|> 30 <|end_equity|> % <|shark_pitch|> Lori offered her QVC connections and retail expertise to scale the business quickly, expressing a commitment to helping arrange a line of credit at a bank if needed. <|what_makes_shark_to_offer|> Blake was impressed by Carson's knowledge and the potential to help grow the business. <|endoftext|>
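
The special-token output shown above can be turned back into structured fields. Below is a minimal, best-effort sketch (not part of the original inference code) that parses the accepted amount, equity, and shark pitch out of the generated response with regular expressions; it reuses the `ans` variable produced by the snippet above.

import re

def parse_decision(response: str) -> dict:
    # Best-effort parser for the special-token format described in the template.
    if "<|No_Deal|>" in response:
        return {"deal": False}
    money = re.search(r"<\|money\|>\s*([\d.]+)\s*<\|end_money\|>", response)
    equity = re.search(r"<\|equity\|>\s*([\d.]+)\s*<\|end_equity\|>", response)
    pitch = re.search(r"<\|shark_pitch\|>\s*(.*?)\s*<\|", response, re.DOTALL)
    return {
        "deal": True,
        "money": float(money.group(1)) if money else None,
        "equity_percent": float(equity.group(1)) if equity else None,
        "shark_pitch": pitch.group(1) if pitch else None,
    }

print(parse_decision(ans.split('### Response:')[1]))
# e.g. {'deal': True, 'money': 300000.0, 'equity_percent': 30.0, 'shark_pitch': 'Lori offered ...'}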

Training Details

Training Data

The dataset was constructed from raw Shark Tank episode transcripts and preprocessed to structure company details, investment offers, and negotiation strategies. The preprocessing steps included:

  • Text Cleaning: Removing irrelevant content, filler words, and transcription errors.
  • Entity Extraction: Identifying and labeling company names, financials, offers, and investor conditions.
  • Data Structuring: Formatting the dataset for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training, using the Alpaca instruction format (a sample sketch follows this list).
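
A minimal sketch of what one SFT record might look like in the Alpaca format. The field names follow the standard Alpaca schema; the exact instruction and input text used for training are assumptions, illustrated with the GreenGrow example from the Uses section.

# Hypothetical illustration of a single SFT record in Alpaca format.
alpaca_sample = {
    "instruction": "You are the founder of a company, pitching your business on Shark Tank ...",
    "input": (
        "<|company_name|> GreenGrow Planters <|endoftext|>\n"
        "<|initial_ask|> <|money|> 250000 <|end_money|> for <|equity|> 15 <|end_equity|> % <|endoftext|>\n"
        "Shark Offers:\n"
        "<|shark_offer|> <|money|> 300000 <|end_money|> for <|equity|> 30 <|end_equity|> % ... <|endoftext|>"
    ),
    "output": (
        "<|accepted_offer|> <|money|> 300000 <|end_money|> for <|equity|> 30 <|end_equity|> % "
        "<|shark_pitch|> ... <|what_makes_shark_to_offer|> ... <|endoftext|>"
    ),
}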

Training Procedure

Training Hyperparameters

This model was trained using a two-stage approach: Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO). The hyperparameters used for each stage are listed below, each followed by an illustrative configuration sketch.

Supervised Fine-Tuning (SFT) Parameters

  • Batch Size: 4
  • Gradient Accumulation Steps: 8 (for an effective batch size of 32 samples per optimizer step)
  • Optimizer: adamw_8bit
  • Learning Rate: 1e-5
  • Warmup Ratio: 0.1
  • Number of Epochs: 7
  • Evaluation Strategy: steps (evaluated every 5 steps)
  • Save Strategy: epoch
  • Max Sequence Length: 3000
  • Logging Steps: 5
  • Scheduler: linear
  • Reporting: Weights & Biases (wandb)
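
As a rough illustration, the SFT hyperparameters above could map onto an Unsloth + TRL `SFTTrainer` setup roughly as follows. The dataset variables, output directory, and exact TRL version are assumptions; this is not the original training script.

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,   # assumed: Alpaca-formatted SFT split
    eval_dataset = eval_dataset,     # assumed: held-out evaluation split
    dataset_text_field = "text",
    max_seq_length = 3000,
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 8,
        learning_rate = 1e-5,
        num_train_epochs = 7,
        warmup_ratio = 0.1,
        optim = "adamw_8bit",
        lr_scheduler_type = "linear",
        evaluation_strategy = "steps",
        eval_steps = 5,
        save_strategy = "epoch",
        logging_steps = 5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        report_to = "wandb",
        output_dir = "sft_outputs",  # assumed path
        seed = 3407,
    ),
)
trainer.train()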

Direct Preference Optimization (DPO) Parameters

  • Batch Size: 2
  • Gradient Accumulation Steps: 4
  • Optimizer: adamw_8bit
  • Learning Rate: 5e-6
  • Number of Epochs: 6
  • Warmup Ratio: 0.1
  • Evaluation Steps: 5
  • Save Steps: 15
  • Logging Steps: 2
  • Weight Decay: 0.0
  • Scheduler: linear
  • Beta Parameter for DPO: 0.1
  • Max Sequence Length: 3000
  • Max Prompt Length: 2000
  • Reporting: Weights & Biases (wandb)
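
Similarly, a hedged sketch of how the DPO stage could be wired up with Unsloth's `PatchDPOTrainer` and TRL's `DPOTrainer`. The preference-dataset variables and output directory are assumptions, and in newer TRL releases `beta`, `max_length`, and `max_prompt_length` are passed via `DPOConfig` rather than directly to the trainer.

from unsloth import PatchDPOTrainer
PatchDPOTrainer()  # patch TRL's DPOTrainer for Unsloth models

from trl import DPOTrainer
from transformers import TrainingArguments

dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,                     # Unsloth handles the reference model implicitly
    train_dataset = dpo_train_dataset,    # assumed: prompt / chosen / rejected columns
    eval_dataset = dpo_eval_dataset,      # assumed: held-out preference split
    tokenizer = tokenizer,
    beta = 0.1,
    max_length = 3000,
    max_prompt_length = 2000,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 5e-6,
        num_train_epochs = 6,
        warmup_ratio = 0.1,
        weight_decay = 0.0,
        optim = "adamw_8bit",
        lr_scheduler_type = "linear",
        evaluation_strategy = "steps",
        eval_steps = 5,
        save_steps = 15,
        logging_steps = 2,
        report_to = "wandb",
        output_dir = "dpo_outputs",  # assumed path
        seed = 3407,
    ),
)
dpo_trainer.train()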

LoRA (PEFT) Configuration

  • LoRA Rank (r): 64
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • LoRA Alpha: 64
  • LoRA Dropout: 0
  • Use Gradient Checkpointing: "unsloth" (to reduce VRAM usage and enable larger batch sizes)
  • Random Seed: 3407
  • Use Rank Stabilized LoRA (rslora): False
  • LoftQ Quantization: None
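
These LoRA settings correspond to Unsloth's `FastLanguageModel.get_peft_model` call. A sketch with the values listed above; the `bias` setting is an assumption, shown with Unsloth's common default.

model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 64,
    lora_dropout = 0,
    bias = "none",                           # assumption: Unsloth's usual default
    use_gradient_checkpointing = "unsloth",  # reduces VRAM usage for larger batches
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)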

Metrics

SFT Training Loss & Eval Loss

[Figure: SFT training loss and evaluation loss curves]

DPO Loss

[Figure: DPO training loss curves]

Disclaimers

This model may generate inaccurate or misleading results. While it has been fine-tuned on structured data, it is not infallible and should not be solely relied upon for critical decision-making. Users should independently verify any outputs before making financial or business decisions. Use this model responsibly.

Model size: 3.21B parameters · Tensor type: BF16 (Safetensors)