---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- tinyllama
- json
- intent-detection
- qlora
- gptq
---

# TinyLlama-JSON-Intent (GPTQ 4-bit)

This is a fine-tuned version of [`TinyLlama/TinyLlama-1.1B-Chat-v1.0`](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) trained specifically for e-commerce intent detection. Given a catalog of products and a user's request, it outputs a structured JSON object containing the `action` to perform (`add` or `remove`), the `product` name, and the `quantity`, for example `{"action": "remove", "product": "Headphones", "quantity": 4}`.

This version of the model is **quantized to 4-bit using GPTQ**, which makes inference efficient in terms of both memory usage and speed. The QLoRA adapter was merged into the final GPTQ model, so no separate adapter loading is required.

- **Adapter Version:** [jtlicardo/tinyllama-ecommerce-intent-adapter](https://huggingface.co/jtlicardo/tinyllama-ecommerce-intent-adapter)

## Model Description

The base model, TinyLlama-Chat, was fine-tuned using the QLoRA method on a synthetic dataset of 100 examples. The training objective was to teach the model to ignore conversational pleasantries and strictly output a JSON object that can be directly parsed by a backend system for managing a shopping cart.
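
To make the data format concrete, a single record in such a `prompt`/`completion` dataset might look like the line below. This is a hypothetical illustration, not an actual training record; the prompt structure mirrors the chat template shown in the usage section.

```json
{"prompt": "<|user|>\nCatalog:\nShampoo (400ml bottle)\nHand Soap (250ml dispenser)\n\nUser:\nHey there! Could you add two bottles of shampoo for me? Thanks!\n<|assistant|>\n", "completion": "{\"action\": \"add\", \"product\": \"Shampoo\", \"quantity\": 2}"}
```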

## Intended Use & Limitations

This model is designed for a specific task: parsing user requests in an e-commerce context. It should not be used as a general-purpose chatbot.

- **Primary Use:** Backend service for intent detection from user text.
- **Out-of-Scope:** General conversation, answering questions, or any task not related to adding/removing items from a list.

## How to Use

The model expects a prompt formatted in a specific way, following the TinyLlama-Chat template. You must provide the `Catalog` and the `User` request.
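
The prompt skeleton looks like this (it is constructed explicitly in the Python example below):

```
<|user|>
Catalog:
<one product per line>

User:
<the user's request>
<|assistant|>
```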

**Important:** You need to install `optimum` and `auto-gptq` to run this 4-bit GPTQ model:

```bash
pip install -q optimum auto-gptq transformers
```

Here's how to run inference in Python:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Model repository on the Hugging Face Hub
model_id = "jtlicardo/tinyllama-ecommerce-intent-gptq"

# Load the tokenizer and the 4-bit quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # Recommended for inference
)

# --- Define the prompt ---
catalog = """Catalog:
Shampoo (400ml bottle)
Hand Soap (250ml dispenser)
Peanut Butter (340g jar)
Headphones
Green Tea (25 tea bags)"""

# A noisy user request: note the misspelling "headphons"; the expected
# output below maps it to the catalog entry "Headphones".
user_query = "Could you please take off 4 pairs of headphons from my cart?"

# --- Format the prompt using the model's chat template ---
# The model was trained to see this structure.
prompt = f"<|user|>\n{catalog}\n\nUser:\n{user_query}\n<|assistant|>\n"

# --- Generate the output ---
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
outputs = pipe(
    prompt,
    max_new_tokens=50,       # Max length of the JSON output
    do_sample=False,         # Use deterministic (greedy) decoding
    temperature=None,        # Not needed when do_sample=False
    top_p=None,              # Not needed when do_sample=False
    return_full_text=False,  # Only return the generated part
)

# The output should be a clean JSON string
generated_json = outputs[0]["generated_text"].strip()
print(generated_json)
# Expected output:
# {"action": "remove", "product": "Headphones", "quantity": 4}
```
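
Because the output is meant to be machine-consumed, it is worth validating it before acting on it. Below is a minimal sketch of such a check; the `parse_intent` helper is illustrative and not part of this repository:

```python
import json

def parse_intent(generated_json: str) -> dict:
    """Parse and minimally validate the model's JSON output."""
    intent = json.loads(generated_json)  # raises json.JSONDecodeError if malformed
    if intent.get("action") not in {"add", "remove"}:
        raise ValueError(f"Unexpected action: {intent.get('action')!r}")
    if not isinstance(intent.get("product"), str):
        raise ValueError("Missing or non-string 'product'")
    if not isinstance(intent.get("quantity"), int) or intent["quantity"] < 1:
        raise ValueError("Missing or invalid 'quantity'")
    return intent

# Example:
# parse_intent('{"action": "remove", "product": "Headphones", "quantity": 4}')
# -> {'action': 'remove', 'product': 'Headphones', 'quantity': 4}
```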

## Training Procedure

This model was fine-tuned using the `trl` library's `SFTTrainer`.

- **Method:** QLoRA (4-bit quantization with LoRA adapters)
- **Dataset:** A custom JSONL file with 100 `prompt`/`completion` pairs.
- **Configuration:** `completion_only_loss=True` was used so the model only learned to generate the assistant's JSON response; a sketch of the setup follows this list.
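
For reference, here is a minimal sketch of such a QLoRA + `SFTTrainer` setup. The dataset path, LoRA rank, learning rate, and epoch count are illustrative assumptions, not the exact values used to train this model:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# QLoRA: load the base model in 4-bit NF4 and train LoRA adapters on top of it
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)

# JSONL file with "prompt" and "completion" columns (file name is hypothetical)
dataset = load_dataset("json", data_files="intent_dataset.jsonl", split="train")

# Illustrative LoRA settings; the actual rank/alpha may differ
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

training_args = SFTConfig(
    output_dir="tinyllama-intent-qlora",
    completion_only_loss=True,   # compute loss only on the JSON completion
    num_train_epochs=3,          # illustrative
    per_device_train_batch_size=4,
    learning_rate=2e-4,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```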