---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- tinyllama
- json
- intent-detection
- qlora
- gptq
---
# TinyLlama-JSON-Intent (GPTQ 4-bit)
This is a fine-tuned version of [`TinyLlama/TinyLlama-1.1B-Chat-v1.0`](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), trained specifically for e-commerce intent detection. Given a catalog of products and a user's request, it outputs a structured JSON object containing the requested `action` (`add` or `remove`), the `product` name, and the `quantity`.
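For example (an illustrative pair; the `product` string mirrors the catalog entry, as in the usage example below), a request to add two jars of peanut butter would produce:
```json
{"action": "add", "product": "Peanut Butter (340g jar)", "quantity": 2}
```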
This version of the model is **quantized to 4-bit using GPTQ**, making it highly efficient for inference in terms of memory usage and speed.
The QLoRA adapter was merged into the base model before quantization, so no separate adapter loading is required.
- **Adapter Version:** [jtlicardo/tinyllama-ecommerce-intent-adapter](https://huggingface.co/jtlicardo/tinyllama-ecommerce-intent-adapter)
## Model Description
The base model, TinyLlama-Chat, was fine-tuned using the QLoRA method on a synthetic dataset of 100 examples. The training objective was to teach the model to ignore conversational pleasantries and strictly output a JSON object that can be directly parsed by a backend system for managing a shopping cart.
## Intended Use & Limitations
This model is designed for a specific task: parsing user requests in an e-commerce context. It should not be used as a general-purpose chatbot.
- **Primary Use:** Backend service for intent detection from user text.
- **Out-of-Scope:** General conversation, answering questions, or any task not related to adding/removing items from a list.
## How to Use
The model expects prompts that follow the TinyLlama-Chat template, and you must provide both the `Catalog` and the `User` request.
**Important:** You need to install `optimum` and `auto-gptq` to run this 4-bit GPTQ model.
```bash
pip install -q optimum auto-gptq transformers
```
Here's how to run inference in Python:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Model repository on the Hugging Face Hub
model_id = "jtlicardo/tinyllama-ecommerce-intent-gptq"
# Load the tokenizer and the 4-bit quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # Recommended for inference
)
# --- Define the prompt ---
catalog = """Catalog:
Shampoo (400ml bottle)
Hand Soap (250ml dispenser)
Peanut Butter (340g jar)
Headphones
Green Tea (25 tea bags)"""
user_query = "Could you please take off 4 pairs of headphones from my cart?"
# --- Format the prompt using the model's chat template ---
# The model was trained to see this structure.
prompt = f"<|user|>\n{catalog}\n\nUser:\n{user_query}\n<|assistant|>\n"
# --- Generate the output ---
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
outputs = pipe(
    prompt,
    max_new_tokens=50,       # Max length of the JSON output
    do_sample=False,         # Use deterministic (greedy) decoding
    temperature=None,        # Not needed when do_sample=False
    top_p=None,              # Not needed when do_sample=False
    return_full_text=False,  # Only return the generated part
)
# The output will be a clean JSON string
generated_json = outputs[0]['generated_text'].strip()
print(generated_json)
# Expected output:
# {"action": "remove", "product": "Headphones", "quantity": 4}
```
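Since the output is meant to be consumed by a backend system, you will typically parse and validate it before updating the cart. A minimal sketch, continuing from the snippet above (the validation rules here are illustrative assumptions, not part of the model):
```python
import json

def parse_intent(raw: str) -> dict:
    """Parse the model output and check the expected schema."""
    intent = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if intent.get("action") not in {"add", "remove"}:
        raise ValueError(f"unexpected action: {intent.get('action')!r}")
    if not isinstance(intent.get("quantity"), int):
        raise ValueError("quantity must be an integer")
    return intent

cart_update = parse_intent(generated_json)
print(cart_update["action"], cart_update["product"], cart_update["quantity"])
```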
## Training Procedure
This model was fine-tuned using the `trl` library's `SFTTrainer`.
- **Method:** QLoRA (4-bit quantization with LoRA adapters)
- **Dataset:** A custom JSONL file with 100 `prompt`/`completion` pairs.
- **Configuration:** `completion_only_loss=True` was used so that loss is computed only on the assistant's JSON completion, not on the prompt (a sketch of the full setup is shown below).
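For reference, here is a minimal sketch of what this setup can look like with recent versions of `trl`, `peft`, and `bitsandbytes`. The LoRA hyperparameters, output directory, and `train.jsonl` file name are illustrative assumptions, not the exact values used for this model:
```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# QLoRA: load the frozen base model in 4-bit (NF4) and train LoRA adapters on top
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative LoRA hyperparameters
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

# JSONL file with {"prompt": ..., "completion": ...} records ("train.jsonl" is a placeholder)
dataset = load_dataset("json", data_files="train.jsonl", split="train")

training_args = SFTConfig(
    output_dir="tinyllama-intent-qlora",
    completion_only_loss=True,  # compute loss only on the completion tokens
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```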