|
--- |
|
base_model: learn-abc/html-model-tinyllama-chat-bnb-4bit |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
- llama-cpp |
|
- gguf-my-lora |
|
license: apache-2.0 |
|
language: |
|
- en |
|
--- |
|
|
|
# learn-abc/html-model-tinyllama-chat-bnb-4bit-F32-GGUF |
|
This LoRA adapter was converted to GGUF format from [`learn-abc/html-model-tinyllama-chat-bnb-4bit`](https://huggingface.co/learn-abc/html-model-tinyllama-chat-bnb-4bit) via ggml.ai's [GGUF-my-lora](https://huggingface.co/spaces/ggml-org/gguf-my-lora) space.
|
Refer to the [original adapter repository](https://huggingface.co/learn-abc/html-model-tinyllama-chat-bnb-4bit) for more details. |
|
|
|
# Fine-tuned TinyLlama for JSON Extraction (GGUF) |
|
|
|
This repository contains a fine-tuned version of the `unsloth/tinyllama-chat-bnb-4bit` model, trained to extract product information from HTML snippets and output it as JSON. This is the GGUF conversion of the LoRA adapter, for use with `llama.cpp` and other compatible inference engines.
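The exact output schema depends on the fine-tuning dataset and is not documented here; as an illustration only, for the example HTML snippet used later in this card, a target output might look like:

```json
{
  "name": "iPad Air",
  "price": "$1344",
  "category": "audio",
  "brand": "Dell"
}
```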
|
|
|
## Model Details |
|
|
|
- **Base Model:** `unsloth/tinyllama-chat-bnb-4bit`

- **Source Adapter:** [`learn-abc/html-model-tinyllama-chat-bnb-4bit`](https://huggingface.co/learn-abc/html-model-tinyllama-chat-bnb-4bit)

- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)

- **Format:** F32 GGUF (LoRA adapter only; it must be applied on top of a GGUF build of the base model)

- **Trained on:** A custom dataset of HTML product snippets and their corresponding JSON representations.
|
|
|
## Usage |
|
|
|
This adapter can be used for structured data extraction from HTML content with GGUF-compatible software, applied on top of a GGUF build of the base model.
|
|
|
### Downloading and using the GGUF file |
|
|
|
You can download the GGUF file directly from the "Files and versions" tab on this repository page. |
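Alternatively, you can fetch it programmatically with the `huggingface_hub` library. A minimal sketch; the repo id is taken from this card's title and the filename from the `llama.cpp` commands below:

```python
from huggingface_hub import hf_hub_download

# Download the adapter GGUF from this repo; returns the local file path.
adapter_path = hf_hub_download(
    repo_id="learn-abc/html-model-tinyllama-chat-bnb-4bit-F32-GGUF",
    filename="html-model-tinyllama-chat-bnb-4bit-f32.gguf",
)
print(adapter_path)
```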
|
|
|
To use this file with `llama.cpp`, you generally follow these steps:

1. **Build `llama.cpp`:** Clone the `llama.cpp` repository and build it, following the instructions in its README for your platform.

2. **Get a base model GGUF:** This file is only the LoRA adapter; you also need a GGUF build of the base TinyLlama chat model to apply it to.

3. **Run with the adapter:** Pass the base model to `llama-cli` or `llama-server` and point `--lora` at this adapter file, as shown in the commands below.
|
|
|
### Use with llama.cpp
|
|
|
```bash
# With the CLI: base_model.gguf is a GGUF build of the base TinyLlama chat
# model (obtained separately); --lora applies this adapter on top of it.
llama-cli -m base_model.gguf --lora html-model-tinyllama-chat-bnb-4bit-f32.gguf (...other args)

# With the server: same flags, served over an HTTP API.
llama-server -m base_model.gguf --lora html-model-tinyllama-chat-bnb-4bit-f32.gguf (...other args)
```
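Once `llama-server` is running with the adapter loaded, you can query its OpenAI-compatible HTTP API. A minimal sketch, assuming the default `localhost:8080` and an abbreviated HTML snippet:

```python
import requests

# llama-server exposes an OpenAI-compatible chat endpoint by default.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {
                "role": "user",
                "content": "Extract the product information:\n"
                           "<div class='product'><h2>iPad Air</h2>"
                           "<span class='price'>$1344</span></div>",
            }
        ],
        "max_tokens": 256,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```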
|
|
|
|
|
### Use with llama-cpp-python
|
#### Install llama-cpp-python
|
```bash |
|
pip install llama-cpp-python |
|
``` |
|
#### Python script to run the model
|
```python
from llama_cpp import Llama

# This repo only contains the LoRA adapter, so load a GGUF build of the
# base TinyLlama chat model and apply the adapter on top of it.
base_model_path = "/path/to/your/base_model.gguf"
lora_path = "/path/to/your/downloaded/html-model-tinyllama-chat-bnb-4bit-f32.gguf"

llm = Llama(model_path=base_model_path, lora_path=lora_path)

prompt = "Extract the product information:\n<div class='product'><h2>iPad Air</h2><span class='price'>$1344</span><span class='category'>audio</span><span class='brand'>Dell</span></div>"

output = llm(prompt, max_tokens=256, temperature=0.7)

print(output["choices"][0]["text"])
```
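Since the base model is chat-tuned, you may get better results from the chat API, which applies the chat template stored in the base model's GGUF metadata (if present). A sketch reusing `llm` and `prompt` from above:

```python
# Wrap the extraction prompt in a chat turn so the chat template is applied.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```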
|
|
|
To learn more about LoRA usage with the llama.cpp server, refer to the [llama.cpp server documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md).
|
|