Description

This model was fine-tuned to be better at tool calling against a Playwright MCP server. Most people do not have the compute resources to run the top LLMs. The goal here is to let a smaller LLM specialize at a specific task by imitating successful demonstrations from a bigger LLM. To examine the data generation, head to my dataset repo: https://huggingface.co/jdaddyalbs/playwright-mcp-toolcalling

My training code for this model is in trainer-v1.ipynb. You will need to install additional dependencies such as unsloth (see the import statements at the top of the notebook).
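As a rough illustration of the distillation idea above (the exact schema lives in the dataset repo), data like this is often stored as JSONL conversations, keeping only the runs where the larger "teacher" model succeeded. A minimal sketch with hypothetical field names:

```python
import json

# Hypothetical record layout: each line is one tool-calling conversation
# produced by the larger model, with a success flag from grading.
raw_lines = [
    '{"messages": [{"role": "user", "content": "Find the title of example.com"}], "success": true}',
    '{"messages": [{"role": "user", "content": "Click the login button"}], "success": false}',
]

# Keep only successful demonstrations for supervised fine-tuning.
sft_examples = [
    rec["messages"]
    for rec in (json.loads(line) for line in raw_lines)
    if rec["success"]
]

print(len(sft_examples))
```

The failed run is filtered out, so only successful trajectories are copied into the training set.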

Getting Started

Install dependencies (ollama, uv, git, npx, google-chrome)

curl -LsSf https://astral.sh/uv/install.sh | sh
curl -fsSL https://ollama.com/install.sh | sh
sudo apt install git nodejs npm   # npx ships with npm
# Google Chrome is not in the default apt repos; install it from Google's .deb:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
npx @playwright/mcp@latest

Let npx download and install the Playwright MCP server, then press Ctrl+C to exit.

Clone this repo and install python dependencies

git clone https://huggingface.co/jdaddyalbs/qwen3_sft_playwright_gguf
cd qwen3_sft_playwright_gguf
uv sync

Pull the model into Ollama in one of two ways:

  1. ollama pull hf.co/jdaddyalbs/qwen3_sft_playwright_gguf (also use this method for benchmark models such as hf.co/unsloth/Qwen3-4B-GGUF:Q8_0)
  2. ollama create hf.co/jdaddyalbs/qwen3_sft_playwright_gguf -f Modelfile (builds the model from the Modelfile in this repo)
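For reference, an Ollama Modelfile for a GGUF checkpoint usually looks like the sketch below. The filename and parameter here are placeholders, not this repo's actual settings; use the Modelfile shipped in the repo.

```
# Point Ollama at the local GGUF weights (placeholder filename).
FROM ./qwen3_sft_playwright.Q8_0.gguf
# Example generation setting; the repo's Modelfile may differ.
PARAMETER temperature 0.6
```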

Run Evaluation

We will evaluate in two parts: first we let the models use Playwright to answer the eval queries, then we check their answers against the ground-truth answers to see which were correct.

Part 1: Playwright interaction

If you are using OpenAI models, you must get an API key and set it as an environment variable before running:

export OPENAI_API_KEY="PASTE_YOUR_KEY_HERE"

If everything is installed correctly, you should be able to run the following from this directory:

uv run evaluate-v1.py

You will see Chrome browser windows opening for a while, and results will be appended to a results.jsonl file.
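If you want to peek at the interaction results before grading, each line of results.jsonl is a standalone JSON object. A small sketch of reading it, with hypothetical field names (the real ones come from evaluate-v1.py):

```python
import json
from io import StringIO

# Stand-in for open("results.jsonl"); field names are hypothetical.
sample = StringIO(
    '{"model": "qwen3_sft_playwright", "query": "Title of example.com?", "answer": "Example Domain"}\n'
    '{"model": "qwen3:4b", "query": "Title of example.com?", "answer": "Example"}\n'
)

results = [json.loads(line) for line in sample]
for r in results:
    print(r["model"], "->", r["answer"])
```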

Part 2: grade answers

Edit evaluate-part2-v1.py to use your best Ollama model for grading the results (I use qwen3:32b). It also expects the results.jsonl file from Part 1. When everything is ready, run the following:

uv run evaluate-part2-v1.py

The output should be a CSV file called eval_results.csv, which you can open in Excel or LibreOffice to view the results.
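You can also summarize the CSV without a spreadsheet. A sketch that computes per-model accuracy, assuming hypothetical column names (check the header that evaluate-part2-v1.py actually writes):

```python
import csv
from io import StringIO

# Stand-in for open("eval_results.csv"); column names are hypothetical.
sample = StringIO(
    "model,query,correct\n"
    "qwen3_sft_playwright,q1,True\n"
    "qwen3_sft_playwright,q2,False\n"
    "qwen3:4b,q1,False\n"
)

# Per-model accuracy: fraction of rows the grader marked correct.
totals, hits = {}, {}
for row in csv.DictReader(sample):
    m = row["model"]
    totals[m] = totals.get(m, 0) + 1
    hits[m] = hits.get(m, 0) + (row["correct"] == "True")

for m in totals:
    print(m, hits[m] / totals[m])
```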

Results

Pending....

Uploaded model

  • Developed by: jdaddyalbs
  • License: apache-2.0
  • Finetuned from model: unsloth/qwen3-4b-bnb-4bit

This qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

  • Model size: 4.02B params
  • Architecture: qwen3
  • Format: GGUF (8-bit)