# Description
This model was fine-tuned to improve tool calling against a Playwright MCP server. Most people do not have the compute resources to run the top LLMs; the goal here is to let a smaller LLM specialize at a specific task by imitating successful demonstrations from a bigger LLM. For details on the data generation, see my dataset repo: https://huggingface.co/jdaddyalbs/playwright-mcp-toolcalling.
My training code for this model is in `trainer-v1.ipynb`; you will need to install additional dependencies such as `unsloth` (see the import statements at the top of the notebook).
## Getting Started

Install the dependencies (`ollama`, `uv`, `git`, `npx`, `google-chrome`):

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
curl -fsSL https://ollama.com/install.sh | sh
sudo apt install git nodejs npm google-chrome-stable
npx @playwright/mcp@latest
```

Note: `npx` ships with `npm`, and installing `google-chrome-stable` via apt requires Google's apt repository to be configured. The last command lets `npx` download the Playwright MCP server; once it starts, press Ctrl+C to exit.
Clone this repo and install the Python dependencies:

```sh
git clone https://huggingface.co/jdaddyalbs/qwen3_sft_playwright_gguf
cd qwen3_sft_playwright_gguf
uv sync
```
Pull the model into Ollama in one of two ways:

```sh
# Option 1: pull directly from the Hugging Face Hub
# (use this method for the benchmark models too, e.g. hf.co/unsloth/Qwen3-4B-GGUF:Q8_0)
ollama pull hf.co/jdaddyalbs/qwen3_sft_playwright_gguf

# Option 2: build from the included Modelfile
ollama create hf.co/jdaddyalbs/qwen3_sft_playwright_gguf -f Modelfile
```
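Once the model is available in Ollama, it can be exercised with tool calling through Ollama's `/api/chat` endpoint. The sketch below only builds the request payload; the tool schema shown is a hand-written stand-in for one Playwright MCP tool (`browser_navigate`), since in practice the schemas are discovered from the MCP server itself:

```python
import json

# Illustrative tool schema modeled after the Playwright MCP browser_navigate
# tool; real schemas come from the MCP server, not hand-written JSON.
navigate_tool = {
    "type": "function",
    "function": {
        "name": "browser_navigate",
        "description": "Navigate the browser to a URL",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}

def build_chat_request(model: str, user_query: str) -> dict:
    """Build an Ollama /api/chat request body that offers the tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_query}],
        "tools": [navigate_tool],
        "stream": False,
    }

payload = build_chat_request(
    "hf.co/jdaddyalbs/qwen3_sft_playwright_gguf",
    "What is the title of https://example.com?",
)
print(json.dumps(payload, indent=2))
```

Posting this payload to `http://localhost:11434/api/chat` (the default Ollama port) returns a message whose `tool_calls` field, if present, lists the calls the model wants to make.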
## Run Evaluation

Evaluation runs in two parts: first the models use Playwright to answer the eval queries; then their answers are checked against the ground-truth answers to determine whether each one was correct.
### Part 1: Playwright interaction
If you are using OpenAI models, you must get an API key and set it as an environment variable before running:

```sh
export OPENAI_API_KEY="PASTE_YOUR_KEY_HERE"
```

If everything is installed correctly, you should be able to run the following from this directory:

```sh
uv run evaluate-v1.py
```
Chrome browser windows will open for a while, and results will be appended to a `results.jsonl` file.
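The exact fields in `results.jsonl` depend on the evaluation script; as a sketch, assuming each line is a JSON object with `query` and `answer` fields (hypothetical names), the file can be inspected like this:

```python
import json
from pathlib import Path

def load_results(path: str) -> list[dict]:
    """Read a JSONL file: one JSON object per line, blank lines skipped."""
    records = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records

# Tiny synthetic example; real field names may differ.
Path("results_demo.jsonl").write_text(
    '{"query": "q1", "answer": "a1"}\n{"query": "q2", "answer": "a2"}\n'
)
results = load_results("results_demo.jsonl")
print(len(results))  # 2
```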
### Part 2: Grade answers
Edit `evaluate-part2-v1.py` to use your best Ollama model for grading the results (I use `qwen3:32b`). The script also expects the `results.jsonl` file produced in Part 1.
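The grading step amounts to asking a strong judge model whether a predicted answer matches the ground truth. A minimal sketch of such a prompt (the wording is illustrative only, not what `evaluate-part2-v1.py` actually uses):

```python
def build_grading_prompt(question: str, ground_truth: str, predicted: str) -> str:
    """Format a binary grading prompt for a judge model (illustrative only)."""
    return (
        "You are grading an answer against a reference.\n"
        f"Question: {question}\n"
        f"Reference answer: {ground_truth}\n"
        f"Model answer: {predicted}\n"
        "Reply with exactly CORRECT or INCORRECT."
    )

prompt = build_grading_prompt(
    "What is the capital of France?", "Paris", "The capital is Paris."
)
print(prompt)
```

Sending such a prompt to the judge model (e.g. via `ollama run qwen3:32b`) and parsing the one-word verdict is the general shape of LLM-as-judge grading.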
When everything is ready, run:

```sh
uv run evaluate-part2-v1.py
```
The output is a CSV file called `eval_results.csv`, which you can open in Excel or LibreOffice to view the results.
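Beyond eyeballing the CSV in a spreadsheet, a quick accuracy number can be computed with the standard library. The column names used here (`model`, `correct`) are assumptions; adjust them to match the actual header of `eval_results.csv`:

```python
import csv
from collections import defaultdict

def accuracy_by_model(path: str) -> dict[str, float]:
    """Fraction of rows marked correct, grouped by model (assumed columns)."""
    counts = defaultdict(lambda: [0, 0])  # model -> [num_correct, num_total]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["model"]][1] += 1
            if row["correct"].strip().lower() in ("true", "1", "yes"):
                counts[row["model"]][0] += 1
    return {m: c / t for m, (c, t) in counts.items()}

# Tiny synthetic example using the assumed columns:
with open("eval_demo.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["model", "correct"])
    w.writerows([["a", "true"], ["a", "false"], ["b", "true"]])
print(accuracy_by_model("eval_demo.csv"))  # {'a': 0.5, 'b': 1.0}
```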
## Results
Pending....
## Uploaded model
- Developed by: jdaddyalbs
- License: apache-2.0
- Finetuned from model: unsloth/qwen3-4b-bnb-4bit
This qwen3 model was trained 2x faster with Unsloth and Huggingface's TRL library.