---
base_model: unsloth/qwen3-4b-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- gguf
license: apache-2.0
language:
- en
---

# Description
This model was fine-tuned to be better at tool calling against a Playwright MCP server. Most people do not have the compute resources to run the top LLMs; the goal here is to let a smaller LLM specialize at a specific task by imitating successful demonstrations from a larger LLM.
To examine the data generation, see my dataset repo: https://huggingface.co/jdaddyalbs/playwright-mcp-toolcalling.


My training code for this model is in `trainer-v1.ipynb`. You will need to install additional dependencies such as unsloth (see the import statements at the top of the notebook).

# Getting Started
Install the dependencies (ollama, uv, git, npm/npx, Google Chrome). Note that `npx` ships with `npm`, and Google Chrome is not in the default Ubuntu repositories, so install it separately from https://www.google.com/chrome/:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
curl -fsSL https://ollama.com/install.sh | sh
sudo apt install git npm
npx @playwright/mcp@latest
```
Let npx finish installing Playwright, then press Ctrl+C to exit.


Clone this repo and install the Python dependencies:
```
git clone https://huggingface.co/jdaddyalbs/qwen3_sft_playwright_gguf
cd qwen3_sft_playwright_gguf
uv sync
```


Pull the model into Ollama in one of two ways:
1. `ollama pull hf.co/jdaddyalbs/qwen3_sft_playwright_gguf` (use this method for the benchmark models, e.g. `hf.co/unsloth/Qwen3-4B-GGUF:Q8_0`)
2. `ollama create hf.co/jdaddyalbs/qwen3_sft_playwright_gguf -f Modelfile`

# Run Evaluation
Evaluation runs in two parts: first the models use Playwright to answer the eval queries, then their answers are checked against the ground-truth answers to see whether they answered correctly.
## Part 1: Playwright interaction
If you are using OpenAI models, you must get an API key and set it as an environment variable before running:
```
export OPENAI_API_KEY="PASTE_YOUR_KEY_HERE"
```
If everything is installed correctly, you should be able to run the following from this directory:
```
uv run evaluate-v1.py
```
You will see Chrome browser windows opening for a while, and results will be appended to a `results.jsonl` file.
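If you want a quick look at the run before grading, a minimal sketch like the following can load the appended records. This assumes each line of `results.jsonl` is a standalone JSON object (the usual JSON-Lines layout); any field names in your file are whatever `evaluate-v1.py` actually writes.

```python
import json

def load_results(path="results.jsonl"):
    """Parse a JSON-Lines results file, skipping malformed lines
    (e.g. a partial line left behind by an interrupted run)."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip anything that is not valid JSON
    return records
```

For example, `print(len(load_results()))` between runs shows how many eval queries have been completed so far.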

## Part 2: grade answers
Edit `evaluate-part2-v1.py` to use your best Ollama model for grading the results (I use `qwen3:32b`).
It also expects the `results.jsonl` file produced in part 1.
When everything is ready run the following:
```
uv run evaluate-part2-v1.py
```
The output is a CSV file called `eval_results.csv`, which you can open in Excel or LibreOffice to view the results.
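For a quick per-model summary without opening a spreadsheet, a sketch like this works. It assumes `eval_results.csv` has a header row; the `model` and `correct` column names are hypothetical and should be matched to the columns your grading script actually emits.

```python
import csv
from collections import defaultdict

def accuracy_by_model(path="eval_results.csv",
                      model_col="model", correct_col="correct"):
    """Return {model_name: fraction of rows graded correct}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[model_col]] += 1
            # treat common truthy spellings as a "correct" grade
            if str(row[correct_col]).strip().lower() in {"1", "true", "yes"}:
                correct[row[model_col]] += 1
    return {m: correct[m] / totals[m] for m in totals}
```

This gives one accuracy number per model, which is handy for comparing the fine-tuned model against the baseline GGUFs pulled earlier.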

# Results
Pending....


# Uploaded model

- **Developed by:** jdaddyalbs
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen3-4b-bnb-4bit

This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)