# Qwen3-4B (Custom Fine-Tune) Sile

[](#benchmark-results)
[](#benchmark-results)



---

## Model Summary
- **Author:** rfcoder0
- **Model Type:** Qwen3-4B base, custom fine-tune ("Sile")
- **Hardware Used:** RTX 3060 (12 GB) + RTX 3070 (8 GB)
- **Training:** Proprietary fine-tune on a curated dataset
- **Evaluation:** [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), 5-shot

This fine-tuned Qwen3-4B matches, and in some cases exceeds, the performance of 7B–8B-parameter models on standard reasoning and commonsense benchmarks.
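Since all scores are 5-shot, each benchmark query is preceded by five solved demonstrations. A minimal sketch of how such a prompt is assembled (the `Question:`/`Answer:` template is illustrative, not the harness's exact per-task format):

```python
def build_fewshot_prompt(demos, query, k=5):
    """Prepend up to k solved demonstrations to the query (few-shot prompting)."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in demos[:k]]
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)

demos = [("2 + 2?", "4"), ("Capital of France?", "Paris")]
prompt = build_fewshot_prompt(demos, "3 + 5?")
```

The model then completes the text after the final `Answer:`, and the completion is scored against the reference.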

---

## Benchmark Results (5-shot)

| Task          | acc   | acc_norm |
|---------------|-------|----------|
| HellaSwag     | 0.540 | 0.711    |
| ARC-Challenge | 0.615 | 0.659    |
| MMLU          | *TBD* | *TBD*    |

*Values are means (stderr omitted here). Results produced locally with lm-evaluation-harness, batch_size=1.*
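The gap between `acc` and `acc_norm` comes from length normalization: for `acc_norm`, each answer choice's log-likelihood is divided by the choice's length (lm-evaluation-harness normalizes by byte length) before taking the argmax, which removes the scoring bias toward shorter continuations. A toy sketch of the two selection rules:

```python
def pick_answer(loglikes, choices, normalize=False):
    """Argmax over (optionally length-normalized) log-likelihoods of the choices."""
    scores = [
        ll / len(c.encode("utf-8")) if normalize else ll
        for ll, c in zip(loglikes, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

choices = ["a dog", "an extremely fluffy dog"]
lls = [-4.0, -9.0]                # toy raw log-likelihoods
pick_answer(lls, choices)         # acc-style: raw score favors the short choice
pick_answer(lls, choices, True)   # acc_norm-style: normalization flips the pick
```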

---

## Comparison (acc_norm)

| Model           | Params | HellaSwag  | ARC-Challenge |
|-----------------|--------|------------|---------------|
| **This work**   | 4B     | **0.711**  | **0.659**     |
| Qwen3-8B (base) | 8B     | ~0.732     | ~0.58         |
| LLaMA-2-7B      | 7B     | ~0.70–0.72 | ~0.55–0.57    |
| Mistral-7B      | 7B     | ~0.74–0.75 | ~0.60–0.62    |

---

## Notes
- These results were obtained on **two consumer GPUs: an RTX 3060 (12 GB) and an RTX 3070 (8 GB)**.
- The fine-tuning procedure and dataset remain proprietary.
- The scores suggest that, with high-quality data and efficient training, a **4B-parameter model can rival or outperform 7B–8B baselines** on reasoning and commonsense benchmarks.

---


## Usage
Weights are **not provided**. This repository serves as a **benchmark disclosure**.
If you wish to reproduce similar results, see [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for methodology.
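As a starting point, an lm-evaluation-harness invocation for the tasks above might look like the following (the checkpoint path is a placeholder; check flag names against the harness's current documentation):

```shell
# Evaluate a local Hugging Face checkpoint on the same tasks, 5-shot, batch size 1.
lm_eval --model hf \
  --model_args pretrained=/path/to/your/model \
  --tasks hellaswag,arc_challenge \
  --num_fewshot 5 \
  --batch_size 1
```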

---

## License
License: benchmark-only
This repo shares evaluation results only.
Model weights are not provided.
No redistribution or commercial use is permitted without permission.

## Support
If you find this work valuable and want to support further experiments:

- **Bitcoin:** bc1q76vw4krfx24gvz73pwmhav620xe6fxkxdh0s48
- **Other:** Feel free to contact me for additional options.

---


## Citation
If you reference these results, please cite this repository:

```bibtex
@misc{rfcoder02025qwen4b,
  title  = {Qwen3-4B (Sile)},
  author = {Rob Hak},
  year   = {2025},
  url    = {https://huggingface.co/rfcoder0/qwen3-4b-custom-Sile}
}
```