# Qwen3-4B (Custom Fine-Tune) Sile

[](#benchmark-results)
[](#benchmark-results)



---

## Model Summary
- **Author:** rfcoder0
- **Model Type:** Qwen3-4B base, custom fine-tune ("Sile")
- **Hardware Used:** RTX 3060 (12 GB) + RTX 3070 (8 GB)
- **Training:** Proprietary fine-tune on a curated dataset
- **Evaluation:** [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), 5-shot

This fine-tuned Qwen3-4B matches, and in some cases exceeds, the performance of 7B–8B-parameter models on standard reasoning and commonsense benchmarks.
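Since all scores are 5-shot, each benchmark query is preceded by five solved demonstrations. A minimal sketch of how such a prompt is assembled (the `Question:`/`Answer:` template is illustrative, not the harness's exact per-task format):

```python
def build_fewshot_prompt(demos, query, k=5):
    """Prepend up to k solved demonstrations to the query (few-shot prompting)."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in demos[:k]]
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)

demos = [("2 + 2?", "4"), ("Capital of France?", "Paris")]
prompt = build_fewshot_prompt(demos, "3 + 5?")
```

The model then completes the text after the final `Answer:`, and the completion is scored against the reference.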

---

## Benchmark Results (5-shot)

| Task          | acc   | acc_norm |
|---------------|-------|----------|
| HellaSwag     | 0.540 | 0.711    |
| ARC-Challenge | 0.615 | 0.659    |
| MMLU          | *TBD* | *TBD*    |

*Values are means (stderr omitted here). Results produced locally with lm-evaluation-harness, batch_size=1.*
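The gap between `acc` and `acc_norm` comes from length normalization: for `acc_norm`, each answer choice's log-likelihood is divided by the choice's length (lm-evaluation-harness normalizes by byte length) before taking the argmax, which removes the scoring bias toward shorter continuations. A toy sketch of the two selection rules:

```python
def pick_answer(loglikes, choices, normalize=False):
    """Argmax over (optionally length-normalized) log-likelihoods of the choices."""
    scores = [
        ll / len(c.encode("utf-8")) if normalize else ll
        for ll, c in zip(loglikes, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

choices = ["a dog", "an extremely fluffy dog"]
lls = [-4.0, -9.0]                # toy raw log-likelihoods
pick_answer(lls, choices)         # acc-style: raw score favors the short choice
pick_answer(lls, choices, True)   # acc_norm-style: normalization flips the pick
```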

---

## Comparison (acc_norm)

| Model           | Params | HellaSwag  | ARC-Challenge |
|-----------------|--------|------------|---------------|
| **This work**   | 4B     | **0.711**  | **0.659**     |
| Qwen3-8B (base) | 8B     | ~0.732     | ~0.58         |
| LLaMA-2-7B      | 7B     | ~0.70–0.72 | ~0.55–0.57    |
| Mistral-7B      | 7B     | ~0.74–0.75 | ~0.60–0.62    |

---

## Notes
- These results were obtained on **two consumer GPUs: an RTX 3060 (12 GB) and an RTX 3070 (8 GB)**.
- The fine-tuning procedure and dataset remain proprietary.
- The scores suggest that, with high-quality data and efficient training, a **4B-parameter model can rival or outperform 7B–8B baselines** on reasoning and commonsense benchmarks.

---


## Usage
Weights are **not provided**. This repository serves as a **benchmark disclosure**.
If you wish to reproduce similar results, see [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for methodology.
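As a starting point, an lm-evaluation-harness invocation for the tasks above might look like the following (the checkpoint path is a placeholder; check flag names against the harness's current documentation):

```shell
# Evaluate a local Hugging Face checkpoint on the same tasks, 5-shot, batch size 1.
lm_eval --model hf \
  --model_args pretrained=/path/to/your/model \
  --tasks hellaswag,arc_challenge \
  --num_fewshot 5 \
  --batch_size 1
```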

---

## License
License: benchmark-only
This repo shares evaluation results only.
Model weights are not provided.
No redistribution or commercial use is permitted without permission.

## Support
If you find this work valuable and want to support further experiments:

- **Bitcoin:** bc1q76vw4krfx24gvz73pwmhav620xe6fxkxdh0s48
- **Other:** Feel free to contact me for additional options.

---


## Citation
If you reference these results, please cite this repository:

```bibtex
@misc{rfcoder02025qwen4b,
  title  = {Qwen3-4B (Sile)},
  author = {Rob Hak},
  year   = {2025},
  url    = {https://huggingface.co/rfcoder0/qwen3-4b-custom-Sile}
}
```