---
license: cc-by-nc-4.0
---
# Qwen3-4B "Sile" (Custom Fine-Tune)
[![HellaSwag acc_norm](https://img.shields.io/badge/HellaSwag_acc_norm-71.1%25-brightgreen)](#benchmark-results)
[![ARC-Challenge acc_norm](https://img.shields.io/badge/ARC--Challenge_acc_norm-65.9%25-brightgreen)](#benchmark-results)
![Params](https://img.shields.io/badge/Params-4B-blue)
![Hardware](https://img.shields.io/badge/Hardware-RTX%203060%2012GB-orange)
---
## Model Summary
- **Author:** rfcoder0
- **Model Type:** Qwen3-4B base, custom fine-tune ("Sile")
- **Hardware Used:** RTX 3060 (12 GB) + RTX 3070 (8 GB)
- **Training:** Proprietary fine-tune on a curated dataset
- **Evaluation:** [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), 5-shot
This fine-tuned Qwen3-4B demonstrates performance comparable to, and in some cases exceeding, 7B–8B parameter models on standard reasoning and commonsense benchmarks.
---
## Benchmark Results (5-shot)
| Task | acc | acc_norm |
|---------------|--------|----------|
| HellaSwag | 0.540 | 0.711 |
| ARC-Challenge | 0.615 | 0.659 |
| MMLU | *TBD* | *TBD* |
*Results produced locally with lm-evaluation-harness, batch_size=1. `acc` is raw log-likelihood accuracy; `acc_norm` normalizes each candidate's log-likelihood by its byte length before selection.*
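For readers comparing the two columns: under `acc` the chosen answer is the candidate with the highest total log-likelihood, while under `acc_norm` each candidate's log-likelihood is divided by its byte length first, which removes the bias toward shorter answers. A toy sketch of the selection rule (illustrative only, not the harness internals):

```python
# Toy illustration of acc vs. acc_norm selection.
# Each candidate is (total_log_likelihood, byte_length_of_answer_text).
candidates = [(-12.0, 10), (-15.0, 40)]

# acc: pick the candidate with the highest raw log-likelihood.
acc_pick = max(range(len(candidates)), key=lambda i: candidates[i][0])

# acc_norm: pick the candidate with the highest length-normalized log-likelihood.
acc_norm_pick = max(range(len(candidates)),
                    key=lambda i: candidates[i][0] / candidates[i][1])

print(acc_pick)       # 0: highest raw log-likelihood
print(acc_norm_pick)  # 1: highest length-normalized log-likelihood
```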
---
## Comparison (acc_norm)
| Model | Params | HellaSwag | ARC-Challenge |
|-------------------|--------|-----------|---------------|
| **This work** | 4B | **0.711** | **0.659** |
| Qwen3-8B (base) | 8B | ~0.732 | ~0.58 |
| LLaMA-2-7B | 7B | ~0.70–0.72| ~0.55–0.57 |
| Mistral-7B | 7B | ~0.74–0.75| ~0.60–0.62 |
---
## Notes
- These results were obtained on **consumer GPUs: an RTX 3060 (12 GB) and an RTX 3070 (8 GB)**.
- The fine-tune procedure and dataset remain proprietary.
- Scores indicate that with high-quality data and efficient training, a **4B parameter model can rival or outperform 7B–8B baselines** on reasoning and commonsense benchmarks.
---
## Usage
Weights are **not provided**. This repository serves as a **benchmark disclosure**.
If you wish to reproduce similar results, see [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for methodology.
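As a non-authoritative sketch, a run matching the settings reported above might look like this with the harness's Python API (assumes lm-evaluation-harness ≥ 0.4, installable via `pip install lm-eval`; the checkpoint path is a placeholder, since the weights for this fine-tune are not published):

```python
# Hypothetical reproduction sketch: evaluates a local HF-format checkpoint
# at 5-shot with batch_size=1, matching the settings reported above.
# "path/to/your-model" is a placeholder; this repo does not ship weights.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                  # HuggingFace transformers backend
    model_args="pretrained=path/to/your-model",  # placeholder checkpoint path
    tasks=["hellaswag", "arc_challenge"],
    num_fewshot=5,                               # 5-shot, as in the tables above
    batch_size=1,                                # matches the reported setting
)

# Per-task metrics (acc, acc_norm, and their stderr) live under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```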
---
## License
This model is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.
You are free to:
- **Share** — copy and redistribute the material in any medium or format
- **Adapt** — remix, transform, and build upon the material
Under the following terms:
- **Attribution** — You must give appropriate credit.
- **NonCommercial** — You may not use the material for commercial purposes.
Full license text: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
---
## Support
If you find this work valuable and want to support further experiments:
- **Bitcoin:** bc1q76vw4krfx24gvz73pwmhav620xe6fxkxdh0s48
- **Other:** Feel free to contact me for additional options.
---
## Citation
If you reference these results, please cite this repository:
```bibtex
@misc{rfcoder02025qwen4b,
  title  = {Qwen3-4B (Sile)},
  author = {Rob Hak},
  year   = {2025},
  url    = {https://huggingface.co/rfcoder0/qwen3-4b-custom-Sile}
}
```