YOYO-AI/Qwen2.5-14B-YOYO-Average

	---
	license: apache-2.0
	language:
	- en
	- zh
	base_model:
	- Qwen/Qwen2.5-14B-Instruct
	- Qwen/Qwen2.5-14B-Instruct-1M
	- arcee-ai/Virtuoso-Small-v2
	- deepcogito/cogito-v1-preview-qwen-14B
	- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
	pipeline_tag: text-generation
	tags:
	- merge
	---
	> We used the Karcher mean merge method to average five of the most representative Qwen2.5-14B derivative models, in recognition of the open-source community's contributions to Qwen2.5-14B.

	# Merge Method
	This model was merged using the [Karcher Mean](https://github.com/arcee-ai/mergekit/blob/main/docs/merge_methods.md#karcher-mean-karcher) merge method.
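
A Karcher (Fréchet) mean is the point that minimizes the sum of squared geodesic distances to a set of inputs, rather than a plain element-wise average. The sketch below illustrates the idea on the unit sphere in plain Python. It is not mergekit's implementation: the function name `karcher_mean_sphere`, the `tol` threshold, and the choice of the sphere as the manifold are assumptions made for illustration, and `max_iter` only mirrors the name of the parameter in the configuration further down.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def karcher_mean_sphere(points, max_iter=1000, tol=1e-9):
    """Illustrative Karcher mean of vectors projected onto the unit sphere.

    Hypothetical helper, not part of mergekit. Each iteration maps the
    points into the tangent space at the current estimate (log map),
    averages them there, and moves along the sphere by that average
    (exp map) until the update is smaller than `tol`.
    """
    pts = [_normalize(p) for p in points]
    dim = len(pts[0])
    # Start from the normalized arithmetic mean.
    mean = _normalize([sum(p[i] for p in pts) / len(pts) for i in range(dim)])

    for _ in range(max_iter):
        tangent = [0.0] * dim
        for p in pts:
            cos_theta = max(-1.0, min(1.0, _dot(mean, p)))
            theta = math.acos(cos_theta)  # geodesic distance to p
            if theta < 1e-12:
                continue  # p coincides with the estimate; log map is zero
            scale = theta / math.sin(theta)
            for i in range(dim):
                tangent[i] += scale * (p[i] - cos_theta * mean[i])
        tangent = [t / len(pts) for t in tangent]

        step = math.sqrt(_dot(tangent, tangent))
        if step < tol:
            break  # converged
        # Exp map: walk a distance `step` along the sphere in the tangent direction.
        mean = _normalize(
            [math.cos(step) * m + math.sin(step) * t / step
             for m, t in zip(mean, tangent)]
        )
    return mean
```

For example, `karcher_mean_sphere([[1.0, 0.0], [0.0, 1.0]])` lands on the 45-degree direction between the two axes. The sketch assumes the inputs do not cancel out (for instance, exactly antipodal points), in which case the starting estimate is undefined and a `ValueError` is raised.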

	# Models Merged
	The following models were included in the merge:

	* [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
	* [Qwen/Qwen2.5-14B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-1M)
	* [arcee-ai/Virtuoso-Small-v2](https://huggingface.co/arcee-ai/Virtuoso-Small-v2)
	* [deepcogito/cogito-v1-preview-qwen-14B](https://huggingface.co/deepcogito/cogito-v1-preview-qwen-14B)
	* [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)

	# Configuration
	The following YAML configuration was used to produce this model:

	```yaml
	models:
	- model: Qwen/Qwen2.5-14B-Instruct
	- model: Qwen/Qwen2.5-14B-Instruct-1M
	- model: arcee-ai/Virtuoso-Small-v2
	- model: deepcogito/cogito-v1-preview-qwen-14B
	- model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
	merge_method: karcher
	parameters:
	  max_iter: 1000
	dtype: bfloat16
	tokenizer_source: base
	```