# merge
This is a merge of pre-trained language models created using mergekit.
## Merge Details

### Merge Method
This model was merged using the della_linear merge method, with CultriX/Qwen2.5-14B-Wernickev3 as the base model.
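Conceptually, della_linear works on "task vectors" (the per-tensor deltas between each fine-tuned model and the base): it prunes low-magnitude delta entries according to each model's `density`, rescales the survivors, and takes a weighted linear sum scaled by `lambda`. The sketch below is only a rough illustration of that idea, not mergekit's actual implementation; the function name, the simple top-k magnitude mask (standing in for DELLA's stochastic magnitude-based dropping), and the single-tensor scope are assumptions for clarity.

```python
import torch

def della_linear_sketch(base, finetuned, weights, densities, lam=1.4):
    """Rough, single-tensor illustration of a della_linear-style merge.

    `base` is a tensor from the base model, `finetuned` a list of matching tensors
    from the merged models, and `weights`/`densities` mirror the configuration below.
    """
    merged_delta = torch.zeros_like(base)
    for ft, w, density in zip(finetuned, weights, densities):
        delta = ft - base                                    # task vector relative to the base
        k = max(1, int(density * delta.numel()))             # keep the top `density` fraction
        threshold = delta.abs().flatten().topk(k).values.min()
        pruned = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))
        merged_delta += w * (pruned / density)               # rescale, then weighted linear sum
    return base + lam * merged_delta                         # lambda scales the merged deltas
```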
### Models Merged
The following models were included in the merge:
- qingy2019/Qwen2.5-Math-14B-Instruct
- djuna/Q2.5-Veltha-14B-0.5
- CultriX/Qwenfinity-2.5-14B
- allknowingroger/QwenSlerp6-14B
- CultriX/Qwen2.5-14B-Broca
- sometimesanotion/Qwen2.5-14B-Vimarckoso
- CultriX/SeQwence-14Bv1
### Configuration
The following YAML configuration was used to produce this model:
```yaml
merge_method: della_linear
base_model: CultriX/Qwen2.5-14B-Wernickev3
dtype: bfloat16
parameters:
  epsilon: 0.012   # Further reduced to ensure ultra-fine parameter scaling for precision.
  lambda: 1.4      # Stronger emphasis on significant model contributions.
  normalize: true  # Balances the parameter integration for stability.
adaptive_merge_parameters:
  task_weights:
    tinyArc: 1.6            # Prioritizes logical reasoning improvements.
    tinyHellaswag: 1.5      # Strengthened contextual understanding and consistency.
    tinyMMLU: 1.65          # Enhanced domain knowledge for multitask benchmarks.
    tinyTruthfulQA: 1.9     # Maximized for accurate factual reasoning and QA.
    tinyTruthfulQA_mc1: 1.7 # Balanced focus for multiple-choice reasoning.
    tinyWinogrande: 1.75    # Advanced reasoning and contextual prediction improvement.
    IFEval: 1.9             # Instruction-following tasks boosted by multitask contributors.
    BBH: 1.7                # Complex reasoning supported by logical base models.
    MATH: 2.1               # Highest priority, focusing on mathematical excellence.
    GPQA: 1.8               # Boosted graduate-level QA capabilities.
    MUSR: 1.9               # Nuanced multi-step reasoning strengthened further.
    MMLU-PRO: 1.8           # Domain multitask performance maximized.
  smoothing_factor: 0.1     # Precisely tuned for smooth task-specific blending.
gradient_clipping:
  CultriX/Qwen2.5-14B-Wernickev3: 0.86          # Backbone stability with slightly reduced clipping.
  CultriX/Qwenfinity-2.5-14B: 0.83              # Consistent multitask integration.
  djuna/Q2.5-Veltha-14B-0.5: 0.91               # Strengthened advanced reasoning contributions.
  CultriX/Qwen2.5-14B-Broca: 0.85               # Logical reasoning enhancements stabilized.
  qingy2019/Qwen2.5-Math-14B-Instruct: 0.93     # Mathematically focused tasks maximized.
  CultriX/SeQwence-14Bv1: 0.88                  # Generalist multitask support.
  sometimesanotion/Qwen2.5-14B-Vimarckoso: 0.89 # Balanced multi-step reasoning contributions.
  allknowingroger/QwenSlerp6-14B: 0.87          # Contextual and logical reasoning integration refined.
models:
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.26   # Core backbone for multitask reasoning.
      density: 0.7   # Slight increase to preserve critical reasoning parameters.
  - model: CultriX/Qwenfinity-2.5-14B
    parameters:
      weight: 0.23   # Comprehensive multitask performer.
      density: 0.65
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.22   # Advanced reasoning support for GPQA and MUSR.
      density: 0.72
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.15   # Logical reasoning and factual QA enhancements.
      density: 0.65
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.18   # Mathematical reasoning priority.
      density: 0.73
  - model: CultriX/SeQwence-14Bv1
    parameters:
      weight: 0.14   # Generalist multitask backbone.
      density: 0.63
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso
    parameters:
      weight: 0.12   # Multi-step reasoning tasks contributor.
      density: 0.6
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.1    # Contextual reasoning improvements.
      density: 0.62
tokenizer_source: CultriX/Qwen2.5-14B-Wernickev3
```
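Once merged, the model can be used like any other Qwen2.5 chat model. Below is a minimal generation sketch with the Hugging Face `transformers` library; the repo id `CultriX/Qwen2.5-14B-Brocav3` is taken from this card, so adjust it if the merged weights are hosted elsewhere.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-Brocav3"  # repo id from this card; change if hosted elsewhere

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the derivative of x^3 + 2x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```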