merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with CultriX/Qwen2.5-14B-Wernickev3 as the base model.
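
The merge can be reproduced with mergekit. Below is a minimal sketch using mergekit's Python API, assuming the YAML configuration shown later in this card is saved as merge-config.yaml; the output path and options are illustrative choices, not part of the original card.

```python
# Minimal sketch of reproducing this merge with mergekit.
# Assumptions: the YAML configuration from this card is saved as
# "merge-config.yaml"; the output path and options are illustrative.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("merge-config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    "./Qwen2.5-14B-Hyper",        # output directory for the merged weights
    options=MergeOptions(
        cuda=True,                # run the merge on GPU if one is available
        copy_tokenizer=True,      # copy the base model's tokenizer into the output
        lazy_unpickle=True,       # lower peak memory while loading checkpoints
    ),
)
```

A roughly equivalent command-line invocation is `mergekit-yaml merge-config.yaml ./Qwen2.5-14B-Hyper --cuda`.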

Models Merged

The following models were included in the merge:

- djuna/Q2.5-Veltha-14B-0.5
- allknowingroger/QwenSlerp6-14B
- CultriX/SeQwence-14B-EvolMerge
- qingy2024/Fusion4-14B-Instruct
- CultriX/Qwen2.5-14B-Emerged
- sometimesanotion/Lamarck-14B-v0.6
- hotmailuser/QwenSlerp2-14B

Configuration

The following YAML configuration was used to produce this model:

merge_method: dare_ties  # DARE pruning combined with TIES sign-consensus merging of task vectors.
base_model: CultriX/Qwen2.5-14B-Wernickev3  # Sets the base model, a strong multitask performer, for parameter alignment.
dtype: bfloat16  # Defines the data type for model weights as bfloat16, for efficient memory and computation.
out_dtype: bfloat16  # Sets the output data type to bfloat16 for consistency with the input.

parameters:
  epsilon: 0.008  # Fine-tunes parameter scaling, improving the quality of the merge and stability.
  lambda: 1.8  # Scaling factor applied to the merged task vectors, amplifying high-impact parameter deltas.
  normalize: true  # Ensures parameter normalization, preventing any instability during the merge.
  rescale: true  # Adjusts parameter scales across different models, improving compatibility.
  int8_mask: false  # Disables int8 masking, preserving full precision for better parameter alignment.

adaptive_merge_parameters:
  task_weights:  # Defines per-task weights that emphasize different areas of model performance.
    tinyArc: 1.6  # Sets a moderate priority for logical reasoning tasks.
    tinyHellaswag: 1.5  # Sets a medium priority for contextual understanding tasks.
    tinyMMLU: 1.8  # Gives a higher priority to multi-domain knowledge benchmarks.
    tinyTruthfulQA: 1.9  # Gives a higher priority to factual accuracy and QA tasks.
    tinyTruthfulQA_mc1: 1.75  # Slightly reduced priority, but still important, for multiple-choice reasoning.
    tinyWinogrande: 1.75  # Sets a medium priority for more complex contextual reasoning tasks.
    IFEval: 2.30  # Sets a high priority for instruction-following evaluation, as it is often a weak point for models.
    BBH: 2.05  # Gives a high priority to the BIG-Bench Hard benchmark, critical for complex reasoning.
    MATH: 2.70  # Sets the highest priority for mathematical reasoning tasks.
    GPQA: 2.20  # Gives a balanced priority to graduate-level question-answering tasks.
    MUSR: 2.15  # Gives a slightly lower, but still high, priority to multi-step reasoning tasks.
    MMLU-PRO: 2.00 # Gives a high priority to domain-specific multitask benchmark performance.
  smoothing_factor: 0.03  # Sets the smoothing factor for smoother blending of per-task contributions.

gradient_clipping:  # Defines gradient clipping values for each model, for training stability.
  CultriX/Qwen2.5-14B-Wernickev3: 0.89  # Sets the clipping value for the base model, keeping the core of the merge stable.
  djuna/Q2.5-Veltha-14B-0.5: 0.92  # Sets the clipping value for the djuna model, a strong performer in reasoning tasks.
  CultriX/SeQwence-14B-EvolMerge: 0.87  # Sets the clipping value for this model, which is a balanced multi-task performer.
  qingy2024/Fusion4-14B-Instruct: 0.93  # Sets the clipping value for this model, emphasizing stability for mathematical reasoning.
  CultriX/Qwen2.5-14B-Emerged: 0.88  # Sets the clipping value for this model, which provides multi-task support.
  sometimesanotion/Lamarck-14B-v0.6: 0.89  # Sets the clipping value for this model, to enhance the multi-step reasoning capabilities.
  allknowingroger/QwenSlerp6-14B: 0.90  # Sets the clipping value for this model, which supports nuanced reasoning tasks.
  hotmailuser/QwenSlerp2-14B: 0.91  # Sets the clipping value for this model, with slightly increased stability for logical reasoning tasks.

models:  # Defines all the models that are going to be included in the merge.
  - model: CultriX/Qwen2.5-14B-Wernickev3 # Defines the base model, the main backbone of the merge, which offers good multi-task capabilities.
    parameters: # Defines the weight and density that will be used for the model.
      weight: 0.32  # Sets the weight of the model to 0.32, which is the dominant contribution to the final model.
      density: 0.78  # Sets a high density of 0.78, to preserve its parameters, as this is a key component of the merge.

  - model: djuna/Q2.5-Veltha-14B-0.5  # Defines the djuna model, a strong performer in factual and reasoning tasks.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.28  # Sets the weight of the model to 0.28, to prioritize reasoning performance.
      density: 0.77  # Sets a balanced density of 0.77, to enhance its reasoning abilities.

  - model: allknowingroger/QwenSlerp6-14B  # Defines the allknowingroger model, which has good performance and reasoning capabilities.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.15  # Sets the weight of the model to 0.15, which has a moderate contribution to the final model.
      density: 0.72  # Sets a density of 0.72, for an effective parameter integration into the final model.

  - model: CultriX/SeQwence-14B-EvolMerge # Defines the CultriX/SeQwence model, which is a good multi-task contributor.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.12  # Sets the weight of the model to 0.12, a lower weight for its contribution.
      density: 0.62  # Sets a density of 0.62, for balanced performance.

  - model: qingy2024/Fusion4-14B-Instruct  # Defines the qingy model, which excels at mathematical reasoning.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.09  # Sets the weight of the model to 0.09, for a specific focus on mathematical reasoning tasks.
      density: 0.75  # Sets a density of 0.75, for preserving its strengths in mathematical tasks.

  - model: CultriX/Qwen2.5-14B-Emerged  # Defines the CultriX/Qwen2.5-14B-Emerged model, a good multi-task performer.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.08 # Sets the weight of the model to 0.08, for a smaller, but still useful, contribution.
      density: 0.69 # Sets a density of 0.69, to balance its contributions.

  - model: sometimesanotion/Lamarck-14B-v0.6 # Defines the sometimesanotion/Lamarck model, which is useful for multi-step reasoning.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.06 # Sets the weight of the model to 0.06, a smaller contribution focused on multi-step reasoning.
      density: 0.62 # Sets the density to 0.62 for multi-step reasoning tasks.

  - model: hotmailuser/QwenSlerp2-14B  # Defines the hotmailuser model, with strong performance in reasoning and multi-task performance.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.12  # Sets the weight of the model to 0.12, for its balanced contributions.
      density: 0.66  # Sets the density of the model to 0.66, for better parameter integration.
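
Once published, the merged model can be loaded like any other Qwen2.5-based causal language model. A minimal sketch with Hugging Face Transformers follows; the prompt and generation settings are illustrative.

```python
# Minimal sketch of loading the merged model with Hugging Face Transformers.
# The repository id is the one this card is published under; the prompt and
# generation settings below are illustrative examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-Hyper"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the bfloat16 dtype used for the merge
    device_map="auto",            # spread the 14B parameters across available devices
)

prompt = "Briefly explain what a model merge is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```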