Model Card for qwen2.5-0.5B-Instruct-pruned-Inshort

pruned(model=qwen2.5-0.5B-Instruct-Inshort) = qwen2.5-0.5B-Instruct-pruned-Inshort

Model Details

Model Description

The model qwen2.5-0.5B-Instruct-pruned-Inshort is a pruned version of qwen2.5-0.5B-Instruct-Inshort.


NOTE

This model is part of my project, where I explore pruning a capable teacher model and recovering its performance through distillation (specifically, behavior cloning) and supervised fine-tuning (SFT), focused on an Inshorts-style summarization task.


In that pipeline, this checkpoint serves as the pruned model, i.e., the compressed model before any performance recovery.

🧠 Model Configuration: Normal vs. Pruned

| 🔧 Component | 🟩 Normal Model | 🟦 Pruned Model |
|---|---|---|
| Decoder Layers | 24 | 11 |
| MLP Intermediate Size | 4864 | 4096 |
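The table above can be checked against the published checkpoint; a quick sanity check with transformers (using this card's repo id) might look like:

```python
# Minimal sanity check of the pruned architecture; repo id taken from this card.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("nis12ram/qwen2.5-0.5B-Instruct-pruned-Inshort")
print(cfg.num_hidden_layers)   # expected: 11 (vs. 24 in the base model)
print(cfg.intermediate_size)   # expected: 4096 (vs. 4864 in the base model)
```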

Pruning details

This work utilizes a hybrid pruning strategy that integrates both width and depth pruning, based on methodologies outlined in LLM Pruning and Distillation in Practice: The Minitron Approach and Compact Language Models via Pruning and Knowledge Distillation.

Check out the Colab Notebook for the code.
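As a rough illustration of what the hybrid strategy does (not the notebook's exact code), the sketch below depth-prunes a Qwen2-style model to 11 decoder layers and width-prunes each MLP to an intermediate size of 4096. The retained-layer indices and the weight-norm importance heuristic are assumptions; the Minitron papers rank layers and channels with activation-based importance scores.

```python
# Illustrative hybrid pruning sketch (NOT the notebook's exact code).
# Assumptions: the retained-layer indices and the weight-norm importance
# heuristic are placeholders; Minitron uses activation-based importance.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", torch_dtype=torch.float16
)

# --- Depth pruning: keep 11 of the 24 decoder layers ---
keep = [0, 1, 2, 3, 4, 5, 6, 7, 8, 22, 23]  # hypothetical layer selection
model.model.layers = torch.nn.ModuleList(model.model.layers[i] for i in keep)
for new_idx, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = new_idx  # re-index for KV-cache correctness
model.config.num_hidden_layers = len(keep)

# --- Width pruning: shrink the MLP intermediate size 4864 -> 4096 ---
new_inter = 4096
for layer in model.model.layers:
    mlp = layer.mlp
    # Score each intermediate channel; weight norms stand in here for the
    # activation-based importance used in the Minitron papers.
    importance = mlp.down_proj.weight.abs().sum(dim=0)
    top = importance.topk(new_inter).indices.sort().values
    mlp.gate_proj.weight.data = mlp.gate_proj.weight.data[top, :]
    mlp.up_proj.weight.data = mlp.up_proj.weight.data[top, :]
    mlp.down_proj.weight.data = mlp.down_proj.weight.data[:, top]
    mlp.gate_proj.out_features = mlp.up_proj.out_features = new_inter
    mlp.down_proj.in_features = new_inter
model.config.intermediate_size = new_inter
```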

Evaluation

The initial evaluation began with the ROUGE score; however, this approach was quickly abandoned, as ROUGE fails to capture semantic meaning and contextual understanding, both of which are crucial for evaluating abstractive summarization.

As a result, a custom evaluation pipeline was adopted. This pipeline uses an LLM-as-a-judge to assess summary quality, assigning an accuracy score on a scale from 1 to 5. Side-by-side human evaluation was also performed on a few selected data points.

Check out the Colab Notebook for the custom evaluation pipeline code.

LLM-as-a-judge details

system_prompt_for_accuracy = '''YOU ARE A HIGHLY RELIABLE NEWS HEADLINE EVALUATION JUDGE, TRAINED TO ASSESS PREDICTED HEADLINES BASED SOLELY ON THEIR ACCURACY AND FAITHFULNESS TO THE ORIGINAL NEWS CONTENT. YOUR PRIMARY OBJECTIVE IS TO ENSURE THAT THE PREDICTED HEADLINES ARE:

1. **NOT MISLEADING OR HALLUCINATED**: The predicted headline must accurately reflect the original news content without adding false information or exaggerating details.
2. **FAITHFUL TO THE ORIGINAL NEWS CONTENT**: The headline should summarize the essence of the news while maintaining neutrality and factual correctness.

### INSTRUCTIONS ###

FOR EACH PREDICTED HEADLINE, FOLLOW THIS EVALUATION PROCESS:

1. **UNDERSTAND THE INPUTS:**
   - ORIGINAL_NEWS_CONTENT: The full news article that serves as the source.
   - PREDICTED_HEADLINE: The generated headline to be evaluated.

2. **EVALUATE FOR MISREPRESENTATION & HALLUCINATION:**
   - CHECK if the predicted headline introduces **any false claims** or **misleading phrases** that are **not supported** by the source.
   - RATE on a scale of 1-5:
     - (1) **Severely Misleading** – The headline contains major inaccuracies, false claims, or is entirely unrelated to the news content.
     - (2) **Largely Inaccurate** – The headline distorts key facts, introduces misleading implications, or exaggerates information.
     - (3) **Partially Accurate** – The headline is mostly correct but includes minor distortions or slightly misleading phrasing.
     - (4) **Mostly Accurate** – The headline aligns well with the source but may have slight nuances or wording that could be improved.
     - (5) **Fully Accurate** – The headline is entirely faithful to the source, correctly summarizing key details with no factual distortions.

### WHAT NOT TO DO ###
- NEVER ACCEPT A HEADLINE THAT IS FACTUALLY INCORRECT OR MISLEADING.
- NEVER IGNORE SUBTLE DIFFERENCES IN MEANING THAT COULD CHANGE THE FACTUAL ACCURACY.

### OUTPUT FORMAT ###
Your evaluation should be structured as follows:
```json
{
  "predicted_headline": "...",
  "score": "X/5",
  "feedback": "..."
}
```'''

user_prompt_for_accuracy = '''News Content: {content}
Predicted Headline: {predicted_headline}
'''
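A minimal sketch of how these prompts might be wired into a judge call; the OpenAI-compatible client, the judge model name, and the JSON extraction are assumptions, not the project's exact code (see the notebook for that):

```python
# Sketch of the judging loop; client, judge model, and parsing are assumptions.
import json
import re

from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint works here

def judge_accuracy(content: str, predicted_headline: str) -> dict:
    # system_prompt_for_accuracy / user_prompt_for_accuracy are defined above.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt_for_accuracy},
            {"role": "user", "content": user_prompt_for_accuracy.format(
                content=content, predicted_headline=predicted_headline)},
        ],
    )
    text = response.choices[0].message.content
    # The judge answers with a fenced JSON block; pull out the JSON object.
    payload = re.search(r"\{.*\}", text, re.DOTALL).group(0)
    return json.loads(payload)

# The accuracy score is the numeric part of "X/5", averaged over the eval set:
# score = int(judge_accuracy(article, headline)["score"].split("/")[0])
```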

Results

✅ Accuracy Score [main evaluation criterion]

| Metric | Value |
|---|---|
| Accuracy Score | 1.003 |

📝 ROUGE Score

| Metric | Score |
|---|---|
| ROUGE-1 | 0.0303 |
| ROUGE-2 | 0.0007 |
| ROUGE-L | 0.0285 |
| ROUGE-Lsum | 0.0285 |

🎯 Accuracy-Aware ROUGE Score

| Metric | Score |
|---|---|
| ROUGE-1 | 0.0060 |
| ROUGE-2 | 0.0001 |
| ROUGE-L | 0.0057 |
| ROUGE-Lsum | 0.0057 |
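The card does not define Accuracy-Aware ROUGE, but the reported values are consistent (to rounding) with scaling each ROUGE metric by the normalized judge score; a quick check of that assumption:

```python
# Assumption, not stated in the card: Accuracy-Aware ROUGE = ROUGE * (accuracy / 5).
accuracy = 1.003
rouge = {"ROUGE-1": 0.0303, "ROUGE-2": 0.0007, "ROUGE-L": 0.0285, "ROUGE-Lsum": 0.0285}
for name, score in rouge.items():
    print(name, round(score * accuracy / 5, 4))
# ROUGE-1 0.0061, ROUGE-2 0.0001, ROUGE-L 0.0057, ROUGE-Lsum 0.0057
# (vs. reported 0.0060 / 0.0001 / 0.0057 / 0.0057)
```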

GitHub Repository

github

All Models

Model size: 277M params (Safetensors, F16)

Model tree for nis12ram/qwen2.5-0.5B-Instruct-pruned-Inshort

Finetunes: 1 model