Update README.md
README.md
CHANGED
@@ -1,13 +1,5 @@
---
license: apache-2.0
-datasets:
-- HuggingFaceTB/smoltalk
-- TIGER-Lab/MathInstruct
-- nvidia/OpenMathInstruct-2
-- argilla/ifeval-like-data
-- allenai/llama-3.1-tulu-3-70b-preference-mixture
-- jondurbin/gutenberg-dpo-v0.1
-- jondurbin/truthy-dpo-v0.1
base_model:
- Zyphra/Zamba2-1.2B
library_name: transformers

@@ -16,17 +8,7 @@ library_name: transformers

# Model Card for Zamba2-1.2B-Instruct-v2

-Zamba2-1.2B-Instruct-v2 is derived from the base [Zamba2-1.2B](https://huggingface.co/Zyphra/Zamba2-1.2B) model through
-
-1. **Supervised Fine-Tuning (SFT)** on the following datasets for 1 epoch:
-- [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
-- [TIGER-Lab/MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct)
-- [nvidia/OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2)
-- [argilla/ifeval-like-data](https://huggingface.co/datasets/argilla/ifeval-like-data)
-
-2. **Direct Preference Optimization (DPO)** was conducted in two stages:
-- **First-stage DPO**: The model was trained on a subset of the [allenai/llama-3.1-tulu-3-70b-preference-mixture](https://huggingface.co/datasets/allenai/llama-3.1-tulu-3-70b-preference-mixture) dataset for 2 epochs.
-- **Second-stage DPO**: The model underwent an additional epoch of training using [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1), represented three times, and [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1), represented once.

Zamba2-1.2B-Instruct-v2 is a hybrid model composed of state-space ([Mamba2](https://github.com/state-spaces/mamba)) and transformer blocks.

@@ -70,22 +52,19 @@ print((tokenizer.decode(outputs[0])))

## Performance

-Zamba2-1.2B-Instruct-v2 achieves leading instruction-following
-
-<center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/ceOUHVeJPhBgwTDCsR9Y6.png" width="900"/>
-</center>


| Model | Size | IFEval | BBH | GPQA | MATH_hard | MMLU_pro | MUSR | Aggregate |
-
| Zamba2-1.2B-Instruct-v2 | 1.22B | 66.505 | 15.3259 | 1.0933 | 3.59 | 12.89 | 1.5917 | 16.8326 |
| Gemma-2-2b-it | 2.51B | 19.76 | 24.42 | 2.58 | 1.04 | 25.80 | 7.16 | 13.46 |
| SmolLM2-1.7B-Instruct | 1.71B | 53.00 | 18.30 | 3.51 | 4.89 | 20.51 | 4.53 | 17.46 |
| Qwen-2.5-1.5B-Instruct | 1.54B | 43.74 | 24.72 | 0.80 | 19.11 | 27.23 | 4.45 | 20.01 |
| Llama-3.2-1B-Instruct | 1.24B | 56.88 | 16.65 | 2.03 | 6.85 | 17.79 | 1.68 | 16.98 |

-

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/tQ-j1krA634EfTU1Lp3E7.png" width="700" alt="Zamba performance">

@@ -114,4 +93,11 @@ Zamba2-1.2B utilizes and extends our original Zamba hybrid SSM-attention architecture
</center>


-A standalone Pytorch implementation of Zamba2-1.2B

README.md (updated sections)

---
license: apache-2.0
base_model:
- Zyphra/Zamba2-1.2B
library_name: transformers

[…]

# Model Card for Zamba2-1.2B-Instruct-v2

Zamba2-1.2B-Instruct-v2 is derived from the base [Zamba2-1.2B](https://huggingface.co/Zyphra/Zamba2-1.2B) model through SFT and DPO training on instruction-following and conversational datasets.

Zamba2-1.2B-Instruct-v2 is a hybrid model composed of state-space ([Mamba2](https://github.com/state-spaces/mamba)) and transformer blocks.

[…]

## Performance

Zamba2-1.2B-Instruct-v2 achieves leading instruction-following (IFEval) performance for a model of its size and surpasses some models of significantly larger size on the aggregate score below. For instance, it outperforms Gemma-2-2b-it, a strong model more than twice its size (a reproduction sketch follows the table).

| Model | Size | IFEval | BBH | GPQA | MATH_hard | MMLU_pro | MUSR | Aggregate |
|:-------|:------:|:--------:|:-----:|:------:|:-----------:|:----------:|:------:|:-----------:|
| Zamba2-1.2B-Instruct-v2 | 1.22B | 66.505 | 15.3259 | 1.0933 | 3.59 | 12.89 | 1.5917 | 16.8326 |
| Zamba2-1.2B-Instruct | 1.22B | 41.76 | 17.49 | 1.73 | 2.75 | 14.69 | 2.44 | 13.48 |
| Gemma-2-2b-it | 2.51B | 19.76 | 24.42 | 2.58 | 1.04 | 25.80 | 7.16 | 13.46 |
| SmolLM2-1.7B-Instruct | 1.71B | 53.00 | 18.30 | 3.51 | 4.89 | 20.51 | 4.53 | 17.46 |
| Qwen-2.5-1.5B-Instruct | 1.54B | 43.74 | 24.72 | 0.80 | 19.11 | 27.23 | 4.45 | 20.01 |
| Llama-3.2-1B-Instruct | 1.24B | 56.88 | 16.65 | 2.03 | 6.85 | 17.79 | 1.68 | 16.98 |
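
The benchmark columns appear to correspond to the Open LLM Leaderboard v2 task set (IFEval, BBH, GPQA, MATH hard, MMLU-Pro, MUSR). The commit does not say how the scores were produced; the sketch below shows one plausible way to obtain comparable numbers with EleutherAI's `lm-evaluation-harness`, assuming its `leaderboard_*` task groups and the same assumed repo id as above. It is not the authors' evaluation setup, and exact scores depend on harness version and prompt settings.

```python
# Illustrative evaluation sketch with lm-evaluation-harness (pip install lm-eval).
# Task names assume the harness's Open LLM Leaderboard v2 groups; scores will not exactly
# reproduce the table unless the same harness version, prompts, and normalization are used.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Zyphra/Zamba2-1.2B-instruct-v2,dtype=bfloat16",  # assumed repo id
    tasks=[
        "leaderboard_ifeval",
        "leaderboard_bbh",
        "leaderboard_gpqa",
        "leaderboard_math_hard",
        "leaderboard_mmlu_pro",
        "leaderboard_musr",
    ],
    batch_size=8,
)

# Print the numeric metrics reported for each task group.
for task, metrics in results["results"].items():
    print(task, {k: v for k, v in metrics.items() if isinstance(v, float)})
```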

Due to its unique hybrid SSM architecture, Zamba2-1.2B-Instruct-v2 achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
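
The latency and memory claim is easy to sanity-check directly. Below is a rough probe of decode throughput and peak GPU memory with plain `transformers`; it is illustrative only, the repo id is again an assumption, and absolute numbers depend on hardware, batch size, and sequence length.

```python
# Rough throughput / memory probe (illustrative; not a rigorous benchmark).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-1.2B-instruct-v2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype=torch.bfloat16
)

inputs = tokenizer("Explain the water cycle in three sentences.", return_tensors="pt").to(model.device)

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s, "
      f"peak GPU memory {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```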

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/tQ-j1krA634EfTU1Lp3E7.png" width="700" alt="Zamba performance">

[…]

</center>

A standalone PyTorch implementation of Zamba2-1.2B may be found [here](https://github.com/Zyphra/Zamba2).

## Training Recipe

Zamba2-1.2B-Instruct-v2 was trained on a mix of publicly available datasets spanning instruction-following and chat data. We experimented with various training approaches and found that the best recipe was the following (a rough code sketch of this pipeline appears after the list):

1) SFT for one epoch on core chat, reasoning, and math datasets such as [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) and [nvidia/OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2).
2) DPO for 3 epochs on core alignment datasets, including a subset of [allenai/llama-3.1-tulu-3-70b-preference-mixture](https://huggingface.co/datasets/allenai/llama-3.1-tulu-3-70b-preference-mixture).
3) DPO on very high-quality preference datasets such as [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1) and [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1).
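
The card stops at the recipe list and does not include training code. For orientation only, here is a rough sketch of the three steps using TRL's `SFTTrainer` and `DPOTrainer`; the dataset configs, column handling, hyperparameters, and checkpoint ids are placeholders, not the authors' actual setup.

```python
# Hedged sketch of the three-step recipe with TRL (pip install trl). This is NOT the
# authors' training code: dataset configs, column handling, and hyperparameters are
# placeholders; only the step order and epoch counts follow the list above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base_id = "Zyphra/Zamba2-1.2B"  # the base model named in this card
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# 1) SFT for one epoch on chat / reasoning / math instruction data.
sft_data = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")  # config name assumed
sft_trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="zamba2-sft", num_train_epochs=1),
    train_dataset=sft_data,      # recent TRL applies the chat template to "messages" columns
    processing_class=tokenizer,  # older TRL releases use tokenizer= instead
)
sft_trainer.train()

# 2) DPO for three epochs on a preference mixture (a subset in the actual recipe).
dpo_data = load_dataset("allenai/llama-3.1-tulu-3-70b-preference-mixture", split="train")
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    args=DPOConfig(output_dir="zamba2-dpo", num_train_epochs=3, beta=0.1),
    train_dataset=dpo_data,      # assumes prompt/chosen/rejected columns TRL can parse
    processing_class=tokenizer,
)
dpo_trainer.train()

# 3) A further DPO pass over truthy-dpo-v0.1 and gutenberg-dpo-v0.1 would repeat
#    the DPOTrainer step above with those datasets.
```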