---
language:
- en
license: mit
library_name: transformers
base_model:
- Qwen/Qwen2.5-32B-Instruct
datasets:
- Magpie-Align/Magpie-Pro-300K-Filtered
model-index:
- name: TheBeagle-v2beta-32B-MGS
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 45.03
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 58.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 39.43
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 20.13
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 24.5
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 54.57
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
---

# TheBeagle-v2beta-32B-MGS
This model is an experimental version of our latest innovation: `MGS`. Its up to you to figure out what does it means, but its very explicit.
We didn't applied our known `UNA` algorithm to the forward pass, but they are entirely compatible and operates in different parts of the neural network and in different ways, tho they both can be seen as a regularization technique.


## MGS
MGS stands for... Many-Geeks-Searching... and thats it. Hint: `1+1 is 2, and 1+1 is not 3`

We still believe on 1-Epoch should be enough, so we just did 1 Epoch only.

## Dataset
Used here the first decent (corpora & size) dataset on the hub: `Magpie-Align/Magpie-Pro-300K-Filtered`
Kudos to the Magpie team to contribute with some decent stuff that I personally think is very good to ablate.

It achieves the following results on the evaluation set:
- Loss: 0.5378 (1 Epoch), outperforming the baseline model.
## Quants

[All versions available](https://huggingface.co/fblgit/TheBeagle-v2beta-MGS-GGUF/tree/main)

EXL2 by bartowski:
https://huggingface.co/bartowski/TheBeagle-v2beta-32B-MGS-GGUF

  
## Training
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.8642        | 0.0012 | 1    | 0.7195          |
| 2.077         | 0.0507 | 42   | 0.6161          |
| 1.0325        | 0.1014 | 84   | 0.6093          |
| 0.8945        | 0.1520 | 126  | 0.5962          |
| 0.8532        | 0.2027 | 168  | 0.5869          |
| 0.8185        | 0.2534 | 210  | 0.5805          |
| 0.81          | 0.3041 | 252  | 0.5719          |
| 0.7901        | 0.3548 | 294  | 0.5663          |
| 0.7766        | 0.4054 | 336  | 0.5618          |
| 0.7687        | 0.4561 | 378  | 0.5590          |
| 0.7443        | 0.5068 | 420  | 0.5564          |
| 0.7494        | 0.5575 | 462  | 0.5525          |
| 0.7787        | 0.6081 | 504  | 0.5485          |
| 0.7381        | 0.6588 | 546  | 0.5466          |
| 0.7359        | 0.7095 | 588  | 0.5444          |
| 0.7447        | 0.7602 | 630  | 0.5435          |
| 0.7378        | 0.8109 | 672  | 0.5415          |
| 0.7302        | 0.8615 | 714  | 0.5398          |
| 0.7476        | 0.9122 | 756  | 0.5391          |
| 0.715         | 0.9629 | 798  | 0.5378          |


# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__TheBeagle-v2beta-32B-MGS)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |40.29|
|IFEval (0-Shot)    |45.03|
|BBH (3-Shot)       |58.07|
|MATH Lvl 5 (4-Shot)|39.43|
|GPQA (0-shot)      |20.13|
|MuSR (0-shot)      |24.50|
|MMLU-PRO (5-shot)  |54.57|

## Thanks
- Qwen Team for their outstanding model
- MagPie Team for contributing plenty of datasets
- Cybertron Cloud Compute