---
license: apache-2.0
model-index:
  - name: Mistral-7B-v0.1-signtensors-5-over-16
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 21.18
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=awnr/Mistral-7B-v0.1-signtensors-5-over-16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 17.54
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=awnr/Mistral-7B-v0.1-signtensors-5-over-16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 2.19
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=awnr/Mistral-7B-v0.1-signtensors-5-over-16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 4.14
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=awnr/Mistral-7B-v0.1-signtensors-5-over-16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 6.14
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=awnr/Mistral-7B-v0.1-signtensors-5-over-16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 21.75
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=awnr/Mistral-7B-v0.1-signtensors-5-over-16
          name: Open LLM Leaderboard
---

# Model Card for Mistral-7B-v0.1-signtensors-5-over-16

I'm experimenting with the weight matrices in neural networks. This is a clone of Mistral-7B-v0.1 with some weight matrices replaced.

I'm interested in seeing how the adjustments affect performance on existing metrics.
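For reference, one rough way to see which tensors were swapped out is to diff this checkpoint against the base model. This is a minimal sketch, not part of the experiment itself; it assumes both repositories are accessible and that you have enough memory to hold two 7B-parameter models at once.

```python
# Minimal sketch: list which weight tensors differ from the base Mistral-7B-v0.1.
# Loading two 7B models needs roughly 30 GB of RAM/VRAM in bfloat16.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
variant = AutoModelForCausalLM.from_pretrained(
    "awnr/Mistral-7B-v0.1-signtensors-5-over-16", torch_dtype=torch.bfloat16
)

# Parameter ordering matches because the architectures are identical.
changed = [
    name
    for (name, p_base), (_, p_var) in zip(
        base.named_parameters(), variant.named_parameters()
    )
    if not torch.equal(p_base, p_var)
]
print(f"{len(changed)} tensors differ from the base model")
```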

## Model Details

Research in progress! Demons could come out of your nose if you use this.

### Model Description

A modification of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). Thanks to their team for sharing their model.

- **Modified by:** Dr. Alex W. Neal Riasanovsky
- **Model type:** pre-trained
- **Language(s) (NLP):** English
- **License:** Apache-2.0

## Bias, Risks, and Limitations

Use at your own risk. I have no idea what this model's biases and limitations are. I just want to see whether the benchmark values are similar to those of Mistral-7B-v0.1. I am setting up a long computational experiment to test some ideas.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found on the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=awnr/Mistral-7B-v0.1-signtensors-5-over-16).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 12.16 |
| IFEval (0-Shot)     | 21.18 |
| BBH (3-Shot)        | 17.54 |
| MATH Lvl 5 (4-Shot) |  2.19 |
| GPQA (0-shot)       |  4.14 |
| MuSR (0-shot)       |  6.14 |
| MMLU-PRO (5-shot)   | 21.75 |
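If you want to check these numbers yourself, something along the lines of the sketch below with EleutherAI's lm-evaluation-harness should approximate the leaderboard setup. The `leaderboard` task group name and the call signature are assumptions about a recent harness release, not part of this card; the official numbers come from the leaderboard link above.

```python
# Rough sketch: run the Open LLM Leaderboard-style tasks locally with
# lm-evaluation-harness (pip install lm_eval). The "leaderboard" group name
# assumes a recent harness release and may not match the leaderboard's exact
# configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=awnr/Mistral-7B-v0.1-signtensors-5-over-16,dtype=bfloat16",
    tasks=["leaderboard"],  # covers IFEval, BBH, MATH Lvl 5, GPQA, MuSR, MMLU-PRO
    batch_size="auto",
)

for task, metrics in results["results"].items():
    print(task, metrics)
```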