Proposal for new column

#1032
by Yuma42 - opened

I think it would be interesting to have an average score / CO₂ cost column. We already have both numbers, so why not combine them to show the most efficient models?

If there was such a column, the leaderboard would currently look like this:

                                fullname Average ⬆️ CO₂ cost (kg) avg_per_kg_co2
                                    gpt2  5.977737    0.03924517      152.31776
              cpayne1303/cp2024-instruct  4.319731    0.03216190      134.31209
               cpayne1303/llama-43m-beta  5.288332    0.05839185       90.56627
               cpayne1303/llama-43m-beta  5.347100    0.05991588       89.24346
                      JackFram/llama-68m  4.862635    0.06055790       80.29728
                       cpayne1303/cp2024  3.614016    0.04761306       75.90388
                   openai-community/gpt2  6.510807    0.08594126       75.75881
                            sumink/ftgpt  3.951784    0.05281752       74.81957
                  cpayne1303/smallcp2024  3.455732    0.04730794       73.04760
            postbot/gpt2-medium-emailgen  4.743048    0.07818635       60.66338
          unsloth/Phi-3-mini-4k-instruct 27.178374    0.46953311       57.88383
              tiiuae/Falcon3-7B-Instruct 34.906699    0.61876067       56.41389
              tiiuae/Falcon3-3B-Instruct 26.551992    0.48046365       55.26327
        SultanR/SmolTulu-1.7b-Reinforced 15.756606    0.28961603       54.40516
             h2oai/h2o-danube3.1-4b-chat 16.210718    0.29914058       54.19097
                   openai-community/gpt2  6.296471    0.11738690       53.63862
 suayptalha/HomerCreativeAnvita-Mix-Qw7B 34.620978    0.64988069       53.27282
          newsbang/Homer-v0.3-Qwen2.5-7B 31.088203    0.58560348       53.08746
          newsbang/Homer-v0.4-Qwen2.5-7B 33.918837    0.63972041       53.02134
               icefog72/Ice0.37-18.11-RP 21.913941    0.41451281       52.86674

Sorted in ascending order, it would look like this:

                                fullname Average ⬆️ CO₂ cost (kg) avg_per_kg_co2
           WizardLMTeam/WizardLM-13B-V1.0  4.546092    70.9775871     0.06404968
          NAPS-ai/naps-gemma-2-27b-v0.1.0  1.679602    22.6642492     0.07410799
         NAPS-ai/naps-gemma-2-27b-v-0.1.0  1.679602    11.2248610     0.14963231
       NousResearch/Yarn-Llama-2-13b-128k  8.418618    51.9357833     0.16209668
                 PygmalionAI/pygmalion-6b  5.392360    31.9231193     0.16891707
            togethercomputer/GPT-JT-6B-v1  6.827354    37.9588107     0.17986218
                  TencentARC/LLaMA-Pro-8B  8.778934    47.8077336     0.18363000
                      Qwen/Qwen2-57B-A14B 25.033873   107.0314775     0.23389262
             mistralai/Mixtral-8x22B-v0.1 25.728348   104.6973163     0.24574028
                allknowingroger/Quen2-65B  3.531344    13.3174236     0.26516723
               alpindale/WizardLM-2-8x22B 32.983523    93.3052217     0.35350136
                   bigcode/starcoder2-15b 12.551764    35.0445477     0.35816594
                   teknium/OpenHermes-13B 12.169676    31.1191167     0.39106753
                   Qwen/Qwen1.5-110B-Chat 29.224837    72.5652931     0.40273849
                        Qwen/Qwen1.5-110B 29.846266    71.2708884     0.41877218
                         Qwen/Qwen1.5-32B 27.021817    59.9671594     0.45061026
        deepseek-ai/deepseek-llm-67b-chat 26.995929    59.8218087     0.45127236
                davidkim205/Rhea-72b-v0.5  4.224031     8.6886909     0.48615279
     mistral-community/mixtral-8x22B-v0.3 25.789407    52.4944852     0.49127840
          allknowingroger/Qwen2.5-42B-AGI  4.470830     8.8569811     0.50478030

Quite interesting, actually.
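For reference, the avg_per_kg_co2 column in both tables is just the average score divided by the CO₂ cost. Below is a minimal Python sketch of that arithmetic, using three rows copied from the tables above. The function name and the choice of rows are illustrative assumptions, not part of the leaderboard code.

```python
# Rows copied from the tables above: (fullname, average score, CO2 cost in kg).
rows = [
    ("WizardLMTeam/WizardLM-13B-V1.0", 4.546092, 70.9775871),
    ("gpt2", 5.977737, 0.03924517),
    ("tiiuae/Falcon3-7B-Instruct", 34.906699, 0.61876067),
]


def avg_per_kg_co2(average, co2_kg):
    """Average leaderboard score per kilogram of CO2 cost (hypothetical helper)."""
    return average / co2_kg


# Sort from most to least efficient, as in the first table.
ranked = sorted(rows, key=lambda r: avg_per_kg_co2(r[1], r[2]), reverse=True)

for name, average, co2_kg in ranked:
    print(f"{name:32s} {avg_per_kg_co2(average, co2_kg):10.5f}")
```

A tiny model with a low score can still top this ranking because its CO₂ cost is so small, which is why gpt2 ends up ahead of much stronger 7B models.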

Open LLM Leaderboard org

Yep, @alozowski has been working on a blog to feature something like this! Lots of interesting results from computing CO2 cost :)
We'll probably add the column once the blog is ready, but with the holidays arriving it will take a couple of weeks :)

clefourrier changed discussion status to closed

@brankor-mcom now that the leaderboard has unfortunately been archived, could you share a similar list with the most recent data?
I'm interested in the best ones overall and the best ones under 10B parameters.

Open LLM Leaderboard org

You can easily compute it from the contents dataset :)

@Yuma42 - to produce the list of the most efficient models, e.g. those between 5B and 10B parameters, one can use this R script:

library(arrow)

# Parquet export of the leaderboard "contents" dataset
leaderboard_url <- "https://huggingface.co/datasets/open-llm-leaderboard/contents/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet"

data <- read_parquet(leaderboard_url)
# Efficiency metric: average score per kg of CO2
data$avg_per_kg_co2 <- data$"Average ⬆️" / data$"CO₂ cost (kg)"
# Keep models with at least 5B and fewer than 10B parameters
data_subset <- data[data$"#Params (B)" >= 5 & data$"#Params (B)" < 10, ]
# Print the subset sorted by efficiency, best first
data_subset[order(-data_subset$avg_per_kg_co2), c("fullname", "Average ⬆️", "CO₂ cost (kg)", "avg_per_kg_co2")]

That's the entire script. I'm sure it would be fairly easy to translate it to Python+pyarrow+pandas if needed, with the help of our AI friends. :-)

According to the above criteria, the top model turns out to be Xiaojian9992024/Qwen2.5-Dyanka-7B-Preview.
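
Since a Python translation was mentioned, here is a rough pandas sketch of the same steps. The function names, constants, and default bounds are my own assumptions for illustration; only the URL and the column names are taken from the R script. One behavioural difference to be aware of: rows with a missing parameter count are silently dropped by the pandas filter, whereas the R subset would keep them as NA rows.

```python
import pandas as pd

# URL and column names copied from the R script above.
LEADERBOARD_URL = (
    "https://huggingface.co/datasets/open-llm-leaderboard/contents/"
    "resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet"
)
AVG_COL = "Average ⬆️"
CO2_COL = "CO₂ cost (kg)"
PARAMS_COL = "#Params (B)"


def load_leaderboard(url=LEADERBOARD_URL):
    """Download the parquet file (needs network access and a parquet engine such as pyarrow)."""
    return pd.read_parquet(url)


def rank_by_efficiency(df, min_params=5, max_params=10):
    """Return models with min_params <= #Params (B) < max_params,
    sorted by average score per kg of CO2, best first."""
    out = df.copy()
    out["avg_per_kg_co2"] = out[AVG_COL] / out[CO2_COL]
    in_range = (out[PARAMS_COL] >= min_params) & (out[PARAMS_COL] < max_params)
    cols = ["fullname", AVG_COL, CO2_COL, "avg_per_kg_co2"]
    return out.loc[in_range, cols].sort_values("avg_per_kg_co2", ascending=False)
```

Calling rank_by_efficiency(load_leaderboard()) should correspond to the 5B to 10B list, and rank_by_efficiency(load_leaderboard(), 0, 10) to the "under 10B" list asked about earlier. I have not checked the output against the R results.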
