Commit 27f3da5 · Parent: 672339b

Commit before claude code

Files changed:
- about.py (+31 -17)
- app.py (+25 -34)
- assets/prediction_explainer.png (+2 -2)
- assets/prediction_explainer_cv.png (+3 -0)

        about.py
    CHANGED
    
@@ -19,13 +19,25 @@ Antibodies have to be manufacturable, stable in high concentrations, and have lo…
 Properties such as these can often hinder the progression of an antibody to the clinic, and are collectively referred to as 'developability'.
 Here we invite the community to submit and develop better predictors, which will be tested out on a heldout private set to assess model generalization.
 
+#### 🧬 Developability properties in this competition
+
+1. 💧 Hydrophobicity
+2. 🎯 Polyreactivity
+3. 🧲 Self-association
+4. 🌡️ Thermostability
+5. 🧪 Titer
+
 #### 🏆 Prizes
 
 For each of the 5 properties in the competition, there is a prize for the model with the highest performance for that property on the private test set.
 There is also an 'open-source' prize for the best model trained on the GDPa1 dataset of monoclonal antibodies (reporting cross-validation results) and assessed on the private test set where authors provide all training code and data.
-For each of these 6 prizes, participants have the choice between …
+For each of these 6 prizes, participants have the choice between
+- **$10 000 in data generation credits** with [Ginkgo Datapoints](https://datapoints.ginkgo.bio/), or
+- A **$2000 cash prize**.
 
 See the "{FAQ_TAB_NAME}" tab above (you are currently on the "{ABOUT_TAB_NAME}" tab) or the [competition terms]({TERMS_URL}) for more details.
+
+---
 """
 
 ABOUT_TEXT = f"""
@@ -34,13 +46,15 @@ ABOUT_TEXT = f"""
 
 1. **Create a Hugging Face account** [here](https://huggingface.co/join) if you don't have one yet (this is used to track unique submissions and to access the GDPa1 dataset).
 2. **Register your team** on the [Competition Registration](https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition) page.
-3. **Build a model** …
-4. ** …
-…
-    - Track 2 (Train from scratch): Train a model using cross-validation on the `GDPa1` dataset and submit cross-validation predictions by selecting `GDPa1_cross_validation`.
-5. **Submit to the "Final Exam"**. Once you have submitted predictions on the validation set, download the private test set sequences from the {SUBMIT_TAB_NAME} tab and submit your final predictions. Your performance on this private set will determine the winners.
+3. **Build a model** using cross-validation on the [GDPa1](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) dataset, using the `hierarchical_cluster_IgG_isotype_stratified_fold` column to split the dataset into folds, and write out all cross-validation predictions to a CSV file.
+4. **Use your model to make predictions** on the private test set (download the 80 private test set sequences from the {SUBMIT_TAB_NAME} tab).
+5. **Submit your training and test set predictions** on the {SUBMIT_TAB_NAME} tab by uploading both your cross-validation and private test set CSV files.
 
-…
+Check out our introductory tutorial on training an antibody developability prediction model with cross-validation [here]({TUTORIAL_URL}).
+
+⏰ Submissions close on **1 November 2025**.
+
+---
 
 #### Acknowledgements
 
@@ -53,6 +67,8 @@ We gratefully acknowledge [Tamarind Bio](https://www.tamarind.bio/)'s help in ru…
 
 We're working on getting more public models added, so that participants have more precomputed features to use for modeling.
 
+---
+
 #### How to contribute?
 
 We'd like to add some more existing developability models to the leaderboard. Some examples of models we'd like to add:
@@ -62,6 +78,8 @@ We'd like to add some more existing developability models to the leaderboard. So…
 
 If you would like to form a team or discuss ideas, join the [Slack community]({SLACK_URL}) co-hosted by Bits in Bio.
 """
+# TODO(Lood): Add "📅 The first test set results will be released on October 13th, ahead of the final submission deadline on November 1st."
+
 
 # Note(Lood): Significance: Add another note of "many models are trained on different datasets, and differing train/test splits, so this is a consistent way of comparing for a heldout set"
 FAQS = {
@@ -98,7 +116,7 @@ FAQS = {
     ),
     "How exactly can I evaluate my model?": (
         "You can easily calculate the Spearman correlation coefficient on the GDPa1 dataset yourself before uploading to the leaderboard. "
-        "Simply use the `spearmanr(predictions, targets, nan_policy='omit')` function from `scipy.stats` …
+        "Simply use the `spearmanr(predictions, targets, nan_policy='omit')` function from `scipy.stats` to calculate the Spearman correlation coefficient for each of the 5 folds, and then take the average."
         "For the heldout private set, we will calculate these Spearman correlations privately at the end of the competition (and possibly at other points throughout the competition) - but there will not be 'rolling results' on the private test set to prevent test set leakage."
     ),
     "How often does the leaderboard update?": (
@@ -114,7 +132,7 @@ FAQS = {
         "We reserve the right to award the open-source prize to a predictor with competitive results for a subset of properties (e.g. a top polyreactivity model)."
     ),
     "How does the open-source prize work?": (
-        "Participants who open-source their code and methods will be eligible for the open-source prize (as well as the other prizes)."
+        "Participants who open-source their training code and methods will be eligible for the open-source prize (as well as the other prizes)."
     ),
     "What do I need to submit?": (
         'There is a tab on the Hugging Face competition page to upload predictions for datasets - for each dataset participants need to submit a CSV containing a column for each property they would like to predict (e.g. called "HIC"), '
@@ -124,11 +142,8 @@ FAQS = {
     "Can I submit predictions for only one property?": (
         "Yes. You do not need to predict all 5 properties to participate. Each property has its own leaderboard and prize, so you may submit models for a subset of the assays if you wish."
     ),
-    "Can I switch between Track 1 and Track 2 during the competition?": (
-        "Yes. You may submit to both tracks. For example, you can benchmark an existing model on the GDPa1 dataset (Track 1) and later also train and submit a cross-validation model on GDPa1 (Track 2)."
-    ),
     "Are participants required to use the provided cross-validation splits?": (
-        "Yes, …
+        "Yes, to ensure fair comparison between different trained models. The results will be calculated by taking the average Spearman correlation coefficient across all folds."
     ),
     "Are there any country restrictions for prize eligibility?": (
         "Yes. Due to applicable laws, prizes cannot be awarded to participants from countries under U.S. sanctions. See the competition terms for details."
@@ -141,8 +156,6 @@ FAQS = {
 
 SUBMIT_INTRUCTIONS = f"""
 # Antibody Developability Submission
-Upload CSV files to get your scores!
-List of valid property names: `{', '.join(ASSAY_LIST)}`.
 
 You do **not** need to predict all 5 properties – each property has its own leaderboard and prize.
 
@@ -151,15 +164,16 @@ You do **not** need to predict all 5 properties – each property has its own le…
    - **GDPa1 Cross-Validation predictions** (using cross-validation folds)
    - **Private Test Set predictions** (final test submission)
 2. Each CSV should contain `antibody_name` + one column per property you are predicting (e.g. `"antibody_name,Titer,PR_CHO"` if your model predicts Titer and Polyreactivity).
+   - List of valid property names: `{', '.join(ASSAY_LIST)}`.
 
-The GDPa1 results should appear on the leaderboard within a minute, and can also be calculated manually …
+The GDPa1 results should appear on the leaderboard within a minute, and can also be calculated manually using Spearman rank correlation. The **private test set results will not appear on the leaderboards at first**, and will be used to determine the winners at the close of the competition.
 We may release private test set results at intermediate points during the competition.
 
 ## Cross-validation
 
 For the GDPa1 cross-validation predictions, use the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column to split the dataset into folds and make predictions for each of the folds.
 Submit a CSV file in the same format but also containing the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column.
-Check out our tutorial on …
+Check out our tutorial on training an antibody developability prediction model with cross-validation [here]({TUTORIAL_URL}).
 
 Submissions close on **1 November 2025**.
 """
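
The new copy above describes a concrete recipe: split GDPa1 on the `hierarchical_cluster_IgG_isotype_stratified_fold` column, predict each held-out fold, score with `spearmanr(..., nan_policy='omit')` averaged across folds, and submit a CSV with `antibody_name`, the fold column, and one column per predicted property. A minimal sketch of that flow on synthetic data (only the column names and the scoring recipe come from the text above; the noisy-copy "predictions" are a stand-in for a real per-fold model):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

FOLD_COL = "hierarchical_cluster_IgG_isotype_stratified_fold"
ASSAY = "Titer"  # any of the 5 competition properties works the same way

# Synthetic stand-in for the GDPa1 table: names, fold labels, one target
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "antibody_name": [f"ab_{i}" for i in range(100)],
    FOLD_COL: rng.integers(0, 5, size=100),
    ASSAY: rng.normal(size=100),
})

# Per-fold predictions: here just noisy copies of the target, standing in
# for a model trained on the other four folds each time
preds = pd.Series(index=df.index, dtype=float)
for fold in sorted(df[FOLD_COL].unique()):
    mask = df[FOLD_COL] == fold
    preds[mask] = df.loc[mask, ASSAY] + rng.normal(scale=0.5, size=mask.sum())

# Fold-averaged Spearman, mirroring the FAQ's evaluation recipe
scores = []
for fold in sorted(df[FOLD_COL].unique()):
    mask = df[FOLD_COL] == fold
    rho, _ = spearmanr(preds[mask], df.loc[mask, ASSAY], nan_policy="omit")
    scores.append(rho)
print(f"Fold-averaged Spearman for {ASSAY}: {np.mean(scores):.3f}")

# Submission-shaped CSV: antibody_name + fold column + one column per
# predicted property (filename is illustrative)
out = df[["antibody_name", FOLD_COL]].copy()
out[ASSAY] = preds
out.to_csv("gdpa1_cv_predictions.csv", index=False)
```

The resulting CSV matches the shape `SUBMIT_INTRUCTIONS` asks for on the cross-validation track: one row per antibody, the fold column, and a column for each property being predicted.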
    	
        app.py
    CHANGED
    
@@ -50,7 +50,6 @@ def get_leaderboard_object(assay: str | None = None):
     filter_columns = ["dataset"]
     if assay is None:
         filter_columns.append("property")
-    # TODO how to sort filter columns alphabetically?
     # Bug: Can't leave search_columns empty because then it says "Column None not found in headers"
     # Note(Lood): Would be nice to make it clear that the Search Column is searching on model name
     current_dataframe = pd.read_csv("debug-current-results.csv")
@@ -101,11 +100,6 @@ async def periodic_data_fetch(app):
     event.set()
     t.join(3)
 
-
-# Lood: Two problems currently:
-# 1. The data_version state value isn't being incremented, it seems (even though it's triggering the dataframe change correctly)
-# 2. The global current_dataframe is being shared across all sessions
-
 # Make font size bigger using gradio theme
 with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
     timer = gr.Timer(3)  # Run every 3 seconds when page is focused
@@ -131,6 +125,7 @@ with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
             show_label=False,
             show_download_button=False,
             show_share_button=False,
+            show_fullscreen_button=False,
             width="25vw",  # Take up the width of the column (2/8 = 1/4)
         )
 
@@ -138,30 +133,34 @@ with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
         with gr.TabItem(ABOUT_TAB_NAME, elem_id="abdev-benchmark-tab-table"):
             gr.Markdown(ABOUT_INTRO)
             gr.Image(
-                value="./assets/…
+                value="./assets/prediction_explainer_cv.png",
                 show_label=False,
                 show_download_button=False,
                 show_share_button=False,
-…
+                show_fullscreen_button=False,
+                width="30vw",
             )
             gr.Markdown(ABOUT_TEXT)
+
+            # Sequence download buttons
+            gr.Markdown(
+            """### 📥 Download Sequences
+            The GDPa1 dataset (with assay data and sequences) is available on Hugging Face [here](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1), 
+            but we provide this and the private test set for convenience.""")
+            with gr.Row():
+                with gr.Column():
+                    download_button_cv_about = gr.DownloadButton(
+                        label="📥 Download GDPa1 sequences",
+                        value=SEQUENCES_FILE_DICT["GDPa1_cross_validation"],
+                        variant="secondary",
+                    )
+                with gr.Column():
+                    download_button_test_about = gr.DownloadButton(
+                        label="📥 Download Private Test Set sequences",
+                        value=SEQUENCES_FILE_DICT["Heldout Test Set"],
+                        variant="secondary",
+                    )
 
-        # Procedurally make these 5 tabs
-        # for i, assay in enumerate(ASSAY_LIST):
-        #     with gr.TabItem(
-        #         f"{ASSAY_EMOJIS[assay]} {ASSAY_RENAME[assay]}",
-        #         elem_id="abdev-benchmark-tab-table",
-        #     ) as tab_item:
-        #         gr.Markdown(f"# {ASSAY_DESCRIPTION[assay]}")
-        #         lb = get_leaderboard_object(assay=assay)
-
-        #         def refresh_leaderboard(assay=assay):
-        #             return format_leaderboard_table(df_results=current_dataframe, assay=assay)
-
-        #         # Refresh when data version changes
-        #         data_version.change(fn=refresh_leaderboard, outputs=lb)
-
-        # Note(Lood): Trying out just one leaderboard. We could also have a dropdown here that shows different leaderboards for each property, but that's just the same as the filters
         with gr.TabItem(
             "🏆 Leaderboard", elem_id="abdev-benchmark-tab-table"
         ) as leaderboard_tab:
@@ -171,18 +170,13 @@ with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
                 Each property has its own prize, and participants can submit models for any combination of properties.
 
                 **Note**: It is *easy to overfit* the public GDPa1 dataset, which results in artificially high Spearman correlations.
-                We would suggest training using cross-validation …
+                We would suggest training using cross-validation to give a better indication of the model's performance on the eventual private test set.
                 """
             )
             lb = get_leaderboard_object()
             timer.tick(fn=refresh_overall_leaderboard, outputs=lb)
             demo.load(fn=refresh_overall_leaderboard, outputs=lb)
 
-            # At the bottom of the leaderboard, we can keep as NaN and explain missing test set results
-            # gr.Markdown(
-            #     "_ℹ️ Results for the private test set will not be shown here and will be used for final judging at the close of the competition._"
-            # )
-
         with gr.TabItem(SUBMIT_TAB_NAME, elem_id="boundary-benchmark-tab-table"):
             gr.Markdown(SUBMIT_INTRUCTIONS)
 
@@ -218,9 +212,6 @@ with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
 
                 with gr.Column():
                     gr.Markdown("### Upload Both Submission Files")
-                    gr.Markdown(
-                        "**Both CSV files are required** - you cannot submit without uploading both files."
-                    )
 
                     # GDPa1 Cross-validation file
                     gr.Markdown("**GDPa1 Cross-Validation Predictions:**")
@@ -281,5 +272,5 @@ with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
 
 if __name__ == "__main__":
     demo.launch(
-        ssr_mode=False, …
+        ssr_mode=False, app_kwargs={"lifespan": periodic_data_fetch}
     )
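
The leaderboard here stays fresh by polling: `gr.Timer(3)` fires a `tick` event every 3 seconds while the page is focused, and `demo.load` fills the table on page load. This is the pattern the commit settles on in place of the commented-out `data_version.change` wiring removed above. A minimal self-contained sketch, assuming a Gradio version that ships `gr.Timer` (4.38+); `load_results` is an illustrative stand-in for the app's real fetch logic:

```python
import gradio as gr
import pandas as pd


def load_results() -> pd.DataFrame:
    # Stand-in for re-reading the latest leaderboard results
    return pd.DataFrame({"model": ["baseline"], "spearman": [0.42]})


with gr.Blocks() as demo:
    timer = gr.Timer(3)  # fires every 3 seconds while the page is focused
    table = gr.Dataframe(value=load_results())
    timer.tick(fn=load_results, outputs=table)  # periodic refresh
    demo.load(fn=load_results, outputs=table)   # initial fill on page load

if __name__ == "__main__":
    demo.launch(ssr_mode=False)
```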
    	
        assets/prediction_explainer.png
    CHANGED
    
(Binary image stored with Git LFS; before/after previews not shown.)
    	
        assets/prediction_explainer_cv.png
    ADDED
    
(New binary image stored with Git LFS; preview not shown.)
 
			
