This model is a fine-tuned version of InstaDeepAI's Nucleotide Transformer (2.5B).

### Direct Use

This model can be used directly for predicting whether a given nucleotide sequence is associated with Antimicrobial Resistance (AMR) without additional fine-tuning.

### Downstream Use

The model can be further fine-tuned for specific AMR-related tasks or integrated into larger bioinformatics pipelines for genomic analysis.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "..."  # placeholder: this model's Hugging Face repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

sequence = "ATGC..."  # Replace with your nucleotide sequence
inputs = tokenizer(sequence, truncation=True, max_length=1000, return_tensors="pt")
outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()  # 0 = non-AMR, 1 = AMR
```
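For the pipeline integration mentioned under Downstream Use, scoring sequences in batches is usually more convenient than one call per sequence. A minimal sketch, assuming the `tokenizer` and `model` loaded above; the helper name is mine, not from this card:

```python
import torch

def classify_batch(sequences, tokenizer, model):
    """Return a 0/1 AMR label per sequence (0 = non-AMR, 1 = AMR)."""
    inputs = tokenizer(
        sequences,
        truncation=True,
        max_length=1000,
        padding=True,          # pad so variable-length sequences batch together
        return_tensors="pt",
    )
    with torch.no_grad():      # inference only, no gradients needed
        logits = model(**inputs).logits
    return logits.argmax(-1).tolist()

# labels = classify_batch(["ATGC...", "GGCA..."], tokenizer, model)
```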
## Training Details

### Training Data

The model was trained on the DraGNOME-2.5b-v1 dataset, consisting of 1200 overlapping sequences (one plausible way to derive such windows is sketched after this list):

- **Negative sequences (non-AMR):** `DSM_20231.fasta`, `ecoli-k12.fasta`, `FDA.fasta`
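The card does not specify how the overlapping sequences were produced. A minimal windowing sketch using Biopython (not named in the card), where the window and stride sizes are assumptions rather than documented values:

```python
from Bio import SeqIO  # Biopython; assumed here, not stated in the card

def overlapping_windows(fasta_path, window=1000, stride=500):
    """Yield (record id, subsequence) pairs from overlapping windows."""
    for record in SeqIO.parse(fasta_path, "fasta"):
        seq = str(record.seq)
        for start in range(0, max(len(seq) - window + 1, 1), stride):
            yield record.id, seq[start:start + window]

# Negative (non-AMR) examples from the files listed above:
negatives = []
for path in ["DSM_20231.fasta", "ecoli-k12.fasta", "FDA.fasta"]:
    negatives.extend(s for _, s in overlapping_windows(path))
```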
### Training Procedure

#### Preprocessing

Sequences were tokenized using the Nucleotide Transformer tokenizer with a maximum length of 1000 tokens and truncation applied where necessary.
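As a concrete illustration of that step, tokenizing a Hugging Face `datasets` table might look like the sketch below; the base checkpoint shown is an assumption, since this excerpt does not name the exact 2.5B variant:

```python
from transformers import AutoTokenizer

# Assumed base checkpoint; the excerpt doesn't say which 2.5B variant was used.
tokenizer = AutoTokenizer.from_pretrained(
    "InstaDeepAI/nucleotide-transformer-2.5b-multi-species"
)

def preprocess(batch):
    # Settings from the card: truncate to at most 1000 tokens.
    return tokenizer(batch["sequence"], truncation=True, max_length=1000)

# With a `datasets.Dataset` that has a "sequence" column:
# tokenized = dataset.map(preprocess, batched=True)
```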
#### Training Hyperparameters

- **Scheduler:** Linear with 10% warmup
- **LoRA parameters:** `r=32`, `alpha=64`, `dropout=0.1`, `target_modules=["query", "value"]` (see the configuration sketch after this list)
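A minimal sketch of how these stated settings map onto `peft` and `transformers`; any value not listed above (output directory, task type) is an assumption:

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["query", "value"],
    task_type="SEQ_CLS",  # assumption: binary sequence classification
)
# model = get_peft_model(base_model, lora_config)  # base_model: the 2.5B NT model

training_args = TrainingArguments(
    output_dir="dragnome-amr",   # placeholder
    lr_scheduler_type="linear",
    warmup_ratio=0.1,            # linear schedule with 10% warmup, as stated above
    save_steps=500,              # checkpoint every 500 steps (see Speeds, Sizes, Times)
    save_total_limit=3,          # retain only the last 3 checkpoints
)
```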
#### Speeds, Sizes, Times

Training was performed on Google Colab with checkpointing every 500 steps, retaining the last 3 checkpoints. Exact throughput and wall-clock times depend on Colab's hardware allocation; an NVIDIA A100 GPU was used for this run.
---

Evaluation was performed across AMR and non-AMR classes.

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA A100 GPU (Google Colab)
- **Hours used:** [More Information Needed]
- **Cloud Provider:** Google Colab
- **Compute Region:** [More Information Needed]
Training was performed on Google Colab with persistent storage via Google Drive.

#### Hardware

- NVIDIA A100 GPU
#### Software
---

## Glossary

- **AMR:** Antimicrobial Resistance
- **LoRA:** Low-Rank Adaptation
---

## Model Card Authors

Blaise Alako