Simonlee711 committed
Commit f191089 · verified · 1 Parent(s): 66b503e

Update README.md

Files changed (1)
  1. README.md +0 -4
README.md CHANGED
@@ -58,10 +58,6 @@ This model leverages a suite of modern architectural advancements including rota
  | Batch Size | 128 | Batch size used during training |


- The pre-training regimen of Clinical ModernBERT is distinguished by a dynamic, two-phase approach that is both computationally efficient and sensitive to the unique linguistic characteristics of clinical text. In the initial phase, the model trains on sequences limited to 128 tokens using large batches and an elevated learning rate. This phase is optimized with the StableAdamW optimizer, employing a cosine learning rate schedule with a 10% warmup ratio. Mixed-precision techniques are applied to accelerate training while preserving memory efficiency.
-
- Once the model has acquired robust short-span contextual embeddings, the sequence length is extended to 8,192 tokens—a critical modification that enables the model to capture the long-range dependencies present in full-length discharge summaries and radiology reports. In this extended phase, the scaling parameter of the rotary positional embeddings is adjusted from 10,000 to 160,000, ensuring that the increased context does not compromise the relative positioning of tokens. The batch size and learning rate are reduced to maintain stability during this more computationally demanding phase. Additionally, an adaptive sampling strategy is implemented to prioritize well-structured clinical narratives, thereby intensifying the learning signal from the most informative examples.
-
  ## Masked Language Modeling (MLM) Setup

  Clinical ModernBERT is pre-trained using a multi-phase masked language modeling (MLM) strategy. A custom collator dynamically adjusts the masking probability—beginning at 30% and decreasing to 15% over the course of training—to emphasize medically relevant tokens (e.g., drug names, procedural codes). The MLM objective is defined as
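For reference, the two-phase regimen described in the lines removed above can be sketched in code roughly as follows. Only the sequence lengths (128 and 8,192 tokens), the rotary embedding scaling change from 10,000 to 160,000, the 10% warmup ratio, and the cosine schedule come from that description; the learning rates, batch sizes, step counts, and the use of `torch.optim.AdamW` in place of StableAdamW are placeholder assumptions for illustration.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Illustrative two-phase settings. Sequence lengths, RoPE theta values, the
# 10% warmup ratio, and the cosine schedule follow the description above;
# learning rates, batch sizes, and step counts are placeholders.
PHASES = [
    {"name": "short-context", "max_seq_len": 128,   "rope_theta": 10_000,  "lr": 5e-4, "batch_size": 128, "steps": 100_000},
    {"name": "long-context",  "max_seq_len": 8_192, "rope_theta": 160_000, "lr": 1e-4, "batch_size": 32,  "steps": 20_000},
]

def build_optimizer_and_schedule(model, phase):
    # The description names StableAdamW; torch.optim.AdamW is used here as a stand-in.
    optimizer = torch.optim.AdamW(model.parameters(), lr=phase["lr"], weight_decay=1e-5)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.10 * phase["steps"]),  # 10% warmup ratio
        num_training_steps=phase["steps"],
    )
    # Mixed precision, as described, would wrap the forward/backward pass, e.g.
    # with torch.autocast("cuda", dtype=torch.bfloat16): ...
    return optimizer, scheduler
```

Raising the rotary embedding base in the second phase is what lets the already-trained model attend over 8,192-token inputs without relearning positional structure from scratch, which matches the motivation given in the removed paragraph.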
 
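The dynamic masking collator is described only at a high level (masking probability annealed from 30% to 15%), so the following is a sketch of one way such a collator could be built by subclassing `DataCollatorForLanguageModeling`. The linear decay shape, the step-based trigger, and the class name are assumptions, and the up-weighting of medically relevant tokens mentioned above is omitted.

```python
from transformers import DataCollatorForLanguageModeling

class AnnealedMaskingCollator(DataCollatorForLanguageModeling):
    """Sketch: decay the MLM masking probability from 30% to 15% over training."""

    def __init__(self, tokenizer, total_steps, start_p=0.30, end_p=0.15):
        super().__init__(tokenizer=tokenizer, mlm=True, mlm_probability=start_p)
        self.total_steps = total_steps
        self.start_p = start_p
        self.end_p = end_p
        self._step = 0

    def __call__(self, features, return_tensors=None):
        # Linear decay is an assumption; only the 30% -> 15% endpoints are given.
        frac = min(self._step / max(self.total_steps, 1), 1.0)
        self.mlm_probability = self.start_p + frac * (self.end_p - self.start_p)
        self._step += 1
        return super().__call__(features, return_tensors)
```

In a standard `Trainer` setup this would be passed as `data_collator`; emphasizing specific token types such as drug names or procedural codes would require an additional token-level weighting step that this sketch does not attempt.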