|
--- |
|
library_name: transformers |
|
tags: |
|
- medical |
|
license: apache-2.0 |
|
language: |
|
- fr |
|
- en |
|
base_model: |
|
- ik-ram28/BioMistral-CPT-7B |
|
- BioMistral/BioMistral-7B |
|
--- |
|
|
|
## Model Description |
|
|
|
BioMistral-CPT-SFT-7B is a French medical language model built on BioMistral-7B and adapted to the French medical domain through Continual Pre-Training (CPT) followed by Supervised Fine-Tuning (SFT). A minimal usage sketch follows the model details below.
|
|
|
## Model Details |
|
|
|
- **Model Type**: Causal Language Model |
|
- **Base Model**: BioMistral-7B |
|
- **Language**: French (adapted from an English-language medical model)
|
- **Domain**: Medical/Healthcare |
|
- **Parameters**: 7 billion |
|
- **License**: Apache 2.0 |
|
- **Paper**: [Adaptation des connaissances médicales pour les grands modèles de langue : Stratégies et analyse comparative](https://github.com/ikram28/medllm-strategies) (in English: "Adapting Medical Knowledge for Large Language Models: Strategies and Comparative Analysis")
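
The model loads through the standard `transformers` API. The sketch below is a minimal inference example; the Hub ID `ik-ram28/BioMistral-CPT-SFT-7B` is inferred from the model name and the `base_model` entries above, so verify it against the actual repository path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID, inferred from the model name; verify before use.
model_id = "ik-ram28/BioMistral-CPT-SFT-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision: a 7B model fits in ~15 GB
    device_map="auto",
)

prompt = "Quels sont les effets secondaires courants des statines ?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```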
|
|
|
## Training Details |
|
|
|
### Continual Pre-Training (CPT) |
|
- **Dataset**: NACHOS corpus (opeN crAwled frenCh Healthcare cOrpuS) |
|
- **Size**: 7.4 GB of French medical texts |
|
- **Word Count**: Over 1 billion words |
|
- **Sources**: 24 French medical websites |
|
- **Epochs**: 2.8
|
- **Hardware**: 32 NVIDIA H100 80GB GPUs |
|
- **Training Time**: 11 hours |
|
- **Optimizer**: AdamW |
|
- **Learning Rate**: 2e-5 |
|
- **Weight Decay**: 0.01 |
|
- **Batch Size**: 16, with gradient accumulation over 2 steps (see the training sketch below)
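
As referenced above, here is a minimal sketch of the CPT stage using the `transformers` Trainer with the listed hyperparameters. The NACHOS file paths, the 2048-token context length, bf16 precision, and the per-device interpretation of the batch size are assumptions, since the card does not specify them.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "BioMistral/BioMistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder path: the NACHOS corpus is not distributed with this card.
corpus = load_dataset("text", data_files={"train": "nachos/*.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_ds = corpus.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="biomistral-cpt-7b",
    per_device_train_batch_size=16,   # assumed per-device
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=2.8,             # fractional epoch count as reported above
    optim="adamw_torch",
    bf16=True,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    # Causal LM objective: labels are the shifted inputs, no masking.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Launched with `torchrun` or `accelerate launch`, the same script scales across multiple GPUs via data parallelism, matching the multi-GPU setup reported above.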
|
|
|
### Supervised Fine-Tuning (SFT) |
|
- **Dataset**: 30K French medical question-answer pairs

  - 10K native French medical questions

  - 10K medical questions translated from English resources

  - 10K questions generated from French medical texts
|
- **Method**: DoRA (Weight-Decomposed Low-Rank Adaptation) |
|
- **Epochs**: 10
|
- **Hardware**: 1 NVIDIA H100 80GB GPU |
|
- **Training Time**: 42 hours |
|
- **Rank**: 16 |
|
- **Alpha**: 16 |
|
- **Learning Rate**: 2e-5 |
|
- **Batch Size**: 4 (see the DoRA sketch below)
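
The sketch below shows how this configuration maps onto the `peft` library, which exposes DoRA through `use_dora=True` in `LoraConfig` (peft ≥ 0.9.0). The target modules are an assumption; the card does not list them.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Start from the CPT checkpoint listed under base_model above.
model = AutoModelForCausalLM.from_pretrained("ik-ram28/BioMistral-CPT-7B")

dora_config = LoraConfig(
    r=16,             # rank, as reported above
    lora_alpha=16,    # alpha, as reported above
    use_dora=True,    # weight-decomposed low-rank adaptation
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, dora_config)
model.print_trainable_parameters()
# Training then proceeds as in the CPT sketch, with learning rate 2e-5,
# batch size 4, and 10 epochs over the 30K QA pairs.
```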
|
|
## Computational Impact |
|
|
|
- **Total Training Time**: 53 hours (11h CPT + 42h SFT) |
|
- **Hardware**: 32 NVIDIA H100 80GB GPUs (CPT) + 1 NVIDIA H100 80GB GPU (SFT)
|
- **Carbon Emissions**: 10.11 kgCO2e (9.04 CPT + 1.07 SFT)
|
|
## Ethical Considerations |
|
|
|
- **Medical Accuracy**: This model is intended for research and educational purposes only; its performance limitations make it unsuitable for clinical or other critical medical applications

- **Bias**: The model may reflect biases present in both the English and French medical literature used during training
|
|
|
|
|
## Citation |
|
|
|
If you use this model, please cite: |
|
|
|
```bibtex
```
|
|
|
## Contact |
|
|
|
For questions about this model, please contact: [email protected] |