# Model Card for mlpf-clic-clusters-v2.1.0

This model reconstructs particles in a detector, based on the tracks and calorimeter clusters recorded by the detector.

## Model Details

The performance is measured with respect to generator-level jets and MET computed from Pythia particles, i.e. the truth-level jets and MET.

<details>
<summary>Jet performance</summary>

<img src="plots_checkpoint-20-1.914489/clic_edm_ttbar_pf/jet_response_iqr_over_med_pt.png" alt="ttbar jet resolution" width="300"/>
<img src="plots_checkpoint-20-1.914489/clic_edm_qq_pf/jet_response_iqr_over_med_pt.png" alt="qq jet resolution" width="300"/>
<img src="plots_checkpoint-20-1.914489/clic_edm_ww_fullhad_pf/jet_response_iqr_over_med_pt.png" alt="WW fully hadronic jet resolution" width="300"/>

</details>

<details>
<summary>MET performance</summary>

<img src="plots_checkpoint-20-1.914489/clic_edm_ttbar_pf/met_response_iqr_over_med.png" alt="ttbar MET resolution" width="300"/>
<img src="plots_checkpoint-20-1.914489/clic_edm_qq_pf/met_response_iqr_over_med.png" alt="qq MET resolution" width="300"/>
<img src="plots_checkpoint-20-1.914489/clic_edm_ww_fullhad_pf/met_response_iqr_over_med.png" alt="WW fully hadronic MET resolution" width="300"/>

</details>

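The plot file names suggest that resolution is summarized as the interquartile range (IQR) of the response divided by its median. The following is a minimal sketch of that statistic, assuming the response is the ratio of a reconstructed quantity (jet pT or MET) to its generator-level value and that the values are available one per line. The function name `iqr_over_median` is hypothetical and is not part of the particleflow code.

```bash
# Print IQR/median for response values read from stdin, one value per line.
# Assumption: response = reconstructed value / generator-level value.
iqr_over_median() {
  LC_ALL=C sort -n | LC_ALL=C awk '
    function quantile(p,    pos, lo) {
      # Linear interpolation between the two nearest sorted values.
      pos = p * (NR - 1) + 1
      lo = int(pos)
      if (lo >= NR) return v[NR]
      return v[lo] + (pos - lo) * (v[lo + 1] - v[lo])
    }
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1   # no input values
      printf "%.4f\n", (quantile(0.75) - quantile(0.25)) / quantile(0.5)
    }'
}
```

For example, `printf '%s\n' 1.2 0.8 1.0 1.1 0.9 | iqr_over_median` prints `0.2000`. The quantiles here use linear interpolation between sorted values, which may differ slightly from the convention used to produce the plots.
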
### Model Description

- **Developed by:** Joosep Pata, Eric Wulff, Farouk Mokhtar, Mengke Zhang, David Southwick, Maria Girone, Javier Duarte, Michael Kagan
- **Model type:** transformer
- **License:** Apache License

### Model Sources

- **Repository:** https://github.com/jpata/particleflow/releases/tag/v2.1.0

## Uses

### Direct Use

This model may be used to study the physics and computational performance of ML-based reconstruction in simulation.

### Out-of-Scope Use

This model is not intended for physics measurements on real data.

## Bias, Risks, and Limitations

The model has only been trained on simulation data and has not been validated against real data.

The model has not been peer reviewed or published in a peer-reviewed journal.

## How to Get Started with the Model

Use the code below to get started with the model.

```bash
# get the code
git clone https://github.com/jpata/particleflow
cd particleflow
git checkout v2.1.0

# get the models
git clone https://huggingface.co/jpata/particleflow models
```

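Hugging Face model repositories typically store large files through Git LFS, so a clone made without `git-lfs` installed can leave small text pointer files in place of the actual weights. The sketch below detects such pointers. It assumes the checkpoints in the cloned `models` directory use the `.pth` extension seen elsewhere in this card; the function names are hypothetical.

```bash
# Return 0 if the file is a Git LFS pointer rather than real content.
# LFS pointer files start with this fixed 42-byte version line.
is_lfs_pointer() {
  head -c 42 "$1" 2>/dev/null \
    | LC_ALL=C grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Print every *.pth file under directory $1 that is still an LFS pointer.
list_lfs_pointers() {
  find "$1" -type f -name '*.pth' | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running `list_lfs_pointers models` after the clone should print nothing if the weights were downloaded. If it lists files, installing Git LFS and running `git lfs pull` inside `models` is the usual remedy.
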
## Training Details

The model was trained on 8x MI250X GPUs for 26 epochs over approximately 5 days.
Training was resumed from a checkpoint several times because of the job runtime limit.

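Because training had to be resumed from checkpoints, it helps to locate the most recent one in an experiment directory. The sketch below assumes checkpoints are named `checkpoint-<epoch>-<loss>.pth`, a pattern inferred from the single file name `checkpoint-20-1.914489.pth` in this card. The function `latest_checkpoint` is hypothetical, and how the resulting path is passed to `mlpf/pipeline.py` when resuming training is not shown in this card.

```bash
# Print the checkpoint with the highest epoch number in directory $1.
# Assumed naming: checkpoint-<epoch>-<loss>.pth
latest_checkpoint() (
  best_epoch=-1
  best_file=
  for f in "$1"/checkpoint-*-*.pth; do
    [ -e "$f" ] || continue            # glob did not match anything
    name=${f##*/}                      # strip the directory part
    epoch=${name#checkpoint-}          # drop the "checkpoint-" prefix
    epoch=${epoch%%-*}                 # keep only the epoch field
    case $epoch in
      ''|*[!0-9]*) continue ;;         # skip names without a numeric epoch
    esac
    if [ "$epoch" -gt "$best_epoch" ]; then
      best_epoch=$epoch
      best_file=$f
    fi
  done
  [ -n "$best_file" ] && printf '%s\n' "$best_file"
)
```

A call such as `latest_checkpoint experiments/pyg-clic_20241106_104416_929167/checkpoints` prints the path of the newest checkpoint and returns a non-zero status if none is found. Epochs are compared numerically, so epoch 10 sorts after epoch 9.
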
### Training Data

The following datasets were used:

```
47G /local/joosep/mlpf/tensorflow_datasets/clic/clic_edm_qq_pf/2.5.0
93G /local/joosep/mlpf/tensorflow_datasets/clic/clic_edm_ttbar_pf/2.5.0
74G /local/joosep/mlpf/tensorflow_datasets/clic/clic_edm_ww_fullhad_pf/2.5.0
```

The datasets were generated using Key4HEP with the following scripts:

- https://github.com/HEP-KBFI/key4hep-sim/releases/tag/v1.1.0
- https://github.com/HEP-KBFI/key4hep-sim/blob/v1.1.0/clic/run_sim.sh

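Before launching a job, it can be useful to confirm that all three datasets are present at version 2.5.0. This is a minimal sketch, assuming the data directory follows the `<root>/clic/<dataset>/2.5.0` layout shown in the listing above; the function name `missing_datasets` is hypothetical.

```bash
# Print the expected dataset directories that are missing under root $1.
# Assumed layout: <root>/clic/<dataset>/2.5.0
missing_datasets() (
  for name in clic_edm_qq_pf clic_edm_ttbar_pf clic_edm_ww_fullhad_pf; do
    d="$1/clic/$name/2.5.0"
    [ -d "$d" ] || printf '%s\n' "$d"
  done
)
```

For example, `missing_datasets /local/joosep/mlpf/tensorflow_datasets` prints nothing when every dataset directory exists, and one line per missing directory otherwise.
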
### Training Procedure

```bash
#!/bin/bash
#SBATCH --job-name=mlpf-train
#SBATCH --account=project_465000301
#SBATCH --time=3-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH --gpus-per-task=8
#SBATCH --partition=small-g
#SBATCH --no-requeue
#SBATCH -o logs/slurm-%x-%j-%N.out

cd /scratch/project_465000301/particleflow

module load LUMI/24.03 partition/G

export IMG=/scratch/project_465000301/pytorch-rocm6.2.simg
export PYTHONPATH=$(pwd)
export TFDS_DATA_DIR=/scratch/project_465000301/tensorflow_datasets
#export MIOPEN_DISABLE_CACHE=true
export MIOPEN_USER_DB_PATH=/tmp/${USER}-${SLURM_JOB_ID}-miopen-cache
export MIOPEN_CUSTOM_CACHE_DIR=${MIOPEN_USER_DB_PATH}
export TF_CPP_MAX_VLOG_LEVEL=-1 #to suppress ROCm fusion is enabled messages
export ROCM_PATH=/opt/rocm
#export NCCL_DEBUG=INFO
#export MIOPEN_ENABLE_LOGGING=1
#export MIOPEN_ENABLE_LOGGING_CMD=1
#export MIOPEN_LOG_LEVEL=4
export KERAS_BACKEND=torch

env

#PyTorch training
singularity exec \
    --rocm \
    -B /scratch/project_465000301 \
    -B /tmp \
    --env LD_LIBRARY_PATH=/opt/rocm/lib/ \
    --env CUDA_VISIBLE_DEVICES=$ROCR_VISIBLE_DEVICES \
    $IMG python3 mlpf/pipeline.py --gpus 8 \
    --data-dir $TFDS_DATA_DIR --config parameters/pytorch/pyg-clic.yaml \
    --train --gpu-batch-multiplier 128 --num-workers 8 --prefetch-factor 100 --checkpoint-freq 1 --conv-type attention --dtype bfloat16 --lr 0.0001 --num-epochs 50
```

## Evaluation

```bash
#!/bin/bash
#SBATCH --partition gpu
#SBATCH --gres gpu:mig:1
#SBATCH --mem-per-gpu 200G
#SBATCH -o logs/slurm-%x-%j-%N.out

IMG=/home/software/singularity/pytorch.simg:2024-08-18
cd ~/particleflow

WEIGHTS=experiments/pyg-clic_20241106_104416_929167/checkpoints/checkpoint-20-1.914489.pth
singularity exec -B /scratch/persistent --nv \
    --env PYTHONPATH=$(pwd) \
    --env KERAS_BACKEND=torch \
    $IMG python3 mlpf/pipeline.py --gpus 1 \
    --data-dir /scratch/persistent/joosep/tensorflow_datasets --config parameters/pytorch/pyg-clic.yaml \
    --test --make-plots --gpu-batch-multiplier 100 --load $WEIGHTS --dtype bfloat16 --prefetch-factor 10 --num-workers 8 --ntest 50000
```

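The image paths used in this card appear to be derived from the evaluated checkpoint: the weights file `checkpoint-20-1.914489.pth` corresponds to the directory `plots_checkpoint-20-1.914489`. The sketch below lists the plot paths referenced in this card for a given weights file. It assumes that naming relationship holds in general, which this card does not state; the function name `card_plot_paths` is hypothetical, and the parent directory in which the plots are written is not specified here.

```bash
# Print the six plot paths referenced in this card for weights file $1.
# Assumption: the plot directory is "plots_" + checkpoint name without ".pth".
card_plot_paths() (
  name=${1##*/}                 # strip the directory part
  dir="plots_${name%.pth}"      # e.g. plots_checkpoint-20-1.914489
  for sample in clic_edm_ttbar_pf clic_edm_qq_pf clic_edm_ww_fullhad_pf; do
    printf '%s\n' \
      "$dir/$sample/jet_response_iqr_over_med_pt.png" \
      "$dir/$sample/met_response_iqr_over_med.png"
  done
)
```

Calling `card_plot_paths "$WEIGHTS"` with the `WEIGHTS` value from the evaluation script prints the relative paths of the jet and MET resolution plots for the three samples.
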
## Citation

## Glossary

- PF: particle flow reconstruction
- MLPF: machine learning for particle flow
- CLIC: Compact Linear Collider
- MET: missing transverse energy

## Model Card Contact

Joosep Pata, [email protected]

## Full outputs

```
/local/joosep/mlpf/results/clic/pyg-clic_20241106_104416_929167
```