ML-FIGS-LDM
EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS
ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. The AutoencoderKL is trained using a Text Perceptual Loss to reconstruct more readable text within the figures.
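The card does not spell out how the Text Perceptual Loss is computed, so the following is only a minimal sketch of the general idea behind a perceptual loss: compare the original and reconstructed images in the feature space of a frozen network rather than in pixel space. `StandInTextFeatureNet` is a hypothetical, randomly initialised placeholder; a real setup would presumably use a pretrained text-aware network in its place. The layer choice and the equal weighting of layers are assumptions as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StandInTextFeatureNet(nn.Module):
    """Hypothetical placeholder for a frozen, pretrained text-aware network."""

    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU()),
        ])

    def forward(self, x):
        # Return the intermediate feature map of every block.
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return feats


def text_perceptual_loss(feature_net, target, reconstruction):
    """Mean over layers of the MSE between feature maps (assumed weighting)."""
    with torch.no_grad():
        target_feats = feature_net(target)
    recon_feats = feature_net(reconstruction)
    losses = [F.mse_loss(r, t) for r, t in zip(recon_feats, target_feats)]
    return torch.stack(losses).mean()


torch.manual_seed(0)
feature_net = StandInTextFeatureNet().eval()
for p in feature_net.parameters():
    p.requires_grad_(False)  # the feature network stays frozen

# Random tensors stand in for a batch of figures and their reconstructions.
target = torch.rand(2, 3, 32, 32)
reconstruction = (target + 0.1 * torch.randn_like(target)).requires_grad_(True)

loss = text_perceptual_loss(feature_net, target, reconstruction)
loss.backward()  # gradients reach the reconstruction, not the frozen network
```

In autoencoder training, a term like this would typically be added to the usual reconstruction and KL terms, so that errors in text regions are penalised more heavily than a pixel-wise loss alone would penalise them.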
Autoencoder Performance Comparison:
Note:
- AutoencoderKL_SD: Stable Diffusion v1-4 autoencoder trained on LAION.
- AutoencoderKL_TPL: Autoencoders trained with Text Perceptual Loss (TPL).
  - Model A is trained on ML-Figs.
  - Model B is trained on ML-Figs + SciCap.
Dataset | Method | PSNR ↑ | SSIM ↑ | FID ↓ | LPIPS ↓ | MSE ↓ | TPL ↓ |
---|---|---|---|---|---|---|---|
ML-Figs Test | AutoencoderKL_SD | 33.01 | 0.970 | 20.51 | 0.022 | 0.003 | 0.043 |
ML-Figs Test | AutoencoderKL_TPL A | 30.71 | 0.954 | 16.13 | 0.056 | 0.002 | 0.017 |
ML-Figs + SciCap Test | AutoencoderKL_SD | 32.60 | 0.970 | 12.69 | 0.023 | 0.004 | 0.061 |
ML-Figs + SciCap Test | AutoencoderKL_TPL A | 29.94 | 0.954 | 9.235 | 0.057 | 0.003 | 0.028 |
ML-Figs + SciCap Test | AutoencoderKL_TPL B | 31.47 | 0.979 | 6.256 | 0.016 | 0.001 | 0.010 |
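For readers unfamiliar with the pixel-level metrics in the table, here is a minimal sketch of how MSE and PSNR are conventionally defined. It assumes pixel values scaled to [0, 1] and flat sequences of pixels; the card does not state the value range or the aggregation used for the reported numbers, so this is illustrative and not the evaluation code behind the table.

```python
import math


def mse(reference, reconstruction):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    squared = ((a - b) ** 2 for a, b in zip(reference, reconstruction))
    return sum(squared) / len(reference)


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed pixel peak."""
    err = mse(reference, reconstruction)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy four-pixel "images" (made-up values, for illustration only).
reference = [0.0, 0.5, 1.0, 0.25]
reconstruction = [0.1, 0.5, 0.9, 0.25]

toy_mse = mse(reference, reconstruction)
toy_psnr = psnr(reference, reconstruction)
```

Because PSNR is a decreasing function of MSE, a lower MSE gives a higher PSNR for a single image. Scores averaged over a test set need not follow that relation exactly.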
Latent Diffusion Model: