ML-FIGS-LDM

EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS

ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. Its AutoencoderKL is trained with a Text Perceptual Loss (TPL) so that text within the reconstructed figures is more readable.
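The exact form of the Text Perceptual Loss is not given here. As a rough sketch of the general idea behind a perceptual loss, the PyTorch example below compares the original and the reconstructed figure in the feature space of a frozen network rather than only in pixel space. Everything in it is an assumption made for illustration: `StandInTextFeatures` is a hypothetical, randomly initialised placeholder for a pretrained text-recognition backbone, and the names `text_perceptual_loss`, `autoencoder_loss`, and `tpl_weight` are invented. The KL regulariser and any adversarial term that an AutoencoderKL objective usually carries are left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StandInTextFeatures(nn.Module):
    """Hypothetical placeholder for a pretrained text-recognition backbone."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU()),
        ])
        # Freeze the feature network: it scores reconstructions, it is not trained.
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)  # keep the feature map produced by each block
        return feats


def text_perceptual_loss(extractor: nn.Module,
                         target: torch.Tensor,
                         recon: torch.Tensor) -> torch.Tensor:
    """Sum over layers of the MSE between target and reconstruction features."""
    loss = target.new_zeros(())
    for f_t, f_r in zip(extractor(target), extractor(recon)):
        loss = loss + F.mse_loss(f_r, f_t)
    return loss


def autoencoder_loss(extractor: nn.Module,
                     target: torch.Tensor,
                     recon: torch.Tensor,
                     tpl_weight: float = 1.0) -> torch.Tensor:
    # Pixel reconstruction term plus the weighted feature-space term (assumed form).
    pixel = F.mse_loss(recon, target)
    return pixel + tpl_weight * text_perceptual_loss(extractor, target, recon)
```

In a training loop, `recon` would be the autoencoder's output for `target`. Gradients still reach the autoencoder through the frozen extractor, because only the extractor's own parameters are excluded from updates.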

Autoencoder Performance Comparison:

Note:

  • AutoencoderKL_SD: Stable Diffusion v1-4 autoencoder trained on LAION.
  • AutoencoderKL_TPL: Autoencoders trained with Text Perceptual Loss (TPL).
  • AutoencoderKL_TPL A is trained on ML-Figs.
  • AutoencoderKL_TPL B is trained on ML-Figs + SciCap.
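
| Dataset               | Method              | PSNR ↑ | SSIM ↑ | FID ↓ | LPIPS ↓ | MSE ↓ | TPL ↓ |
|-----------------------|---------------------|--------|--------|-------|---------|-------|-------|
| ML-Figs Test          | AutoencoderKL_SD    | 33.01  | 0.970  | 20.51 | 0.022   | 0.003 | 0.043 |
| ML-Figs Test          | AutoencoderKL_TPL A | 30.71  | 0.954  | 16.13 | 0.056   | 0.002 | 0.017 |
| ML-Figs + SciCap Test | AutoencoderKL_SD    | 32.60  | 0.970  | 12.69 | 0.023   | 0.004 | 0.061 |
| ML-Figs + SciCap Test | AutoencoderKL_TPL A | 29.94  | 0.954  | 9.235 | 0.057   | 0.003 | 0.028 |
| ML-Figs + SciCap Test | AutoencoderKL_TPL B | 31.47  | 0.979  | 6.256 | 0.016   | 0.001 | 0.010 |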
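
For readers unfamiliar with the pixel-level columns, the sketch below shows how MSE and PSNR are commonly defined for a reference image and its reconstruction. It assumes pixel values scaled to [0, 1] (`max_value=1.0`). The value range and the per-image averaging used for the reported numbers are not stated here, so this code is not expected to reproduce the table. SSIM, FID, LPIPS, and TPL need dedicated models or libraries and are not covered.

```python
import numpy as np


def mse(reference, reconstruction) -> float:
    """Mean squared error over all pixels and channels."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    return float(np.mean((reference - reconstruction) ** 2))


def psnr(reference, reconstruction, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    err = mse(reference, reconstruction)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / err))


# Usage: a uniform error of 0.1 on a [0, 1] image
reference = np.zeros((4, 4))
reconstruction = np.full((4, 4), 0.1)
print(mse(reference, reconstruction), psnr(reference, reconstruction))
```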

Latent Diffusion Model:

MlFigs_LDM_12.ckpt


Dataset used to train salamnocap/ml-figs-ldm