ML-FIGS-LDM

EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS

ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. Its AutoencoderKL is trained with a Text Perceptual Loss (TPL) so that text inside the reconstructed figures is more readable.
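
The exact definition of the Text Perceptual Loss is not given in this card. As a rough illustration only, a perceptual loss of this kind can be sketched as a feature-matching term: the original and the reconstructed figure are passed through a frozen network and the distance between the resulting feature maps is penalized. Everything below is an assumption made for illustration: the class name `TextPerceptualLoss`, the MSE distance, and in particular the small convolutional `stand_in` network, which takes the place of a pretrained text-recognition backbone and is not the model used for ML-FIGS-LDM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextPerceptualLoss(nn.Module):
    """Feature-matching loss through a frozen feature extractor (illustrative)."""

    def __init__(self, feature_extractor: nn.Module):
        super().__init__()
        # Freeze the extractor: it only provides features, it is never trained.
        self.features = feature_extractor.eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Target features need no gradient; reconstruction features do,
        # so the loss can be backpropagated into the autoencoder output.
        with torch.no_grad():
            target_feats = self.features(target)
        recon_feats = self.features(recon)
        return F.mse_loss(recon_feats, target_feats)


# Hypothetical stand-in for a pretrained text-recognition backbone.
stand_in = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)
tpl = TextPerceptualLoss(stand_in)

torch.manual_seed(0)
target = torch.rand(2, 3, 64, 64)  # batch of "original" figures in [0, 1]
recon = (target + 0.1 * torch.randn_like(target)).requires_grad_(True)

loss = tpl(recon, target)
loss.backward()  # gradients reach the reconstruction, not the frozen extractor
```

In an autoencoder training loop, a term like this would typically be added to the usual reconstruction and KL terms with a weighting factor; the weighting used for `AutoencoderKL_TPL` is not stated here.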

Autoencoder Performance Comparison:

Note:

`AutoencoderKL_SD`: the Stable Diffusion v1-4 autoencoder, trained on LAION.
`AutoencoderKL_TPL`: autoencoders trained with Text Perceptual Loss (TPL), in two variants:
Model A, trained on ML-Figs.
Model B, trained on ML-Figs + SciCap.

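| Dataset | Method | PSNR ↑ | SSIM ↑ | FID ↓ | LPIPS ↓ | MSE ↓ | TPL ↓ |
|---|---|---|---|---|---|---|---|
| ML-Figs Test | `AutoencoderKL_SD` | 33.01 | 0.970 | 20.51 | 0.022 | 0.003 | 0.043 |
| ML-Figs Test | `AutoencoderKL_TPL` A | 30.71 | 0.954 | 16.13 | 0.056 | 0.002 | 0.017 |
| ML-Figs + SciCap Test | `AutoencoderKL_SD` | 32.60 | 0.970 | 12.69 | 0.023 | 0.004 | 0.061 |
| ML-Figs + SciCap Test | `AutoencoderKL_TPL` A | 29.94 | 0.954 | 9.235 | 0.057 | 0.003 | 0.028 |
| ML-Figs + SciCap Test | `AutoencoderKL_TPL` B | 31.47 | 0.979 | 6.256 | 0.016 | 0.001 | 0.010 |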
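
For readers unfamiliar with the pixel-level metrics in the table, the following is a minimal sketch of how MSE and PSNR between an image and its reconstruction are commonly computed. It assumes images scaled to [0, 1] (`max_value=1.0`) and uses hypothetical helper names `mse` and `psnr`; the evaluation code, pixel range, and averaging scheme behind the reported numbers are not stated in this card, so this is not necessarily how they were obtained.

```python
import numpy as np


def mse(original, reconstruction) -> float:
    """Mean squared error over all pixels."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstruction, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def psnr(original, reconstruction, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    err = mse(original, reconstruction)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value**2 / err))


# Synthetic example: a random "figure" and a slightly noisy reconstruction.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
noisy = np.clip(image + rng.normal(scale=0.05, size=image.shape), 0.0, 1.0)

print(f"MSE:  {mse(image, noisy):.4f}")
print(f"PSNR: {psnr(image, noisy):.2f} dB")
```

Note that a dataset-level PSNR is usually the mean of per-image PSNR values, which in general differs from the PSNR of the mean MSE.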

Latent Diffusion Model:

MlFigs_LDM_12.ckpt
