MultiModalHackVAE

A multi-modal Variational Autoencoder trained on NetHack game states for representation learning.

Model Description

This model is a MultiModalHackVAE that learns compact representations of NetHack game states by processing:

  • Game character grids (21×79)
  • Color information
  • Game statistics (blstats)
  • Message text
  • Bag of glyphs
  • Hero information (role, race, gender, alignment)

Model Details

  • Model Type: Multi-modal Variational Autoencoder
  • Framework: PyTorch
  • Dataset: NetHack Learning Dataset
  • Latent Dimensions: 96
  • Low-rank Dimensions: 0

Usage

from train import load_model_from_huggingface
import torch

# Load the model
model = load_model_from_huggingface("CatkinChen/nethack-vae")

# Example usage with synthetic data
batch_size = 1
game_chars = torch.randint(32, 127, (batch_size, 21, 79))
game_colors = torch.randint(0, 16, (batch_size, 21, 79))
blstats = torch.randn(batch_size, 27)
msg_tokens = torch.randint(0, 128, (batch_size, 256))
hero_info = torch.randint(0, 10, (batch_size, 4))

with torch.no_grad():
    output = model(
        glyph_chars=game_chars,
        glyph_colors=game_colors,
        blstats=blstats,
        msg_tokens=msg_tokens,
        hero_info=hero_info
    )
    latent_mean = output['mu']
    latent_logvar = output['logvar']
    lowrank_factors = output['lowrank_factors']
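
To draw a latent sample from the returned mean and log-variance, the standard VAE reparameterization trick applies (this is a generic VAE operation, not an API of this model; the placeholder tensors below stand in for the model outputs above):

```python
import torch

# Placeholders standing in for output['mu'] and output['logvar'];
# the model uses 96 latent dimensions.
latent_dim = 96
mu = torch.zeros(1, latent_dim)
logvar = torch.zeros(1, latent_dim)

# Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
z = mu + eps * std  # shape (1, 96)
```

For downstream representation learning, using `mu` directly as a deterministic embedding is also common.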

Training

This model was trained using adaptive loss weighting with:

  • Embedding warm-up for quick convergence
  • Gradual raw reconstruction focus
  • KL beta annealing for better latent structure
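
KL beta annealing typically ramps the weight on the KL term from 0 up to its full value so the decoder learns to reconstruct before the latent space is regularized. The exact schedule used for this model is not published; a minimal linear-warm-up sketch looks like:

```python
def kl_beta(step: int, warmup_steps: int = 10_000, beta_max: float = 1.0) -> float:
    """Linear KL beta annealing: ramp beta from 0 to beta_max over
    warmup_steps, then hold constant. Illustrative schedule only;
    warmup_steps and beta_max here are assumed values."""
    return beta_max * min(1.0, step / warmup_steps)

# Per-step loss would then combine as:
#   loss = reconstruction_loss + kl_beta(step) * kl_divergence
```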

Citation

If you use this model, please consider citing:

@misc{nethack-vae,
  title={MultiModalHackVAE: Multi-modal Variational Autoencoder for NetHack},
  author={Xu Chen},
  year={2025},
  url={https://huggingface.co/CatkinChen/nethack-vae}
}