# MultiModalHackVAE
A multi-modal Variational Autoencoder trained on NetHack game states for representation learning.
## Model Description
This model is a MultiModalHackVAE that learns compact representations of NetHack game states by processing:
- Game character grids (21x79)
- Color information
- Game statistics (blstats)
- Message text
- Bag of glyphs
- Hero information (role, race, gender, alignment)
## Model Details
- Model Type: Multi-modal Variational Autoencoder
- Framework: PyTorch
- Dataset: NetHack Learning Dataset
- Latent Dimensions: 96
- Low-rank Dimensions: 0
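Since the model also returns a `lowrank_factors` output (see Usage below), a plausible reading of these two numbers, offered here as an assumption rather than a confirmed detail of this model, is a low-rank-plus-diagonal Gaussian posterior, where a low-rank dimension $r$ parameterizes the covariance as

$$
\Sigma = \operatorname{diag}\!\big(\exp(\text{logvar})\big) + U U^\top, \qquad U \in \mathbb{R}^{96 \times r}.
$$

With $r = 0$, as configured here, this reduces to the standard diagonal-Gaussian VAE posterior.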
## Usage
```python
from train import load_model_from_huggingface
import torch

# Load the model
model = load_model_from_huggingface("CatkinChen/nethack-vae")

# Example usage with synthetic data
batch_size = 1
game_chars = torch.randint(32, 127, (batch_size, 21, 79))   # printable ASCII map
game_colors = torch.randint(0, 16, (batch_size, 21, 79))    # 16 terminal colors
blstats = torch.randn(batch_size, 27)                       # bottom-line stats
msg_tokens = torch.randint(0, 128, (batch_size, 256))       # message text tokens
hero_info = torch.randint(0, 10, (batch_size, 4))           # role, race, gender, alignment

with torch.no_grad():
    output = model(
        glyph_chars=game_chars,
        glyph_colors=game_colors,
        blstats=blstats,
        msg_tokens=msg_tokens,
        hero_info=hero_info,
    )

latent_mean = output['mu']
latent_logvar = output['logvar']
lowrank_factors = output['lowrank_factors']
```
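The returned `mu` and `logvar` parameterize a Gaussian over the 96-dimensional latent space, so a concrete latent sample can be drawn with the standard VAE reparameterization trick. A minimal sketch building on the variables above, assuming the usual diagonal-Gaussian posterior (consistent with Low-rank Dimensions: 0):

```python
# Draw z ~ N(mu, diag(exp(logvar))) via the reparameterization trick.
std = torch.exp(0.5 * latent_logvar)   # standard deviation from log-variance
eps = torch.randn_like(std)            # unit Gaussian noise
z = latent_mean + eps * std            # sampled latent, shape (batch_size, 96)
```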
## Training
This model was trained using adaptive loss weighting with:
- Embedding warm-up for quick convergence
- A gradual shift of loss emphasis toward raw reconstruction
- KL beta annealing for better latent structure (see the sketch below)
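To make the last point concrete, KL beta annealing scales the KL term of the VAE loss by a coefficient β that grows over training. The sketch below shows a common linear schedule; the schedule shape, step count, and function name are illustrative assumptions, not the exact configuration used for this model:

```python
def kl_beta(step: int, warmup_steps: int = 10_000, beta_max: float = 1.0) -> float:
    """Linearly anneal the KL weight from 0 to beta_max.

    Illustrative only: the schedule and warmup_steps used to train this
    model are assumptions, not the published training configuration.
    """
    return beta_max * min(1.0, step / warmup_steps)

# Inside a training loop, the annealed beta weights the KL term of the ELBO:
# loss = recon_loss + kl_beta(step) * kl_divergence
```

Keeping β small early lets the decoder learn to reconstruct inputs before the latent code is regularized toward the prior, which tends to avoid posterior collapse.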
## Citation
If you use this model, please consider citing:
```bibtex
@misc{nethack-vae,
  title={MultiModalHackVAE: Multi-modal Variational Autoencoder for NetHack},
  author={Xu Chen},
  year={2025},
  url={https://huggingface.co/CatkinChen/nethack-vae}
}
```