WORLDMEM: Long-term Consistent World Simulation with Memory Paper • 2504.12369 • Published 4 days ago • 28
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published 3 days ago • 20
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 3 days ago • 38
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published 6 days ago • 19
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published 13 days ago • 114
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 6 days ago • 226
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 9 days ago • 38
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 9 days ago • 46
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 9 days ago • 118
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 13 days ago • 79
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published 18 days ago • 33
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 12 days ago • 144
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 12 days ago • 101
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published 20 days ago • 52