VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published 23 days ago
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 22 days ago
Balancing Pipeline Parallelism with Vocabulary Parallelism Paper • 2411.05288 • Published Nov 8
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Paper • 2410.20650 • Published Oct 28
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published Nov 4
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models Paper • 2411.03884 • Published Nov 6
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14
What Matters in Transformers? Not All Attention is Needed Paper • 2406.15786 • Published Jun 22
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7