Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models • arXiv:2411.04996 • Published Nov 7, 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts • arXiv:2407.21770 • Published Jul 31, 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution • arXiv:2405.19325 • Published May 29, 2024
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • arXiv:2403.07816 • Published Mar 12, 2024
Instruction-tuned Language Models are Better Knowledge Learners • arXiv:2402.12847 • Published Feb 20, 2024
LEVER: Learning to Verify Language-to-Code Generation with Execution • arXiv:2302.08468 • Published Feb 16, 2023
Efficient Large Scale Language Modeling with Mixtures of Experts • arXiv:2112.10684 • Published Dec 20, 2021
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization • arXiv:2212.12017 • Published Dec 22, 2022
In-Context Pretraining: Language Modeling Beyond Document Boundaries • arXiv:2310.10638 • Published Oct 16, 2023