Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models • arXiv:2411.04996 • Published Nov 7, 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts • arXiv:2407.21770 • Published Jul 31, 2024
RA-DIT: Retrieval-Augmented Dual Instruction Tuning • arXiv:2310.01352 • Published Oct 2, 2023
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution • arXiv:2405.19325 • Published May 29, 2024
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • arXiv:2403.07816 • Published Mar 12, 2024
Instruction-tuned Language Models are Better Knowledge Learners • arXiv:2402.12847 • Published Feb 20, 2024
LEVER: Learning to Verify Language-to-Code Generation with Execution • arXiv:2302.08468 • Published Feb 16, 2023
Efficient Large Scale Language Modeling with Mixtures of Experts • arXiv:2112.10684 • Published Dec 20, 2021
OPT: Open Pre-trained Transformer Language Models • arXiv:2205.01068 • Published May 2, 2022
Few-shot Learning with Multilingual Language Models • arXiv:2112.10668 • Published Dec 20, 2021
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization • arXiv:2212.12017 • Published Dec 22, 2022
Stage-wise Fine-tuning for Graph-to-Text Generation • arXiv:2105.08021 • Published May 17, 2021