Think Only When You Need with Large Hybrid-Reasoning Models Paper • 2505.14631 • Published May 20, 2025 • 19
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought Paper • 2501.07542 • Published Jan 13, 2025 • 3
Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders Paper • 2104.08757 • Published Apr 18, 2021
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale Paper • 2502.16684 • Published Feb 23, 2025
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Paper • 2208.10442 • Published Aug 22, 2022
RedStone: Curating General, Code, Math, and QA Data for Large Language Models Paper • 2412.03398 • Published Dec 4, 2024 • 2
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 46
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8, 2024 • 10
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 103
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 171
Data Selection via Optimal Control for Language Models Paper • 2410.07064 • Published Oct 9, 2024 • 9
Self-Boosting Large Language Models with Synthetic Preference Data Paper • 2410.06961 • Published Oct 9, 2024 • 17
Kosmos-G: Generating Images in Context with Multimodal Large Language Models Paper • 2310.02992 • Published Oct 4, 2023 • 4