FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference Paper • 2505.22758 • Published 27 days ago
PaTH Attention: Position Encoding via Accumulating Householder Transformations Paper • 2505.16381 • Published May 22
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published Feb 14
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models Paper • 2409.04787 • Published Sep 7, 2024 • 1
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 25
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 15
The infrastructure powering IBM's Gen AI model development Paper • 2407.05467 • Published Jul 7, 2024 • 2
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21, 2024 • 34
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization Paper • 2404.03605 • Published Apr 4, 2024 • 1
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7, 2024 • 22
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 32
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models Paper • 2404.05567 • Published Apr 8, 2024 • 10
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 43
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback Paper • 2402.02479 • Published Feb 4, 2024 • 2
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 147
Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog Paper • 2210.07295 • Published Oct 13, 2022 • 1
Variational Learning for Unsupervised Knowledge Grounded Dialogs Paper • 2112.00653 • Published Nov 23, 2021 • 1
Variational Inference with Latent Space Quantization for Adversarial Resilience Paper • 1903.09940 • Published Mar 24, 2019 • 1