Mayank Mishra

mayank-mishra

https://mayank31398.github.io/

AI & ML interests

Large Language Models, Distributed Training and Inference

Recent Activity

authored a paper 20 days ago

FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference

authored a paper 20 days ago

PaTH Attention: Position Encoding via Accumulating Householder Transformations

authored a paper 4 months ago

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

mayank-mishra's activity

authored 2 papers 20 days ago

FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference

Paper • 2505.22758 • Published 27 days ago

PaTH Attention: Position Encoding via Accumulating Householder Transformations

Paper • 2505.16381 • Published May 22

authored 2 papers 4 months ago

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

Paper • 2502.09927 • Published Feb 14

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

Paper • 2501.06589 • Published Jan 11

authored a paper 6 months ago

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Paper • 2409.04787 • Published Sep 7, 2024 • 1

authored a paper 10 months ago

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Paper • 2408.13359 • Published Aug 23, 2024 • 25

authored 2 papers 11 months ago

Enhancing Training Efficiency Using Packing with Flash Attention

Paper • 2407.09105 • Published Jul 12, 2024 • 15

Scaling Granite Code Models to 128K Context

Paper • 2407.13739 • Published Jul 18, 2024 • 20

authored a paper 12 months ago

The infrastructure powering IBM's Gen AI model development

Paper • 2407.05467 • Published Jul 7, 2024 • 2

authored 6 papers about 1 year ago

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024 • 34

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Paper • 2404.03605 • Published Apr 4, 2024 • 1

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Paper • 2405.04324 • Published May 7, 2024 • 22

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 32

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Paper • 2404.05567 • Published Apr 8, 2024 • 10

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30, 2024 • 43

authored 5 papers over 1 year ago

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Paper • 2402.02479 • Published Feb 4, 2024 • 2