
Apurv

0xe69756

AI & ML interests

None yet

Organizations

Bloomberg · Georgia Tech (Georgia Institute of Technology) · New Jersey Institute of Technology

0xe69756's activity

upvoted an article 14 days ago

Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick

By cxdu
updated a collection 5 months ago
reacted to osanseviero's post with 👍❤️ 8 months ago
Mixture of experts: beware 🛡️⚔️

New paper by DeepMind: Buffer Overflow in Mixture of Experts (2402.05526)

The paper shows an adversarial attack strategy in which a user sends malicious queries that can affect the output of other users' queries in the same batch.

So if the same batch contains
- User A's benign query
- User B's malicious query
the response for A might be altered! 😱

How is this possible?
One approach is to fill the experts' token buffers with adversarial data, forcing the gating to route to non-ideal experts or to drop the benign tokens entirely (when the buffer capacity is finite).

This assumes the adversary can only use the model as a black box, but can observe the output logits and ensure their data is always grouped in the same batch as the victim's.
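
To make the failure mode concrete, here is a minimal NumPy sketch of capacity-limited top-1 routing. This is not the paper's actual setup: the function, logits, and capacity value are all made up for illustration.

```python
import numpy as np

def route_top1_with_capacity(gate_logits, capacity):
    """Greedy top-1 routing with a finite per-expert token buffer.

    gate_logits: (num_tokens, num_experts) router scores, in batch order.
    capacity:    max tokens each expert accepts per batch.
    Returns each token's expert, or -1 if the token was dropped because
    its preferred expert's buffer was already full.
    """
    num_tokens, num_experts = gate_logits.shape
    load = np.zeros(num_experts, dtype=int)        # tokens accepted so far
    assignment = np.full(num_tokens, -1, dtype=int)
    for t in range(num_tokens):                    # tokens routed in batch order
        expert = int(np.argmax(gate_logits[t]))
        if load[expert] < capacity:
            assignment[t] = expert
            load[expert] += 1
        # else: buffer overflow -> token t is dropped
    return assignment

# The adversary's 4 tokens sit earlier in the batch and all prefer expert 0,
# so they exhaust its capacity before the benign token is routed.
adversarial = np.tile([5.0, 0.0], (4, 1))
benign = np.array([[4.0, 1.0]])                    # also prefers expert 0
print(route_top1_with_capacity(np.vstack([adversarial, benign]), capacity=4))
# [0 0 0 0 -1] -> the benign token is dropped, so user A's output changes
```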

How to mitigate this?
- Randomize the batch order, as sketched below (and even run twice if some queries are very sensitive)
- Use a large capacity slack
- Sample from the gate weights instead of taking the top-k, also sketched below (not great IMO, as that requires more memory for inference)
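
A rough sketch of the first and third mitigations, again with made-up names and values rather than any real MoE implementation:

```python
import numpy as np

rng = np.random.default_rng()

def shuffle_batch(tokens, rng):
    """Randomize batch order: the adversary can no longer guarantee that
    their tokens reach the expert buffers before the benign ones."""
    perm = rng.permutation(len(tokens))
    return tokens[perm], perm          # keep perm to restore output order

def sample_gate(gate_logits, rng):
    """Sample each token's expert from the gate's softmax distribution
    instead of a deterministic argmax/top-k, so buffer contention becomes
    stochastic and much harder for an adversary to engineer."""
    shifted = gate_logits - gate_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])

gate_logits = np.array([[2.0, 0.5], [2.0, 0.5], [0.2, 1.8]])
print(sample_gate(gate_logits, rng))   # expert picks vary from run to run
```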

Very cool paper!!