
Apurv

0xe69756

AI & ML interests

None yet

Organizations

Bloomberg · Georgia Tech (Georgia Institute of Technology) · New Jersey Institute of Technology

0xe69756's activity

upvoted an article 14 days ago

Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick

By cxdu
updated a collection 5 months ago
reacted to osanseviero's post with 👍❤️ 8 months ago
Mixture of experts: beware 🛡️⚔️

New paper by DeepMind: Buffer Overflow in Mixture of Experts (2402.05526)

The paper shows an adversarial attack strategy in which a user sends malicious queries that can affect the output of other users' queries in the same batch.

So if the same batch contains
- User A's benign query
- User B's malicious query
the response for A might be altered! 😱

How is this possible?
One approach is to fill the experts' token buffers with adversarial data, forcing the gating to route to non-ideal experts or to drop the benign tokens entirely (when the buffer capacity is finite).

This assumes the adversary can only use the model as a black box, but can observe the output logits and ensure their data is always grouped in the same batch as the victim's.
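
To make the failure mode concrete, here is a minimal NumPy sketch of capacity-limited top-1 routing. This is not the paper's actual setup: the function, logits, and capacity value are all made up for illustration.

```python
import numpy as np

def route_top1_with_capacity(gate_logits, capacity):
    """Greedy top-1 routing with a finite per-expert token buffer.

    gate_logits: (num_tokens, num_experts) router scores, in batch order.
    capacity:    max tokens each expert accepts per batch.
    Returns each token's expert, or -1 if the token was dropped because
    its preferred expert's buffer was already full.
    """
    num_tokens, num_experts = gate_logits.shape
    load = np.zeros(num_experts, dtype=int)        # tokens accepted so far
    assignment = np.full(num_tokens, -1, dtype=int)
    for t in range(num_tokens):                    # tokens routed in batch order
        expert = int(np.argmax(gate_logits[t]))
        if load[expert] < capacity:
            assignment[t] = expert
            load[expert] += 1
        # else: buffer overflow -> token t is dropped
    return assignment

# The adversary's 4 tokens sit earlier in the batch and all prefer expert 0,
# so they exhaust its capacity before the benign token is routed.
adversarial = np.tile([5.0, 0.0], (4, 1))
benign = np.array([[4.0, 1.0]])                    # also prefers expert 0
print(route_top1_with_capacity(np.vstack([adversarial, benign]), capacity=4))
# [0 0 0 0 -1] -> the benign token is dropped, so user A's output changes
```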

How to mitigate this?
- Randomize the batch order, as sketched below (and even run twice if some queries are very sensitive)
- Use a large capacity slack
- Sample from the gate weights instead of taking the top-k, also sketched below (not great IMO, as that requires more memory for inference)
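
A rough sketch of the first and third mitigations, again with made-up names and values rather than any real MoE implementation:

```python
import numpy as np

rng = np.random.default_rng()

def shuffle_batch(tokens, rng):
    """Randomize batch order: the adversary can no longer guarantee that
    their tokens reach the expert buffers before the benign ones."""
    perm = rng.permutation(len(tokens))
    return tokens[perm], perm          # keep perm to restore output order

def sample_gate(gate_logits, rng):
    """Sample each token's expert from the gate's softmax distribution
    instead of a deterministic argmax/top-k, so buffer contention becomes
    stochastic and much harder for an adversary to engineer."""
    shifted = gate_logits - gate_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])

gate_logits = np.array([[2.0, 0.5], [2.0, 0.5], [0.2, 1.8]])
print(sample_gate(gate_logits, rng))   # expert picks vary from run to run
```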

Very cool paper!!