Questions about fine-tuning MoE

#8
by RamboChen - opened

Is MoE fine-tuning also autoregressive?
How should the gradients from different samples in a batch be accumulated?
I hope to get your reply!
