Collections
Discover the best community collections!
Collections trending this week
- Sparse Backpropagation for MoE Training
  Paper • 2310.00811 • Published • 2
- The Forward-Forward Algorithm: Some Preliminary Investigations
  Paper • 2212.13345 • Published • 2
- Fine-Tuning Language Models with Just Forward Passes
  Paper • 2305.17333 • Published • 3
- Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation
  Paper • 2309.13192 • Published • 1

- SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
  Paper • 2210.17432 • Published • 1
- TESS: Text-to-Text Self-Conditioned Simplex Diffusion
  Paper • 2305.08379 • Published • 3
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
  Paper • 2308.12219 • Published • 1
- CodeFusion: A Pre-trained Diffusion Model for Code Generation
  Paper • 2310.17680 • Published • 73

- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
  Paper • 2302.06218 • Published • 1
- ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
  Paper • 2306.10209 • Published • 2
- SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System
  Paper • 2205.10034 • Published • 1
- A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
  Paper • 2303.06318 • Published • 1