SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training Paper • 2408.10013 • Published Aug 19, 2024
Code generation and runtime techniques for enabling data-efficient deep learning training on GPUs Paper • 2412.04747 • Published Dec 6, 2024
LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme Paper • 2407.15264 • Published Jul 21, 2024
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses Paper • 2101.07956 • Published Jan 20, 2021
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks Paper • 2301.06284 • Published Jan 16, 2023
TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes Paper • 2012.14363 • Published Dec 28, 2020
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture Paper • 2103.03330 • Published Mar 4, 2021