Batch accumulation
Batch accumulation, also called gradient accumulation, is a technique for reducing the GPU memory requirements of machine learning training. In a normal training step, the gradients with respect to each parameter of the model are computed for a full batch and a single update is performed. If the batch size is too large, computing these gradients can cause an "out-of-memory" error. With gradient accumulation, you instead accumulate the gradients in place across a set of smaller batches before applying a single update. This trades time for memory: training can proceed as if a GPU with more VRAM were available, at the cost of a longer training time.
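As a concrete illustration, here is a minimal sketch of gradient accumulation using PyTorch (an assumed framework; any autograd-based library works similarly). The synthetic data, model size, learning rate, and `accumulation_steps` value are placeholders chosen for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data standing in for a real training set (illustrative only).
inputs = torch.randn(64, 128)
targets = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(inputs, targets), batch_size=8)  # small micro-batches

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

accumulation_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    # Scale the loss so the accumulated gradient matches one large-batch update.
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()  # gradients accumulate in place in each parameter's .grad

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # single update from the accumulated gradients
        optimizer.zero_grad()  # clear gradients for the next accumulation window
```

Because `loss.backward()` adds to each parameter's `.grad` rather than overwriting it, the optimizer step after four micro-batches of 8 is mathematically equivalent to one step on a batch of 32, while only one micro-batch's activations need to fit in memory at a time.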