When training with four P40 graphics cards, I always get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 23.88 GiB total capacity; 22.90 GiB already allocated; 76.25 MiB free; 23.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
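The message suggests setting max_split_size_mb because reserved memory greatly exceeds allocated memory, which points to allocator fragmentation. Below is a minimal sketch (not from the original report) of one way to apply that hint: set PYTORCH_CUDA_ALLOC_CONF before CUDA is initialized, then compare allocated vs. reserved memory to see whether fragmentation is really the issue. The 128 MiB split size is only an illustrative starting value, not a recommendation from the source.

```python
import os

# Assumption: this runs before anything initializes CUDA; the allocator only
# reads PYTORCH_CUDA_ALLOC_CONF at startup, so setting it later has no effect.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch

if torch.cuda.is_available():
    # Check GPU 0: if reserved is much larger than allocated, fragmentation
    # is likely and a smaller max_split_size_mb (or a smaller batch size)
    # may help; otherwise the model/batch simply does not fit in 24 GiB.
    allocated_mib = torch.cuda.memory_allocated(0) / 2**20
    reserved_mib = torch.cuda.memory_reserved(0) / 2**20
    print(f"GPU 0: {allocated_mib:.1f} MiB allocated, {reserved_mib:.1f} MiB reserved")
```

The same setting can also be exported in the shell before launching training (export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128), which avoids touching the training script at all.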