CUDA out of memory even though there is no running process, and it trains on only half of the data


This Content is from Stack Overflow. Question asked by Noura Fayez

I am training a model to predict image labels, using PyTorch 1.12.1 and the CUDA 11.6 driver on a single NVIDIA GeForce RTX 3060 Laptop GPU, which reports the following:

Utilization 0%
Dedicated GPU memory    0.0/6.0 GB
Shared GPU memory   0.0/7.9 GB
GPU Memory  0.0/13.9 GB

My training data is 12000 images of size 244*244.

The problem is when I set the following configuration


it shows the error

RuntimeError: CUDA out of memory. Tried to allocate 784.00 MiB (GPU 0; 6.00 GiB total capacity; 5.20 GiB already allocated; 0 bytes free; 5.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
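The figures in this message can be compared directly. The sketch below uses only the numbers quoted in the error; the helper name `gib_to_mib` is made up for illustration. Reserved and allocated memory are nearly equal, so the condition for the `max_split_size_mb` hint (reserved much larger than allocated) does not seem to hold, and the job appears simply to need more memory than is free on the card. The remaining 0.74 GiB of the 6 GiB is presumably held outside PyTorch's allocator, which is an assumption and not something the message states.

```python
# Figures copied from the error message above.
total_gib = 6.00
allocated_gib = 5.20
reserved_gib = 5.26
requested_mib = 784.00


def gib_to_mib(gib: float) -> float:
    """Convert GiB to MiB (1 GiB = 1024 MiB)."""
    return gib * 1024


# Memory cached by PyTorch's allocator but not held by live tensors.
cached_unused_mib = gib_to_mib(reserved_gib - allocated_gib)

# Memory that would be in use if the failed allocation had succeeded.
needed_gib = allocated_gib + requested_mib / 1024

print(f"cached but unused: {cached_unused_mib:.0f} MiB")  # 61 MiB
print(f"allocated + requested: {needed_gib:.2f} GiB of {total_gib:.2f} GiB")  # 5.97 of 6.00
```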

I checked the NVIDIA status using the following command

and the result is as follows, which indicates that GPU usage is zero and there is no running process


Moreover, I ran the following command, and its output indicates that my GPU is empty:

| NVIDIA-SMI 516.94       Driver Version: 516.94       CUDA Version: 11.7     |
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   41C    P0    23W /  N/A |      0MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |
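A reading of 0 MiB with no processes is what `nvidia-smi` would be expected to show whenever the training script is not running, since a process's GPU memory is released when it exits. To see what the card looks like during training, the query can be repeated while the script runs. A rough sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_memory_line` and `gpu_memory_mib` are hypothetical:

```python
import shutil
import subprocess


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '5324, 6144' into (used_mib, total_mib)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def gpu_memory_mib() -> list[tuple[int, int]]:
    """Return (used, total) MiB per GPU, or an empty list if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        output = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except subprocess.CalledProcessError:
        return []
    return [parse_memory_line(line) for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    # Run this in a second terminal while training is in progress.
    for index, (used, total) in enumerate(gpu_memory_mib()):
        print(f"GPU {index}: {used} MiB / {total} MiB")
```

If the used figure climbs toward 6144 MiB shortly before the error appears, the memory is being consumed by the training process itself and not by some leftover process.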

I googled the issue and found two suggested solutions:

  1. Reduce the batch_size:
    I tried setting the configuration to


With that setting, it trained on only half of the training data, which reduced the accuracy of the model. Also, the more I increase the imgs_per_gpu value, the less training data is used.

  2. Clear the torch cache:
    I ran the following code, but it did not work:

    import gc
    import torch

    gc.collect()
    torch.cuda.empty_cache()
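One likely reason this has no effect: `torch.cuda.empty_cache()` only returns cached blocks that no tensor is using, and it cannot free memory held by live tensors such as model weights, activations, and gradients. In a freshly started process there is also nothing cached yet. The sketch below illustrates the difference between allocated and reserved memory. It returns `None` when PyTorch or a CUDA device is unavailable, and the function name and the 64 MiB buffer size are arbitrary choices for illustration.

```python
import gc

try:
    import torch
except ImportError:  # allow the sketch to load without PyTorch installed
    torch = None

MIB = 1024 ** 2


def empty_cache_demo():
    """Show what empty_cache() can and cannot release on the current GPU."""
    if torch is None or not torch.cuda.is_available():
        return None

    buffer = torch.empty(64 * MIB, dtype=torch.uint8, device="cuda")  # 64 MiB
    torch.cuda.empty_cache()
    # The tensor is still referenced, so its memory stays allocated.
    allocated_while_live = torch.cuda.memory_allocated()

    del buffer
    gc.collect()
    # The block is now free for reuse but still cached by PyTorch.
    reserved_before = torch.cuda.memory_reserved()
    torch.cuda.empty_cache()
    # Only now is the unused cached block handed back to the driver.
    reserved_after = torch.cuda.memory_reserved()

    return {
        "allocated_while_live_mib": allocated_while_live / MIB,
        "reserved_before_mib": reserved_before / MIB,
        "reserved_after_mib": reserved_after / MIB,
    }


if __name__ == "__main__":
    print(empty_cache_demo())
```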

I tried reducing the dataset to 6000 images and training on all of them, but it gave the same out-of-memory error, even though 6000 images had trained successfully before as half of the 12000.
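A possible explanation for the "half of the data" observation, offered as an assumption since the configuration and the training log are not shown: if the number reported by the training loop is iterations per epoch and not images, it would shrink as `imgs_per_gpu` grows even though every image is still used. With 12000 images and an assumed `imgs_per_gpu` of 2 on one GPU, an epoch is 6000 iterations. The helper `iterations_per_epoch` below is hypothetical and only shows the arithmetic:

```python
import math


def iterations_per_epoch(num_images: int, imgs_per_gpu: int, num_gpus: int = 1) -> int:
    """Number of training steps needed to see every image once."""
    return math.ceil(num_images / (imgs_per_gpu * num_gpus))


# 12000 images, as in the question; the batch sizes are assumed values.
for imgs_per_gpu in (1, 2, 4, 8):
    steps = iterations_per_epoch(12000, imgs_per_gpu)
    print(f"imgs_per_gpu={imgs_per_gpu}: {steps} iterations, "
          f"{steps * imgs_per_gpu} images seen")
```

If the log counter matches these step counts, the full dataset is being used and only the unit of the counter differs.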

So, my question is: how can I fix the issue and train on all 12000 images?
I also have one more question: can I train the same model twice on two different datasets and merge the training results at the end? I’m considering this as a workaround.
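On the workaround: memory use during a training step depends on the batch and the model, not on how many images the dataset holds, so splitting the dataset would not be expected to avoid the error by itself (which matches the 6000-image attempt above). A back-of-the-envelope sketch, assuming 3-channel float32 inputs; the function name and the batch sizes are made up for illustration, and activations, weights, and gradients (usually the larger share) are not counted:

```python
def input_batch_mib(batch_size: int, height: int = 244, width: int = 244,
                    channels: int = 3, bytes_per_value: int = 4) -> float:
    """Memory taken by one batch of input images, in MiB."""
    return batch_size * channels * height * width * bytes_per_value / 1024 ** 2


# The input batch is the same size whether the dataset has 6000 or 12000 images.
for batch_size in (1, 8, 32):
    print(f"batch_size={batch_size}: {input_batch_mib(batch_size):.1f} MiB of input")
```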

Thanks in advance.



This Question and Answer are collected from stackoverflow and tested by JTuto community, is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.
