I have 10 CPU nodes, each with 72 CPU cores. I am running code that reads huge files, which need to be read in parallel to save time. The problem is that the job hits the memory limit. My guess is that all the processes run on different cores of a single node rather than on different nodes, so I am using only 1 node!
The relevant part of the code is as follows:

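    import multiprocessing
    import os

    import h5py
    from joblib import Parallel, delayed

    def Reading(ROI, j):
        ds = datasets[j]
        # Open the data file read-only and close it once the slice is in memory
        with h5py.File(os.path.join(path_raw, ds), 'r') as f:
            data = f[data_string][:, ROI, 0, 0]
        return data

    # cpu_count() reports the cores of the local node only
    num_cores = multiprocessing.cpu_count()
    data_tmp = Parallel(n_jobs=num_cores)(delayed(Reading)(ROI, j) for j in range(13))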
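
To check that guess, a small diagnostic along these lines could be run inside the job. It is only a sketch: `describe_placement` is a hypothetical helper name, the `SLURM_*` variables are only set inside a Slurm allocation, and `os.sched_getaffinity` is Linux-only.

```python
import multiprocessing
import os
import socket


def describe_placement():
    """Collect facts about where this Python process is running."""
    info = {
        "host": socket.gethostname(),
        "pid": os.getpid(),
        # Cores of the local machine only, never of the whole allocation
        "cpu_count": multiprocessing.cpu_count(),
        # Slurm variables; None when run outside a Slurm job
        "slurm_nodelist": os.environ.get("SLURM_JOB_NODELIST"),
        "slurm_num_nodes": os.environ.get("SLURM_JOB_NUM_NODES"),
        "slurm_cpus_on_node": os.environ.get("SLURM_CPUS_ON_NODE"),
    }
    # Cores this process is actually allowed to use (Linux only)
    if hasattr(os, "sched_getaffinity"):
        info["usable_cores"] = len(os.sched_getaffinity(0))
    return info


if __name__ == "__main__":
    for key, value in describe_placement().items():
        print(f"{key}: {value}")
```

If `host` comes out the same for every worker and for every `--mod_num` run, everything really is on one node.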

In addition, the whole code (from start to end) must run 8 times, once for each of 8 different datasets. I parallelized these runs separately, using the `argparse` library to select the dataset:

    import argparse

    def argparser():
        parser = argparse.ArgumentParser()
        # Index of the dataset (0-7) that this run should process
        parser.add_argument('--mod_num', type=int, default=0)
        return parser
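
For context, the parser might be used like this in the main script. This is a sketch only: `dataset_names` is a hypothetical list standing in for however the 8 datasets are actually selected, and the argument list is passed explicitly so the snippet runs without a command line.

```python
import argparse


def argparser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--mod_num', type=int, default=0)
    return parser


# Hypothetical placeholder names for the 8 datasets
dataset_names = [f"dataset_{i}" for i in range(8)]

# Equivalent to running the script with "--mod_num 3" on the command line
args = argparser().parse_args(['--mod_num', '3'])
selected = dataset_names[args.mod_num]
print(args.mod_num, selected)
```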

The `mod_num` parameter is set by the following bash script:

    #SBATCH --nodes=10
    #SBATCH --time=1-00:00:00

    source /etc/profile.d/
    module purge
    module load anaconda-python

    parallel python --mod_num ::: {0..7}

So I have two levels of parallel processing (one through joblib/`multiprocessing` and one through GNU `parallel`, which sets the `argparse` parameter), but both use only 1 CPU node and I cannot use the other 9 nodes!
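
If that is right, the memory pressure follows from simple counting. The sketch below assumes that all 8 runs start at once on a single 72-core node (GNU `parallel` starts one job per CPU core by default) and that each run has its 13 reads in flight together. The variable names are illustrative.

```python
cores_per_node = 72   # from the cluster description
runs = 8              # values of --mod_num, 0..7
files_per_run = 13    # range(13) in the joblib call

# Active reads per run are capped by both the worker count and the task count
active_reads_per_run = min(cores_per_node, files_per_run)

# Every run lands on the same node if nothing distributes them
concurrent_reads = runs * active_reads_per_run
print(concurrent_reads)  # 104
```

That is 104 simultaneous reads sharing one node's memory, instead of 13 per node if each run had a node to itself.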

I would really appreciate any help in solving this.


