Issue
This Content is from Stack Overflow. Question asked by Max
I am trying to calculate the 90th percentile over a period of 15 years. The data is stored in netCDF files (one month per file, i.e. 12 files/year * 16 years).
The data was opened using the following command:
ds = xr.open_mfdataset(f"{rootdir}/*.nc", chunks={"time": 1})
Trying to calculate the quantile (of one of the data variables) with the following code:
q90 = data_variable.quantile(0.95, "time")
yields the following error message:
ValueError: dimension time on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, either rechunk into a single dask array chunk along this dimension, i.e., .chunk(dict(time=-1)), or pass allow_rechunk=True in dask_gufunc_kwargs but beware that this may significantly increase memory usage.
I tried to rechunk as suggested in the error message, by applying data_variable.chunk(dict(time=-1)).quantile(0.95, 'time'), with no success (I got the exact same error).
I also tried to rechunk in the following way: data_variable.chunk({'time': 1}), which was not successful either.
Printing data_variable.chunk() actually shows that the chunk size along the time dimension is 1, so I don't understand where I made a mistake.
PS: I didn't try allow_rechunk=True in dask_gufunc_kwargs, since I don't know where to pass that argument.
Thanks for the help,
Max
Solution
chunk({"time": 1}) will produce as many chunks as there are time steps.
Each chunk will have a size of 1 along "time".
You wrote: "Printing data_variable.chunk() actually shows that the chunk size along the time dimension is 1, so I don't understand where I made a mistake." A chunk size of 1 is exactly what causes the error.
To compute percentiles, dask needs the full time series in memory at once, so the "time" dimension must not be split across multiple chunks.
So what you want is either chunk({"time": len(ds.time)}) or, directly, the shorthand chunk({"time": -1}).
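As a minimal sketch of that rechunking step, the snippet below builds a small synthetic dask-backed DataArray in place of the real netCDF data (the shape, the dimension names "lat" and "lon", and the random values are assumptions made only for illustration), chunks it one time step per chunk, and then merges the "time" dimension into a single chunk before calling quantile.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for the real data (assumption): 24 time steps on a
# 3 x 4 grid, with one time step per chunk, as chunks={"time": 1} would give.
values = np.random.default_rng(0).random((24, 3, 4))
data_variable = xr.DataArray(
    da.from_array(values, chunks=(1, 3, 4)),
    dims=("time", "lat", "lon"),
    name="demo",
)
print(data_variable.chunks[0])  # chunk sizes along "time": 24 chunks of size 1

# Merge everything along "time" into one chunk; -1 means "the whole dimension".
rechunked = data_variable.chunk({"time": -1})
print(rechunked.chunks[0])  # a single chunk covering all 24 time steps

# With "time" in a single chunk, the quantile over "time" can be computed.
q95 = rechunked.quantile(0.95, dim="time").compute()
print(q95.shape)
```

Note that chunk() returns a new object rather than modifying the array in place, so the quantile has to be called on the returned, rechunked array. With real data, a single chunk along "time" can be large; keeping the spatial dimensions chunked is one way to limit memory use.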
I don't understand why data_variable.chunk(dict(time=-1)).quantile(0.95, 'time') would not work, though.
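Regarding the allow_rechunk=True option mentioned in the error message: dask_gufunc_kwargs is a parameter of xr.apply_ufunc, and as far as I know quantile() does not accept it directly. The sketch below therefore calls xr.apply_ufunc by hand, which is an alternative route and not something the original post does. The synthetic array and the helper quantile_over_last_axis are hypothetical names introduced only for this example.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for the real data (assumption), one time step per chunk.
values = np.random.default_rng(1).random((24, 3, 4))
data_variable = xr.DataArray(
    da.from_array(values, chunks=(1, 3, 4)),
    dims=("time", "lat", "lon"),
)


def quantile_over_last_axis(block, q):
    # apply_ufunc moves core dimensions to the end, so "time" is axis -1 here.
    return np.nanquantile(block, q, axis=-1)


q95_alt = xr.apply_ufunc(
    quantile_over_last_axis,
    data_variable,
    kwargs={"q": 0.95},
    input_core_dims=[["time"]],  # "time" is consumed by the function
    dask="parallelized",
    output_dtypes=[float],
    # Let dask merge the "time" chunks itself instead of raising the ValueError.
    dask_gufunc_kwargs={"allow_rechunk": True},
).compute()
print(q95_alt.dims)
```

As the error message warns, letting dask rechunk this way may significantly increase memory usage, so rechunking explicitly with chunk({"time": -1}) is usually the simpler choice.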
This question was asked on Stack Overflow by Max and answered by Abel. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.