[SOLVED] Rechunk DataArray to calculate 90% quantile over chunked time dimension

Issue

This content is from Stack Overflow. Question asked by Max.

I am trying to calculate the 90th percentile over a period of 15 years. The data is stored in netCDF files, one month per file (12 files/year * 16 years).

The data was opened with the following command:
ds = xr.open_mfdataset(f"{rootdir}/*.nc", chunks={"time": 1})

I tried to calculate the quantile of a data variable with the following code:
q90 = data_variable.quantile(0.95, "time")
which yields the following error message:

ValueError: dimension time on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, either rechunk into a single dask array chunk along this dimension, i.e., .chunk(dict(time=-1)), or pass allow_rechunk=True in dask_gufunc_kwargs but beware that this may significantly increase memory usage.

I tried to rechunk as the error message suggests, with data_variable.chunk(dict(time=-1)).quantile(0.95, 'time'), but without success (I got exactly the same error).
I also tried to rechunk with data_variable.chunk({'time': 1}), which did not work either.

Printing data_variable.chunk() actually shows that the chunk size along the time dimension is 1, so I don't understand where I made a mistake.

PS: I didn't try allow_rechunk=True in dask_gufunc_kwargs, since I don't know where to pass that argument.
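On the PS: as far as I can tell, DataArray.quantile() has no dask_gufunc_kwargs parameter, so allow_rechunk=True can only be passed when calling xr.apply_ufunc yourself. The sketch below assumes a synthetic stand-in for data_variable (random values, 48 time steps, 3 grid points, dims named "time" and "x") and uses np.nanquantile as the per-block function. It illustrates where the argument goes; it is not how quantile() is implemented internally.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for data_variable: 48 time steps x 3 grid points,
# split into one chunk per time step (like chunks={"time": 1}).
rng = np.random.default_rng(0)
data_variable = xr.DataArray(
    rng.normal(size=(48, 3)), dims=("time", "x")
).chunk({"time": 1})

# apply_ufunc moves the core dimension ("time") to the last axis,
# so the wrapped function reduces over axis=-1.
q90 = xr.apply_ufunc(
    np.nanquantile,
    data_variable,
    input_core_dims=[["time"]],
    kwargs={"q": 0.9, "axis": -1},
    dask="parallelized",
    output_dtypes=[float],
    # Let dask merge the "time" chunks itself instead of raising ValueError.
    dask_gufunc_kwargs={"allow_rechunk": True},
).compute()

print(q90.dims)  # ('x',)
```

With allow_rechunk=True dask performs the same "one chunk along time" rechunking implicitly, which is why the error message warns about memory usage; rechunking explicitly with chunk({"time": -1}) does the same thing and keeps it visible in your code.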

Thanks for the help,

Max



Solution

chunk({"time": 1}) produces as many chunks as there are time steps, each of size 1.
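A quick way to see this is to inspect the .chunks attribute. The sketch below uses a small, hypothetical all-zero array with 5 time steps instead of the real dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical array: 5 time steps x 2 grid points.
da = xr.DataArray(np.zeros((5, 2)), dims=("time", "x"))

per_step = da.chunk({"time": 1})
time_chunks = per_step.chunks[0]  # chunk sizes along "time"
print(time_chunks)  # (1, 1, 1, 1, 1)
```

A quantile over "time" needs this tuple to have a single entry, i.e. one chunk spanning all time steps.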

You say that printing data_variable.chunk() shows a chunk size of 1 along the time dimension. That is exactly the problem.

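To compute percentiles along "time", dask needs the full time series of each grid point in memory as a single chunk, so quantile refuses a "time" dimension that is split across several chunks (chunking along the other dimensions is still allowed).
So what you want is either chunk({"time": len(ds.time)}) or, more directly, the shorthand chunk({"time": -1}).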
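A minimal sketch of the fix, assuming a synthetic data_variable (random values, a monthly time coordinate, one spatial dimension named "x") in place of the real netCDF data, and 0.9 as the quantile. It checks that the two spellings give the same chunking and shows that "x" can stay chunked while "time" is whole.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for data_variable: 48 monthly steps x 3 grid points,
# one chunk per time step, as after open_mfdataset(..., chunks={"time": 1}).
time = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
data_variable = xr.DataArray(
    rng.normal(size=(48, 3)),
    dims=("time", "x"),
    coords={"time": time},
).chunk({"time": 1})

# Put the whole time axis into one chunk; both spellings are equivalent.
full_by_length = data_variable.chunk({"time": len(data_variable.time)})
full_by_shorthand = data_variable.chunk({"time": -1})
assert full_by_length.chunks == full_by_shorthand.chunks

# Only the core dimension ("time") must be a single chunk; "x" may stay split,
# which keeps each block small when the spatial grid is large.
rechunked = data_variable.chunk({"time": -1, "x": 1})
q90 = rechunked.quantile(0.9, dim="time").compute()
```

Chunking along the spatial dimensions is the usual way to keep memory bounded here, since each block then holds the full time series for only part of the grid.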

I don't understand why data_variable.chunk(dict(time=-1)).quantile(0.95, 'time') would not work, though.
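One possible explanation, which the thread does not confirm, is that .chunk() returns a new, rechunked object and never modifies data_variable in place, so a bare data_variable.chunk(...) call followed by a separate quantile call still sees the old chunks. The sketch below uses a hypothetical 12-step array to show the difference.

```python
import numpy as np
import xarray as xr

# Hypothetical array: 12 time steps x 2 grid points, one chunk per step.
data_variable = xr.DataArray(np.zeros((12, 2)), dims=("time", "x")).chunk({"time": 1})

# Calling .chunk() without using its result leaves data_variable untouched.
data_variable.chunk({"time": -1})
still_split = len(data_variable.chunks[0])  # still 12 chunks along time

# Reassigning (or chaining in one expression) uses the rechunked copy.
data_variable = data_variable.chunk({"time": -1})
q90 = data_variable.quantile(0.9, dim="time")
```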


This question was asked on Stack Overflow by Max and answered by Abel. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
