Skip to content

RegularGridInterpolator returning 4D array using xarray object as input #14824

@rebeccaringuette

Description

@rebeccaringuette

I am using a chunked xarray object from a netCDF4 file as the input to scipy's RegularGridInterpolator as below.

cdf_data = xarray.open_dataset(filename, chunks={'time':100, 'x':10, 'y':10, 'z':10})  #import using dask for chunking
test_da = getattr(cdf_data, 'rr')  #sample DataArray for testing containing one variable
test_da = test_da.assign_coords(time=cdf_data._time, x=cdf_data.x, y=cdf_data._y, z=cdf_data._z)  #add arrays for coordinate values
rgi1 = RegularGridInterpolator((test_da.time, test_da.x, test_da.y, test_da.z), \
                              test_da, bounds_error = False, fill_value=np.NaN)   #define interpolator function

#given time, x, y, z are 1D arrays of the same length, the coordinate values are given by:
track = np.array([[t, c1_val, c2_val, c3_val] for t, c1_val, c2_val, c3_val in zip(
            time,x,y,z)])  #which has shape (59,4)
result1 = rgi1(track)  #calling rgi1 with track  gives the error:
#ValueError: cannot reshape array of size 12117361 into shape (59,)
#Note: 59**4 = 12117361   
#It is attempting to return an NxNxNxN array instead of the normal 1D array of length N

#creating the interpolator using numpy arrays does not produce this error
rgi2 = RegularGridInterpolator((test_da.time.values, test_da.x.values, 
                                test_da.y.values, test_da.z.values), 
                              test_da.values, bounds_error = False, fill_value=np.NaN)
result2 = rgi2(track)
#This returns a 1D array of length N (59 in my case) with beautiful speed.

I am dealing with 'medium data', which fits on my disk but not in memory, so I need to avoid converting the object into a numpy array, which takes about 5GB on my computer. I have to do this 15 times, so the second option I described is not practical. My current (irritating) solution is:

result = np.squeeze(np.array([rgi1(track0) for track0 in track]))

which is clumsy and takes several seconds longer than the simplistic result2 call. Is there a better way to do this? Why the differing behavior in the two cases? I get the same behavior using interpn via xarray.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions