-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Closed
Description
I am using a chunked xarray object from a netCDF4 file as the input to scipy's RegularGridInterpolator as below.
cdf_data = xarray.open_dataset(filename, chunks={'time':100, 'x':10, 'y':10, 'z':10}) #import using dask for chunking
test_da = getattr(cdf_data, 'rr') #sample DataArray for testing containing one variable
test_da = test_da.assign_coords(time=cdf_data._time, x=cdf_data.x, y=cdf_data._y, z=cdf_data._z) #add arrays for coordinate values
rgi1 = RegularGridInterpolator((test_da.time, test_da.x, test_da.y, test_da.z), \
test_da, bounds_error = False, fill_value=np.NaN) #define interpolator function
#given time, x, y, z are 1D arrays of the same length, the coordinate values are given by:
track = np.array([[t, c1_val, c2_val, c3_val] for t, c1_val, c2_val, c3_val in zip(
time,x,y,z)]) #which has shape (59,4)
result1 = rgi1(track) #calling rgi1 with track gives the error:
#ValueError: cannot reshape array of size 12117361 into shape (59,)
#Note: 59**4 = 12117361
#It is attempting to return an NxNxNxN array instead of the normal 1D array of length N
#creating the interpolator using numpy arrays does not produce this error
rgi2 = RegularGridInterpolator((test_da.time.values, test_da.x.values,
test_da.y.values, test_da.z.values),
test_da.values, bounds_error = False, fill_value=np.NaN)
result2 = rgi2(track)
#This returns a 1D array of length N (59 in my case) with beautiful speed.
I am dealing with 'medium data', which fits on my disk but not in memory, so I need to avoid converting the object into a numpy array, which takes about 5GB on my computer. I have to do this 15 times, so the second option I described is not practical. My current (irritating) solution is:
result = np.squeeze(np.array([rgi1(track0) for track0 in track]))
which is clumsy and takes several seconds longer than the simplistic result2 call. Is there a better way to do this? Why the differing behavior in the two cases? I get the same behavior using interpn via xarray.
Metadata
Metadata
Assignees
Labels
No labels