-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Description
I'm writing an xarray+dask library to perform 1D interpolation (https://github.com/crusaderky/xarray_extras/blob/master/xarray_extras/interpolate.py).
The library takes a ND xarray with dask backend as y and one of its coords (a plain numpy array) as x.
Currently I'm wrapping around scipy.interpolate.make_interp_spline
, and it's not great for a couple of reasons:
- when working in dask, I might have multiple chunks on any axis of y except the interpolation axis; each chunk will translate to an equally-shaped array of spline coefficients. However, across all chunks there is only one array of knots, which (since my x is always a pure numpy array) is also a pure numpy array. With
make_interp_spline
, I am forced to needlessly re-compute the knots vector multiple times. - dask makes it super easy to wrap around functions that return numpy arrays.
make_interp_spline
however returns aBSpline
object, from which I have to extract thet
andc
arrays.
My current code looks pretty awful:
spline = map_blocks(
make_interp_spline,
x, a.data, k=k, check_finite=False, dtype=float)
t_name = 'spline_t-' + tokenize(getattr, spline.name, 't')
spline_first_key = (spline.name, ) + (0, ) * spline.ndim
t = Array(
{(t_name, 0): (getattr, spline_first_key, 't')},
name=t_name, dtype=float,
chunks=((x.size + k + 1, ), ))
t.dask.update(spline.dask)
c = map_blocks(getattr, spline, 'c', dtype=float)
Besides looking ugly and needlessly recomputing t for each chunk, it also introduces a dependency on the first chunk from all other chunks in order to save memory (the spline evaluation of all chunks will read t from the first chunk). It would have been much nicer to just compute t in plain numpy.
Proposed design
Gut make_interp_spline
and create two new functions out of it, make_interp_knots(x, k)
and make_interp_coeffs(y, t, k)
, both of which return a plain np.ndarray.
make_interp_spline
would become a simple wrapper around those two. The only non-trivial piece of logic it would retain is the special short-circuit for k=0
and (k1=1 and t=None)
(https://github.com/scipy/scipy/blob/v1.0.0/scipy/interpolate/_bsplines.py#L706).
Metadata
Metadata
Assignees
Labels
Type
Projects
Status