Skip to content

Dask-friendly 1D interpolation #8810

@crusaderky

Description

@crusaderky

I'm writing an xarray+dask library to perform 1D interpolation (https://github.com/crusaderky/xarray_extras/blob/master/xarray_extras/interpolate.py).
The library takes a ND xarray with dask backend as y and one of its coords (a plain numpy array) as x.
Currently I'm wrapping around scipy.interpolate.make_interp_spline, and it's not great for a couple of reasons:

  1. when working in dask, I might have multiple chunks on any axis of y except the interpolation axis; each chunk will translate to an equally-shaped array of spline coefficients. However, across all chunks there is only one array of knots, which (since my x is always a pure numpy array) is also a pure numpy array. With make_interp_spline, I am forced to needlessly re-compute the knots vector multiple times.
  2. dask makes it super easy to wrap around functions that return numpy arrays. make_interp_spline however returns a BSpline object, from which I have to extract the t and c arrays.

My current code looks pretty awful:

spline = map_blocks(
    make_interp_spline,
    x, a.data, k=k, check_finite=False, dtype=float)
t_name = 'spline_t-' + tokenize(getattr, spline.name, 't')
spline_first_key = (spline.name, ) + (0, ) * spline.ndim
t = Array(
    {(t_name, 0): (getattr, spline_first_key, 't')},
    name=t_name, dtype=float,
    chunks=((x.size + k + 1, ), ))
    t.dask.update(spline.dask)
c = map_blocks(getattr, spline, 'c', dtype=float)

Besides looking ugly and needlessly recomputing t for each chunk, it also introduces a dependency on the first chunk from all other chunks in order to save memory (the spline evaluation of all chunks will read t from the first chunk). It would have been much nicer to just compute t in plain numpy.

Proposed design

Gut make_interp_spline and create two new functions out of it, make_interp_knots(x, k) and make_interp_coeffs(y, t, k), both of which return a plain np.ndarray.
make_interp_spline would become a simple wrapper around those two. The only non-trivial piece of logic it would retain is the special short-circuit for k=0 and (k1=1 and t=None) (https://github.com/scipy/scipy/blob/v1.0.0/scipy/interpolate/_bsplines.py#L706).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    triaged, waiting-for-contributor

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions