Replies: 8 comments 7 replies
-
If a linear fill function were to be added, I think it'd be make sense to simultaneously add "closest fill". That can of course also already be implemented using existing window functions, but it's a bit of a pain, and if the linear fill is already implemented, why not include this too. I don't think there is much point in a non-linear fill function with arbitrary lambdas. To sensibly do non-piecewise linear interpolation, just getting the two neighboring points is not enough, you really need all points (at least in a neighborhood) and do at least some linear algebra on that to get anything better than pieewise linear, and I think that's extension territory. |
Beta Was this translation helpful? Give feedback.
-
Some more design proposals:
|
Beta Was this translation helpful? Give feedback.
-
Will this function extrapolate linearly, constantly, or not at all, or offer the user the choice, at the boundaries? |
Beta Was this translation helpful? Give feedback.
-
What does this function return when there are two datapoints (x, y0) and (x, y1) and you need a fill for (x, NULL)? One of the two y values (doesn't matter which one because their order isn't well defined anyway), or the midpoint, or |
Beta Was this translation helpful? Give feedback.
-
Hi DuckDB Team! I started this original discussion on Discord and wanted to chime in to clarify this functionality, and why it'll be a very important and valuable addition to DuckDB. ImportanceCode SimplicityCurrently, the For example, this is a The current core DuckDB implementation uses C++, and various extensions are made with Rust. Could implementing these features in core DuckDB or as an extension result in speed, memory, or I/O performance improvements? If these features were to be implemented as code, then new features (e.g., Cubic Spline Interpolation) can be added in the future. Could these features make a good usecase for a core extension? Feature SupportDuckDB is used for time-series data analysis, see blog: https://motherduck.com/glossary/time-series and time-series forecasting with DuckDB: https://github.com/evidence-dev/sql-prophet. There are a number of time-series datases such as InfluxDB
QuestDB
TimescaleDB
Amazon Time Stream
ClickHouse
Fill BehaviorsReading up on documentation from the primary time-series database providers' implementations is recommended.
|
Beta Was this translation helpful? Give feedback.
-
There's a lot of discussion and interest in this feature. Should I create an Issue addressing this? |
Beta Was this translation helpful? Give feedback.
-
Just wanted to say thank you @hawkfish for both of the PRs #17686 and #17821! I will test my queries against your implementation of linear interpolation fill, and report back here. |
Beta Was this translation helpful? Give feedback.
-
Hey @hawkfish (Richard), we tested your feature -- works great and as expected! Thank you again! |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
It might be nice to have a new window function called
FILL
for interpolating missing values.The most common use cases are linear interpolation using weights from another column as described here and here. This could be implemented fairly easily like
LEAD
andLAG
. Secondary ordering might even be possible.One variant could be a single argument version (or maybe constant
NULL
in the weighting column?) that would use the partition row number for the weighting (i.e., straight linear interpolation).Another variant to support non-linear interpolation would be to allow a lambda as the second argument, passing it the start and end values, along with the gap description and the current row number. If these were available in the windowing environment, then we could just make it a simple expression.
Beta Was this translation helpful? Give feedback.
All reactions