@hawkfish hawkfish commented Jan 9, 2023

Add POSITIONAL JOIN, a new row-by-row join operator between two relations with OUTER semantics.

  • Parse a new join type keyword POSITIONAL.
  • Add PositionalJoinRef.
  • Add LogicalPositionalJoin.
  • Add PhysicalPositionalJoin to implement joins between relations.
  • Add PhysicalPositionalScan to implement joins between table scans.
  • Update the optimizer to handle the outer semantics.
  • Add tests to cover basic functionality, multi-block raggedness issues and outer semantics.

fixes #3423
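
As a rough illustration of the semantics described above (rows paired by position, with the shorter side padded with NULLs), here is a minimal Python sketch. It is not the DuckDB implementation; the function name `positional_join` and its `left_width`/`right_width` parameters are hypothetical, and rows are modelled as plain tuples with `None` standing in for NULL.

```python
from itertools import zip_longest


def positional_join(left, right, left_width, right_width):
    """Pair rows by position; pad the shorter input with None (NULL).

    `left` and `right` are lists of tuples. The widths are passed in
    explicitly so an empty input can still be padded correctly.
    This is a hypothetical helper for illustration only.
    """
    left_pad = (None,) * left_width
    right_pad = (None,) * right_width
    joined = []
    # zip_longest keeps going until the longer input is exhausted and
    # yields None for the missing side, which is swapped for a padded row.
    for l_row, r_row in zip_longest(left, right):
        l_row = left_pad if l_row is None else l_row
        r_row = right_pad if r_row is None else r_row
        joined.append(l_row + r_row)
    return joined


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(10,), (20,)]
rows = positional_join(left, right, left_width=2, right_width=1)
for row in rows:
    print(row)
```

The third row has no partner on the right, so its right-hand column comes back as `None`. This is the same shape of result as concatenating two frames column-wise, which is what the linked issue asks for.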

Richard Wesley added 13 commits January 2, 2023 14:45
Add new POSITIONAL JOIN join type
and plumb it through to a straw man
physical implementation.
Simple benchmark joining two integer columns.
This is 2x faster than a hash join on rowid.
Add a second physical operator that manages two table scans.
make format-head doesn't work unless you have added the files...
Special case rendering of physical positional scans
to display the internal tables.
Test (and fix) ragged PJ implementation.
Rename files for consistency.
Change positional scanning to have outer join semantics (NULL padding).
Change positional joining to have outer join semantics (NULL padding).
Add NULL propagation to all columns of a PJ.
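
The "ragged" case mentioned in the commits above is read here as the situation where the two inputs deliver their rows in blocks of different sizes, so block boundaries do not line up. The sketch below shows one way to realign such blocks while keeping the NULL-padding behaviour. It is an assumption-laden illustration in Python, not the code from this PR; `positional_join_chunked` and its parameters are hypothetical names.

```python
def positional_join_chunked(left_chunks, right_chunks, left_width, right_width):
    """Yield joined blocks from two streams of row blocks of arbitrary sizes.

    Each input is an iterable of lists of tuples. Hypothetical helper:
    it buffers leftover rows from the larger block until the other side
    catches up, then pads with None once one side is exhausted.
    """
    left_pad = (None,) * left_width
    right_pad = (None,) * right_width
    left_iter, right_iter = iter(left_chunks), iter(right_chunks)
    left_buf, right_buf = [], []
    left_done = right_done = False

    while True:
        # Refill whichever buffer is empty, skipping empty blocks.
        while not left_buf and not left_done:
            block = next(left_iter, None)
            if block is None:
                left_done = True
            else:
                left_buf = list(block)
        while not right_buf and not right_done:
            block = next(right_iter, None)
            if block is None:
                right_done = True
            else:
                right_buf = list(block)

        if not left_buf and not right_buf:
            return  # both inputs are exhausted

        if left_buf and right_buf:
            # Emit only as many rows as both sides currently have.
            n = min(len(left_buf), len(right_buf))
            yield [l + r for l, r in zip(left_buf[:n], right_buf[:n])]
            left_buf, right_buf = left_buf[n:], right_buf[n:]
        elif left_buf:
            # Right side ran out: pad the right columns with NULLs.
            yield [l + right_pad for l in left_buf]
            left_buf = []
        else:
            # Left side ran out: pad the left columns with NULLs.
            yield [left_pad + r for r in right_buf]
            right_buf = []


left_chunks = [[(1,), (2,), (3,)], [(4,)]]
right_chunks = [[(10,)], [(20,), (30,)], [(40,), (50,)]]
flat = [row
        for block in positional_join_chunked(left_chunks, right_chunks, 1, 1)
        for row in block]
print(flat)
```

The output blocks are smaller than the input blocks whenever the boundaries disagree, which trades a little batching efficiency for simple bookkeeping.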
@hawkfish hawkfish requested a review from Mytherin January 9, 2023 18:09

@Mytherin Mytherin left a comment


Thanks for the PR! Looks good! Some comments:

@Mytherin

Thanks for the updates! Could you just merge with master and have a look at the remaining failing CI runs?

@Mytherin Mytherin merged commit 484c476 into duckdb:master Jan 23, 2023
@Mytherin

Thanks!

@hawkfish hawkfish deleted the positional-join branch March 7, 2023 18:38
Successfully merging this pull request may close these issues.

Efficient row by row join equivalent of pd.concat axis=1