
Conversation

Tmonster
Contributor

@Tmonster Tmonster commented Aug 2, 2022

PR to improve the join order optimizer. Some queries in TPCH, TPCDS, and JOB will have worse performance, but that is because the previous optimizer got lucky and selected great plans. The new optimizer improves the performance of the JOB suite by over 90% on average; see the statistics below. Performance is measured as the sum of the cardinalities of intermediate nodes during execution. Pictured below are the stats for the join order benchmark; "Explicit" represents the optimal plans.

[image: join order benchmark statistics]
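As a hedged worked example of the metric (made-up numbers, not taken from the benchmark): if a plan joins A, B, and C, and the intermediate A JOIN B produces 1,000 tuples while (A JOIN B) JOIN C produces 50,000, the plan scores 51,000; an alternative order whose intermediates produce 200 and 50,000 tuples scores 50,200 and is therefore considered the better plan.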

In addition to selecting better query plans, the new optimizer also

  1. Adds a new cardinality estimator class. Given two relations and a set of equality filters that can join them, the cardinality estimator estimates the cardinality of the join result.
  2. Adds a join node class. Previously, join nodes were structs; they are now classes so they can encapsulate more functionality.
  3. Adds an EstimatedProperties class. This class holds estimated properties for join nodes and result operators (a minimal sketch of such a class follows this list).
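
To give a rough idea of the shape of that addition, here is a minimal sketch of an EstimatedProperties-style holder; the member names are illustrative assumptions, not necessarily the ones in this PR:

// Hypothetical sketch of a per-node estimates holder; the actual
// EstimatedProperties class in this PR may differ in naming and members.
class EstimatedProperties {
public:
	EstimatedProperties(double cardinality, double cost) : cardinality(cardinality), cost(cost) {
	}
	double GetCardinality() const {
		return cardinality;
	}
	double GetCost() const {
		return cost;
	}
	void SetCost(double new_cost) {
		cost = new_cost;
	}

private:
	double cardinality;
	double cost;
};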

This PR also fixes a couple of bugs in the Dynamic Programming algorithm. The previous cost model hid a lot of these bugs, but they came to light when debugging my code.

Some Substrait tests still don't pass, but I will fix those soon.

Further improvements will come: better selectivity estimates will improve the results even more and could make them comparable to Postgres and SQL Server (potentially, no guarantees).

@lnkuiper lnkuiper requested a review from Mytherin August 2, 2022 11:39
@lnkuiper
Contributor

lnkuiper commented Aug 2, 2022

Nice to see the PR! I have reviewed this a few times already so I approve.

Happy to discuss any review comments.

Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Very impressive results! Great stuff. Below are some comments from my side:

@@ -143,6 +143,11 @@ void InterpretedBenchmark::LoadBenchmark() {
throw std::runtime_error(reader.FormatException("require requires a single parameter"));
}
extensions.insert(splits[1]);
} else if (splits[0] == "connect") {
if (splits.size() != 2) {
throw std::runtime_error(reader.FormatException("connect reqiures a database path"));
Collaborator

nit: requires


//! When calculating the cost of a join. Multiple filters may be present.
//! These values keep track of the lowest cost join
double lowest_card;
Collaborator

Should this be public?


//! When calculating the cost of a join. Multiple filters may be present.
//! These values keep track of the lowest cost join
double lowest_card;
Collaborator

Should the cardinality be measured as a double, instead of as (u)int64?

Purely by way of reference and not as a recommendation, Calcite uses doubles for cardinality. I suspect it has to do with the extremely large numeric range and the inherent ability to represent things like infinity for pathologically explosive cardinality estimates (or cases where you want the optimizer to be able to special-case something so it is never chosen despite being a valid thing to cost).
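
As a hedged illustration of that last point (not code from this PR or from Calcite), a double-valued cost makes it trivial to mark a plan as effectively unchoosable:

#include <limits>

// Hypothetical sketch: with double-valued costs, "never pick this plan" can be
// expressed as an infinite cost, and very large estimates do not overflow.
double ForbiddenPlanCost() {
	return std::numeric_limits<double>::infinity();
}

bool LeftPlanIsBetter(double left_cost, double right_cost) {
	// infinity compares greater than any finite cost, so a forbidden plan
	// never wins against a valid one here
	return left_cost <= right_cost;
}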


bool full_plan_found;
bool must_update_full_plan;
unordered_set<string> join_nodes_in_full_plan;
Collaborator

Should this be a string? Can't we create an unordered_set on the relation node by defining a hash/equality function?
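
For reference, a minimal sketch of what a hash/equality pair for such a set could look like; the key type and its members here are assumptions for illustration, not the PR's JoinRelationSet:

#include <unordered_set>
#include <vector>

// Hypothetical key describing a set of relations by their (sorted) indices.
struct RelationSetKey {
	std::vector<size_t> relations;
};

struct RelationSetHash {
	size_t operator()(const RelationSetKey &key) const {
		size_t hash = 0;
		for (auto relation : key.relations) {
			// standard hash-combine pattern
			hash ^= std::hash<size_t>()(relation) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
		}
		return hash;
	}
};

struct RelationSetEquality {
	bool operator()(const RelationSetKey &a, const RelationSetKey &b) const {
		return a.relations == b.relations;
	}
};

using JoinNodesInFullPlan = std::unordered_set<RelationSetKey, RelationSetHash, RelationSetEquality>;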

if (comparison_filter.comparison_type == ExpressionType::COMPARE_EQUAL) {
auto base_stats = catalog_table->storage->GetStatistics(context, column_index);
auto column_count = base_stats->GetDistinctCount();
auto increment = MaxValue((idx_t)((cardinality + column_count - 1) / column_count), (idx_t)1);
Collaborator

nit: MaxValue<idx_t> is a bit cleaner than casting both sides


for (idx_t ind = 0; ind < equivalent_relations.size(); ind++) {
column_binding_set_t i_set = equivalent_relations.at(ind);
if (i_set.count(key) == 1) {
Collaborator

nit: early-out
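
Spelled out, the suggestion is a restructuring along these lines (a sketch of the loop above, not code from the PR):

// continue early when the key is not in this set, so the match-handling code
// below needs one less level of nesting
for (idx_t ind = 0; ind < equivalent_relations.size(); ind++) {
	column_binding_set_t i_set = equivalent_relations.at(ind);
	if (i_set.count(key) != 1) {
		continue;
	}
	// ... handle the matching set here ...
}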

explicit LogicalOperator(LogicalOperatorType type);
LogicalOperator(LogicalOperatorType type, vector<unique_ptr<Expression>> expressions);
virtual ~LogicalOperator();
explicit LogicalOperator(LogicalOperatorType type)
Collaborator

Can these constructors be moved back into the C++ file?

@@ -37,7 +44,10 @@ class LogicalOperator {
//! The types returned by this logical operator. Set by calling LogicalOperator::ResolveTypes.
vector<LogicalType> types;
//! Estimated Cardinality
idx_t estimated_cardinality = 0;
idx_t estimated_cardinality;
Collaborator

Could we turn this into a struct or class, instead of 3 separate attributes?

Contributor Author

The estimated_cardinality in the logical operator is pretty tightly coupled to a lot of PhysicalOperator constructors. For example, PhysicalTableScan takes estimated_cardinality in its constructor, which is usually copied from the logical operator. I agree that changing it into a struct or class would be better, but I don't think changing every instantiation of a physical operator should be in this pull request.

PhysicalColumnDataScan(vector<LogicalType> types, PhysicalOperatorType op_type, idx_t estimated_cardinality)
	    : PhysicalOperator(op_type, move(types), estimated_cardinality), collection(nullptr) {
	}

#include "duckdb/common/common.hpp"
#include "duckdb/common/enums/logical_operator_type.hpp"
#include "duckdb/common/unordered_map.hpp"
Collaborator

Is adding these includes necessary?

auto has_equality_filter = false;
auto cardinality_after_filters = cardinality;
for (auto &child_filter : filter->child_filters) {
if (child_filter->filter_type == TableFilterType::CONSTANT_COMPARISON) {
Collaborator

nit: early-out here as well please

@Mytherin
Collaborator

Mytherin commented Aug 3, 2022

There also appear to be a few test failures remaining; could you have a look at those as well?

Tmonster added a commit to Tmonster/duckdb that referenced this pull request Aug 9, 2022
More robust estimation logic that is truly dynamic at this point. This also helps the PR duckdb#4274 pass all CI tests except for some optimizer regression checks
Tmonster and others added 5 commits August 9, 2022 16:44
basically completed the new join order optimizer. Need to inspect a few tpcds queries though

* added imdb_parquet benchmark

* Adding multiplicities branch

* add benchmark to query direct from query files

* clean up code that is unnecessary

* fix bugs in join ordering code

* code clean up. This commit has code that changes how analyze structures are printed

* small changes but nothing major

* fix tpcds query error, but now there is one that times out

* tpc-ds passes now

* format fix

* final fixes

* fix tpcds benchmark again

* fixed hopefully the last issues with tpcds and tpch and imdb

* lawrence comments

* format fix

* forgot a D_ASSERT

* make sure it can build

* mult and sel looks good. Still needs cleanup and variable renaming. Going to add mult and sel on column levels now

* first commit, have some ideas

* now we are tracking at the column level, very nice

* mults and sels are now tracked on a column level. stable for tpch, imdb not as much, going to do some memory switching

* no more memory errors

* getting there but we'll see. works if you init the left and right every time

* it works on my machine, but I think there is something wrong with my hardware

* things work, going to work on a big refactor now

* refactor is working so far, but results aren't any better

* remove some comments

* format fix

* some more smaller commits to clean up code

* ok looks good, computing cardinality sums should work now

* fix cmake stuff

* comments and remove old code

* this is going in the right direction

* you can now see some of the join stats when you print the querygraph

* should be able to see sels and muls in graph, but for some reason not yet

* can now see selectivities and multiplicities on the query graph

* some refactoring and cleaning, adding more metadata to query graph should be straightforward now

* remove unnecessary header file

* add cost to JoinStats

* refactor looks much nicer now

* more refactor

* updated benchmark

* cardinality estimation is a little bit better, still seg faulting on tpch query 5

* code to later add HLL

* last commit before attempting to implement new idea made with Laurens

* it works now. going to attempt big rebase

* tracking unique values, debugging

* works for the most part, except for a DP bug in q08a for the JOB tests

* checkpoint, less segfaults, some laurens code as well

* when applying a filter set a min resulting cardinality of at least 1

* fix rebase issues

* properly update DP table

* stopping coding to prove symmetry

* ok looks good

* ok here we have some really good results with very approximate joining. DP seems to still have an issue

* we are defs on the right track here. Just implemented a way to check if a table filter is an equality comparison

* good version going to do a clean up commit next

* did some clean up and it builds

* refactor, add or filter estimation logic

* fixed non-reorderable joins bug

* some more refactoring

* some more refactoring

* fix all benchmark files

* fix benchmark files again

* more refactoring

* more refactoring

* more refactoring and comments

* format fix refactor

* remove two lines

* big refactor. putting cardinality estimation logic in its own class

* refactor looks good now. compiles and runs. Just need to fix laurens comments

* using column_binding_t now, new copy function for estimated properties, unnesting some code

* more refactoring

* format-fix commit

* removing dead code and other not necessary things

* fixed a couple of issues regarding cartesian joins and the approximate join order solver. Still need to handle the case where there is more than one filter

* allunit tests should pass now

* format-fix

* fix the std issue for finding the max element

* fix more tests

* format fix changes

* debug and unittests pass

* format-fix changes

* more fixes and comments from lawrence

* styling changes and also changes to make tpcds pass when using parquet

* forgot to do a format fix

* remove semi colon not found by format fix

* try to get tests passing again, going to merge with master soon

* fix rebase conflicts

* format fix

* more refactoring, need to look at some results because we regressed, but don't really want to do that now

* make debug pass

* should fix debug now

* try to get more tests to pass. run analyze statement tests have better results as well

* fix int/double error breaking some tests

* everything looks good

* fix divide by 0

* fix bug

* make allunit passes now, still need to test make unittestci

* add a parquet vs base table test. A test in select4.test_slow is still failing :(

* fix last bug

* fix compiling bugs

* fix debug failing tests

* ok, fixed bug where not all query edges are considered

* fix issue where not all edges are considered, but results are worse

* fix regressions that took me 3 hours to find

* fix small bug involving idx_t to double casting, and fix executing queries on just parquet

* fix the make tidy errors (hopefully)

* remove not important files

* some clang tidy stuff, but also some stuff for phase timings, but for some reason they aren't reported in python

* deleted stuff for phase timings

* tidy and format fixes

* last comments

* fix last test case

Co-authored-by: Laurens Kuiper <laurens.kuiper@cwi.nl>
More robust estimation logic that is truly dynamic at this point. This also helps the PR duckdb#4274 pass all CI tests except for some optimizer regression checks
@Tmonster Tmonster force-pushed the new-join-order-optimizer branch from 2e65b93 to bcfb72f Compare August 9, 2022 14:44

#include <functional>

template <>
Contributor Author

I was unsure where to put this template because I kept getting namespace errors. Feedback appreciated here!

@Tmonster
Contributor Author

@Mytherin I should have addressed most of the comments. I had to refactor the estimator because one of the failing tests revealed a weakness in the estimation logic; this issue was fixed in Tmonster#4. The other commits are pretty small.

All tests should pass except some regression tests for the tpch and imdb benchmarks, which we can talk about if you want. For both benchmarks, the execution times are faster and the number of intermediate tuples processed is lower. The results below are from running the imdb benchmark on my laptop (M1 Pro, 32 GB of memory).
[image: imdb benchmark results]

@lnkuiper
Contributor

One thing that we figured out is that reducing the number of intermediate tuples does not always reduce the execution time. Two joins that produce the same number of tuples can have wildly different execution times because they differ in the size of the build side.

We could try to make the cost model a bit smarter to deal with this, but maybe that's for another PR.
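
For the record, a hedged sketch of what a build-side-aware cost term could look like; the function name and weighting constant are assumptions, not part of this PR or a concrete proposal:

// Hypothetical cost function: penalize large build sides in addition to the
// estimated output cardinality. The 0.1 weight is an arbitrary placeholder.
double CostWithBuildSide(double output_cardinality, double build_cardinality, double left_cost, double right_cost) {
	const double BUILD_SIDE_WEIGHT = 0.1;
	return output_cardinality + BUILD_SIDE_WEIGHT * build_cardinality + left_cost + right_cost;
}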

Collaborator

@Mytherin Mytherin left a comment

Thanks for the fixes! This looks good to me. I have no issues with the (minor) performance regressions. My remaining comments here:

  • The cardinalities and costs are still stored as double - Laurens mentioned you previously ran into problems with that because of floating-point inaccuracies - can this not be converted into idx_t? Is there a reason to allow partial tuples to exist here instead of always rounding down?
  • The regression test in the Python client appears to segfault. Could you have a look at that? That might be related to doing joins on arrow/pandas that have missing statistics.

Another minor note - we actually skip two queries in TPC-DS generally (64 & 85) because they are too slow, likely due to an exploding join order. Perhaps also worth checking to see if this PR fixes that issue.

@lnkuiper
Contributor

We had to use doubles before because the rounding was order-dependent. Since then, Tom has refactored the code, and I believe idx_t should work now.

@hannes
Member

hannes commented Aug 22, 2022

@Tmonster could you resolve the merge conflict when you have a moment please?

@Tmonster
Contributor Author

@Mytherin I attempted to change the estimated_cardinality to type idx_t, but I was getting regressions in a number of JOB queries. If I change line 284 in cardinality_estimator.cpp from return numerator / denom; to return floor(numerator / denom); I get the following change in the aggregate results for the join order benchmark.

-------------JOB STATS RETURNING FLOOR()-------------
MIN = 0.0053298743452922515
MAX = 371.8834550651
AVG = 17.343067858263982
MEDIAN = 1.140868084544508

-------------JOB STATS RETURNING DOUBLE -----------
MIN = 0.005305284382914894
MAX = 327.89171165449346
AVG = 16.911370726401206
MEDIAN = 1.0057959168508852

19 imdb regressions. 14 imdb improvements

I talked about this with Laurens. I could dive into why this happens for the affected queries, but I don't have the time scheduling-wise; I need to work on my thesis.

@hannes hannes requested a review from lnkuiper August 24, 2022 11:41
@hannes
Member

hannes commented Aug 24, 2022

And another merge conflict, sorry. But otherwise I think we are ready to merge here.

@lnkuiper
Contributor

@hannes still one segfault to go, hopefully the fix is easy

@Tmonster Tmonster requested review from Mytherin and removed request for lnkuiper August 24, 2022 14:17
auto &table_scan_bind_data = (TableScanBindData &)*get->bind_data;
auto column_statistics = get->function.statistics(context, &table_scan_bind_data, it.first);
column_statistics = nullptr;
if (get->bind_data && get->function.name.compare("arrow_scan") != 0) {
Contributor Author

@pedroerp @Mytherin I'm not quite sure if this is the best way to fix the segfault. TableScanBindData doesn't exist when the table function is an arrow scan. I don't want to hardcode the string arrow_scan here, but it seems to be used in multiple places, so I imagine there will be a PR to fix that everywhere.

Do you know of any other table functions where this might break? I couldn't find a list of the existing ones in the codebase, so I wasn't sure.

Contributor

Maybe for now you can just check whether it's a base table scan or not? If not, then fall back to some default behaviour.
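
A minimal sketch of that guard, assuming the built-in table scan is identified by its function name ("seq_scan" here is an assumption, as is the fallback):

// Hypothetical guard: only cast to TableScanBindData for the built-in table
// scan; for any other table function, fall back to "no statistics available".
if (get->bind_data && get->function.name == "seq_scan") {
	auto &table_scan_bind_data = (TableScanBindData &)*get->bind_data;
	column_statistics = get->function.statistics(context, &table_scan_bind_data, it.first);
} else {
	column_statistics = nullptr;
}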

@hannes hannes requested review from lnkuiper and removed request for Mytherin August 24, 2022 14:46
Contributor

@lnkuiper lnkuiper left a comment

Good job on fixing the CI! I have a bunch of code-style nitpicks; I think this is good to go if we fix these (and run a make format-fix, of course).


class JoinOrderOptimizer;

class JoinNode {
Contributor

More of a code-style thing, but could you move the constructor to the implementation file, and separate the functions/fields into different private/public blocks?
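
Roughly this shape, as a sketch (the members shown are placeholders, not the PR's actual fields):

// join_node.hpp (sketch)
class JoinNode {
public:
	explicit JoinNode(JoinRelationSet *set); // body lives in the .cpp file
	double GetCardinality() const;
	double GetCost() const;

private:
	JoinRelationSet *set;
	double cardinality;
	double cost;
};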

@@ -184,39 +248,63 @@ bool JoinOrderOptimizer::ExtractJoinRelations(LogicalOperator &input_op, vector<
return false;
}

//! Update the exclusion set with all entries in the subgraph
Contributor

Why was the word "Update" removed here?

sort(neighbors.begin(), neighbors.end());

//! Neighbors should be reversed when iterating over them.
std::sort(neighbors.begin(), neighbors.end(), ReverseSort);
Contributor

nitpick: std::sort(neighbors.begin(), neighbors.end(), std::greater<idx_t>()); will do the trick (or std::greater_equal<idx_t>()), no need for ReverseSort.

@@ -338,8 +460,10 @@ bool JoinOrderOptimizer::EnumerateCSGRecursive(JoinRelationSet *node, unordered_
// recursively enumerate the sets
unordered_set<idx_t> new_exclusion_set = exclusion_set;
for (idx_t i = 0; i < neighbors.size(); i++) {
// updated the set of excluded entries with this neighbor
// this line is necessary, need to remember why.
Contributor

Did you remember why?

@@ -544,6 +770,10 @@ JoinOrderOptimizer::GenerateJoins(vector<unique_ptr<LogicalOperator>> &extracted
result_relation = node->set;
result_operator = move(extracted_relations[node->set->relations[0]]);
}
auto max_idx_t = NumericLimits<idx_t>::Maximum() - 10000;
result_operator->estimated_cardinality = (idx_t)MinValue(node->GetCardinality(), (double)max_idx_t);
Contributor

nitpick: use MinValue<idx_t> instead of (idx_t)MinValue

}

double CardinalityEstimator::ComputeCost(JoinNode *left, JoinNode *right, double expected_cardinality) {
double cost = expected_cardinality + left->GetCost() + right->GetCost();
Contributor

Nitpick: just return expected_cardinality + left->GetCost() + right->GetCost();

}

double CardinalityEstimator::EstimateCrossProduct(const JoinNode *left, const JoinNode *right) {
// need to explicity use double here, otherwise auto converts it to an int, then
Contributor

Nitpick: this can be simplified to:

return left->GetCardinality() >= (NumericLimits<double>::Maximum() / right->GetCardinality()) ?
       NumericLimits<double>::Maximum() : left->GetCardinality() * right->GetCardinality();

}

void UpdateDenom(Subgraph2Denominator *relation_2_denom, RelationsToTDom *relation_to_tdom) {
if (relation_to_tdom->has_tdom_hll) {
Contributor

same here, we can simplify to:

relation_2_denom->denom *= relation_to_tdom->has_tdom_hll ? relation_to_tdom->tdom_hll :
                           relation_to_tdom->tdom_no_hll;

// means that the filter joins relations in the given set, but there is no
// connection to any subgraph in subgraphs. Add a new subgraph, and maybe later there will be
// a connection.
if (!found_match) {
Contributor

Nitpick: Here you have less repetition and more readability if you do:

subgraphs.emplace_back(Subgraph2Denominator());
auto &subgraph = subgraphs.back();
subgraph.relations.insert(filter->left_binding.table_index);
subgraph.relations.insert(filter->right_binding.table_index);
UpdateDenom(&subgraph, &relation_2_tdom);

TableFilterSet *CardinalityEstimator::GetTableFilters(LogicalOperator *op) {
// First check table filters
auto get = GetLogicalGet(op);
if (get) {
Contributor

Nitpick: Again, we can simplify to:

return get ? &get->table_filters : nullptr;
