
Conversation

Tmonster
Contributor

@Tmonster Tmonster commented Aug 2, 2022

PR to improve the join order optimizer. Some queries in TPCH, TPCDS, and JOB will have worse performance, but that is because the previous optimizer got lucky and selected great plans. The new optimizer improves the performance of the JOB suite by over 90% on average; see the statistics below. Performance is measured as the sum of the cardinalities of intermediate nodes during execution. Pictured below are the stats for the join order benchmark; "Explicit" represents the optimal plans.

[image: join order benchmark statistics]
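As a hedged worked example of the metric (made-up numbers, not taken from the benchmark): if a plan joins A, B, and C, and the intermediate A JOIN B produces 1,000 tuples while (A JOIN B) JOIN C produces 50,000, the plan scores 51,000; an alternative order whose intermediates produce 200 and 50,000 tuples scores 50,200 and is therefore considered the better plan.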

In addition to selecting better query plans, the new optimizer also

  1. Adds a new cardinality estimator class. Given two relations and a set of equality filters that can join them, the cardinality estimator estimates the cardinality of the join result.
  2. Adds a join node class. Previously, join nodes were structs; they are now classes so they can encapsulate more functionality.
  3. Adds an EstimatedProperties class. This class holds estimated properties for join nodes and result operators (a minimal sketch of such a class follows this list).
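
To give a rough idea of the shape of that addition, here is a minimal sketch of an EstimatedProperties-style holder; the member names are illustrative assumptions, not necessarily the ones in this PR:

// Hypothetical sketch of a per-node estimates holder; the actual
// EstimatedProperties class in this PR may differ in naming and members.
class EstimatedProperties {
public:
	EstimatedProperties(double cardinality, double cost) : cardinality(cardinality), cost(cost) {
	}
	double GetCardinality() const {
		return cardinality;
	}
	double GetCost() const {
		return cost;
	}
	void SetCost(double new_cost) {
		cost = new_cost;
	}

private:
	double cardinality;
	double cost;
};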

This PR also fixes a couple of bugs in the Dynamic Programming algorithm. The previous cost model hid a lot of these bugs, but they came to light when debugging my code.

Some Substrait tests still don't pass, but I will fix those soon.

Further improvements will come: better selectivity estimates will improve the results even more and could make them comparable to Postgres and SQL Server (potentially, no guarantees).

@lnkuiper lnkuiper requested a review from Mytherin August 2, 2022 11:39
@lnkuiper
Contributor

lnkuiper commented Aug 2, 2022

Nice to see the PR! I have reviewed this a few times already so I approve.

Happy to discuss any review comments.

Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Very impressive results! Great stuff. Below are some comments from my side:

@@ -143,6 +143,11 @@ void InterpretedBenchmark::LoadBenchmark() {
throw std::runtime_error(reader.FormatException("require requires a single parameter"));
}
extensions.insert(splits[1]);
} else if (splits[0] == "connect") {
if (splits.size() != 2) {
throw std::runtime_error(reader.FormatException("connect reqiures a database path"));
Collaborator

nit: requires


//! When calculating the cost of a join. Multiple filters may be present.
//! These values keep track of the lowest cost join
double lowest_card;
Collaborator

Should this be public?


//! When calculating the cost of a join. Multiple filters may be present.
//! These values keep track of the lowest cost join
double lowest_card;
Collaborator

Should the cardinality be measured as a double, instead of as (u)int64?

Purely by way of reference and not as a recommendation, Calcite uses doubles for cardinality. I suspect it has to do with the extremely large numeric range and the inherent ability to represent things like infinity for pathologically explosive cardinality estimates (or cases where you want the optimizer to be able to special-case something so it is never chosen despite being a valid thing to cost).
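
As a hedged illustration of that last point (not code from this PR or from Calcite), a double-valued cost makes it trivial to mark a plan as effectively unchoosable:

#include <limits>

// Hypothetical sketch: with double-valued costs, "never pick this plan" can be
// expressed as an infinite cost, and very large estimates do not overflow.
double ForbiddenPlanCost() {
	return std::numeric_limits<double>::infinity();
}

bool LeftPlanIsBetter(double left_cost, double right_cost) {
	// infinity compares greater than any finite cost, so a forbidden plan
	// never wins against a valid one here
	return left_cost <= right_cost;
}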


bool full_plan_found;
bool must_update_full_plan;
unordered_set<string> join_nodes_in_full_plan;
Collaborator

Should this be a string? Can't we create an unordered_set on the relation node by defining a hash/equality function?
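
For reference, a minimal sketch of what a hash/equality pair for such a set could look like; the key type and its members here are assumptions for illustration, not the PR's JoinRelationSet:

#include <unordered_set>
#include <vector>

// Hypothetical key describing a set of relations by their (sorted) indices.
struct RelationSetKey {
	std::vector<size_t> relations;
};

struct RelationSetHash {
	size_t operator()(const RelationSetKey &key) const {
		size_t hash = 0;
		for (auto relation : key.relations) {
			// standard hash-combine pattern
			hash ^= std::hash<size_t>()(relation) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
		}
		return hash;
	}
};

struct RelationSetEquality {
	bool operator()(const RelationSetKey &a, const RelationSetKey &b) const {
		return a.relations == b.relations;
	}
};

using JoinNodesInFullPlan = std::unordered_set<RelationSetKey, RelationSetHash, RelationSetEquality>;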

if (comparison_filter.comparison_type == ExpressionType::COMPARE_EQUAL) {
auto base_stats = catalog_table->storage->GetStatistics(context, column_index);
auto column_count = base_stats->GetDistinctCount();
auto increment = MaxValue((idx_t)((cardinality + column_count - 1) / column_count), (idx_t)1);
Collaborator

nit: MaxValue<idx_t> is a bit cleaner than casting both sides


for (idx_t ind = 0; ind < equivalent_relations.size(); ind++) {
column_binding_set_t i_set = equivalent_relations.at(ind);
if (i_set.count(key) == 1) {
Collaborator

nit: early-out
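
Spelled out, the suggestion is a restructuring along these lines (a sketch of the loop above, not code from the PR):

// continue early when the key is not in this set, so the match-handling code
// below needs one less level of nesting
for (idx_t ind = 0; ind < equivalent_relations.size(); ind++) {
	column_binding_set_t i_set = equivalent_relations.at(ind);
	if (i_set.count(key) != 1) {
		continue;
	}
	// ... handle the matching set here ...
}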

explicit LogicalOperator(LogicalOperatorType type);
LogicalOperator(LogicalOperatorType type, vector<unique_ptr<Expression>> expressions);
virtual ~LogicalOperator();
explicit LogicalOperator(LogicalOperatorType type)
Collaborator

Can these constructors be moved back into the C++ file?

@@ -37,7 +44,10 @@ class LogicalOperator {
//! The types returned by this logical operator. Set by calling LogicalOperator::ResolveTypes.
vector<LogicalType> types;
//! Estimated Cardinality
idx_t estimated_cardinality = 0;
idx_t estimated_cardinality;
Collaborator

Could we turn this into a struct or class, instead of 3 separate attributes?

Contributor Author

The estimated_cardinality in the logical operator is pretty tightly coupled to a lot of PhysicalOperator constructors. For example, PhysicalTableScan takes estimated_cardinality in its constructor, which is usually copied from the logical operator. I agree that changing it into a struct or class would be better, but I don't think changing every instantiation of a physical operator should be in this pull request.

PhysicalColumnDataScan(vector<LogicalType> types, PhysicalOperatorType op_type, idx_t estimated_cardinality)
	    : PhysicalOperator(op_type, move(types), estimated_cardinality), collection(nullptr) {
	}

#include "duckdb/common/common.hpp"
#include "duckdb/common/enums/logical_operator_type.hpp"
#include "duckdb/common/unordered_map.hpp"
Collaborator

Is adding these includes necessary?

auto has_equality_filter = false;
auto cardinality_after_filters = cardinality;
for (auto &child_filter : filter->child_filters) {
if (child_filter->filter_type == TableFilterType::CONSTANT_COMPARISON) {
Collaborator

nit: early-out here as well please

@Mytherin
Collaborator

Mytherin commented Aug 3, 2022

There also appear to be a few test failures remaining; could you have a look at those as well?

Tmonster added a commit to Tmonster/duckdb that referenced this pull request Aug 9, 2022
More robust estimation logic that is truly dynamic at this point. This also helps the PR duckdb#4274 pass all CI tests except for some optimizer regression checks
Tmonster and others added 5 commits August 9, 2022 16:44
basically completed the new join order optimizer. Need to inspect a few tpcds queries though

* added imdb_parquet benchmark

* Adding multiplicities branch

* add benchmark to query direct from query files

* clean up code that is unnecessary

* fix bugs in join ordering code

* code clean up. This commit has code that changes how analyze structures are printed

* small changes but nothing major

* fix tpcds query error, but now there is one that times out

* tpc-ds passes now

* format fix

* final fixes

* fix tpcds benchmark again

* fixed hopefully the last issues with tpcds and tpch and imdb

* lawrence comments

* format fix

* forgot a D_ASSERT

* make sure it can build

* mult and sel looks good. Still needs cleanup and variable renaming. Going to add mult and sel on column levels now

* first commit, have some ideas

* now we are tracking at the column level, very nice

* mults and sels are now tracked on a column level. stable for tpch, imdb not as much, going to do some memory switching

* no more memory errors

* getting there but we'll see. works if you init the left and right every time

* it works on my machine, but I think there is something wrong with my hardware

* things work, going to work on a big refactor now

* refactor is working so far, but results aren't any better

* remove some comments

* format fix

* some more smaller commits to clean up code

* ok looks good, computing cardinality sums should work now

* fix cmake stuff

* comments and remove old code

* this is going in the right direction

* you can now see some of the join stats when you print the querygraph

* should be able to see sels and muls in graph, but for some reason not yet

* can now see selectivities and multiplicities on the query graph

* some refactoring and cleaning, adding more metadata to query graph should be straightforward now

* remove unnecessary header file

* add cost to JoinStats

* refactor looks much nicer now

* more refactor

* updated benchmark

* cardinality estimation is a little bit better, still seg faulting on tpch query 5

* code to later add HLL

* last commit before attempting to implement new idea made with Laurens

* it works now. going to attempt big rebase

* tracking unique values, debugging

* works for the most part, except for a DP bug in q08a for the JOB tests

* checkpoint, less segfaults, some laurens code as well

* when applying a filter set a min resulting cardinality of at least 1

* fix rebase issues

* properly update DP table

* stopping coding to prove symmetry

* ok looks good

* ok here we have some really good results with very approximate joining. DP seems to still have an issue

* we are defs on the right track here. Just implemented a way to check if a table filter is an equality comparison

* good version going to do a clean up commit next

* did some clean up and it builds

* refactor, add or filter estimation logic

* fixed non-reorderable joins bug

* some more refactoring

* some more refactoring

* fix all benchmark files

* fix benchmark files again

* more refactoring

* more refactoring

* more refactoring and comments

* format fix refactor

* remove two lines

* big refactor. putting cardinality estimation logic in its own class

* refactor looks good now. compiles and runs. Just need to fix laurens comments

* using column_binding_t now, new copy function for estimated properties, unnesting some code

* more refactoring

* format-fix commit

* removing dead code and other not necessary things

* fixed a couple of issues regarding cartesian joins and the approximate join order solver. Still need to handle the case where there is more than one filter

* allunit tests should pass now

* format-fix

* fix the std issue for finding the max element

* fix more tests

* format fix changes

* debug and unittests pass

* format-fix changes

* more fixes and comments from lawrence

* styling changes and also changes to make tpcds pass when using parquet

* forgot to do a format fix

* remove semi colon not found by format fix

* try to get tests passing again, going to merge with master soon

* fix rebase conflicts

* format fix

* more refactoring, need to look at some results because we regressed, but don't really want to do that now

* make debug pass

* should fix debug now

* try to get more tests to pass. run analyze statement tests have better results as well

* fix int/double error breaking some tests

* everything looks good

* fix divide by 0

* fix bug

* make allunit passes now, still need to test make unittestci

* add a parquet vs base table test. A test in select4.test_slow is still failing :(

* fix last bug

* fix compiling bugs

* fix debug failing tests

* ok, fixed bug where not all query edges are considered

* fix issue where not all edges are considered, but results are worse

* fix regressions that took me 3 hours to find

* fix small bug involving idx_t to double casting, and fix executing queries on just parquet

* fix the make tidy errors (hopefully)

* remove not important files

* some clang tidy stuff, but also some stuff for phase timings, but for some reason they aren't reported in python

* deleted stuff for phase timings

* tidy and format fixes

* last comments

* fix last test case

Co-authored-by: Laurens Kuiper <laurens.kuiper@cwi.nl>
More robust estimation logic that is truly dynamic at this point. This also helps the PR duckdb#4274 pass all CI tests except for some optimizer regression checks
@Tmonster Tmonster force-pushed the new-join-order-optimizer branch from 2e65b93 to bcfb72f Compare August 9, 2022 14:44

#include <functional>

template <>
Contributor Author

I was unsure where to put this template because I kept getting namespace errors. Feedback appreciated here!

@Tmonster
Contributor Author

@Mytherin I should have addressed most of the comments. I had to refactor the estimator because one of the failing tests revealed a weakness in the estimation logic; this issue was fixed in Tmonster#4. The other commits are pretty small.

All tests should pass except some regression tests for the tpch and imdb benchmarks, which we can talk about if you want. For both benchmarks, the execution times are faster and the number of intermediate tuples processed is lower. The results below are from running the imdb benchmark on my laptop (M1 Pro, 32 GB of memory).
[image: imdb benchmark results]

@lnkuiper
Contributor

One thing that we figured out is that reducing the number of intermediate tuples does not always reduce the execution time. Two joins that produce the same number of tuples can have wildly different execution times because they differ in the size of the build side.

We could try to make the cost model a bit smarter to deal with this, but maybe that's for another PR.
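
For the record, a hedged sketch of what a build-side-aware cost term could look like; the function name and weighting constant are assumptions, not part of this PR or a concrete proposal:

// Hypothetical cost function: penalize large build sides in addition to the
// estimated output cardinality. The 0.1 weight is an arbitrary placeholder.
double CostWithBuildSide(double output_cardinality, double build_cardinality, double left_cost, double right_cost) {
	const double BUILD_SIDE_WEIGHT = 0.1;
	return output_cardinality + BUILD_SIDE_WEIGHT * build_cardinality + left_cost + right_cost;
}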

Collaborator

@Mytherin Mytherin left a comment

Thanks for the fixes! This looks good to me. I have no issues with the (minor) performance regressions. My remaining comments here:

  • The cardinalities and costs are still stored as double - Laurens mentioned you previously ran into problems with that because of floating-point inaccuracies - can this not be converted into idx_t? Is there a reason to allow partial tuples to exist here instead of always rounding down?
  • The regression test in the Python client appears to segfault. Could you have a look at that? That might be related to doing joins on arrow/pandas that have missing statistics.

Another minor note - we actually skip two queries in TPC-DS generally (64 & 85) because they are too slow, likely due to an exploding join order. Perhaps also worth checking to see if this PR fixes that issue.

@lnkuiper
Contributor

We had to use doubles before because the rounding was order-dependent. Since then, Tom has refactored the code, and I believe idx_t should work now.

@hannes
Member

hannes commented Aug 22, 2022

@Tmonster could you resolve the merge conflict when you have a moment please?

@Tmonster
Contributor Author

@Mytherin I attempted to change the estimated_cardinality to type idx_t, but I was getting regressions in a number of JOB queries. If I change line 284 in cardinality_estimator.cpp from return numerator / denom; to return floor(numerator / denom); I get the following change in the aggregate results for the join order benchmark.

-------------JOB STATS RETURNING FLOOR()-------------
MIN = 0.0053298743452922515
MAX = 371.8834550651
AVG = 17.343067858263982
MEDIAN = 1.140868084544508

-------------JOB STATS RETURNING DOUBLE -----------
MIN = 0.005305284382914894
MAX = 327.89171165449346
AVG = 16.911370726401206
MEDIAN = 1.0057959168508852

19 imdb regressions. 14 imdb improvements

I talked about this with Laurens. I could dive into why this happens for the affected queries, but I don't have the time scheduling-wise; I need to work on my thesis.

@hannes hannes requested a review from lnkuiper August 24, 2022 11:41
@hannes
Member

hannes commented Aug 24, 2022

And another merge conflict, sorry. But otherwise I think we are ready to merge here.

@lnkuiper
Contributor

@hannes still one segfault to go, hopefully the fix is easy

@Tmonster Tmonster requested review from Mytherin and removed request for lnkuiper August 24, 2022 14:17
auto &table_scan_bind_data = (TableScanBindData &)*get->bind_data;
auto column_statistics = get->function.statistics(context, &table_scan_bind_data, it.first);
column_statistics = nullptr;
if (get->bind_data && get->function.name.compare("arrow_scan") != 0) {
Contributor Author

@pedroerp @Mytherin I'm not quite sure if this is the best way to fix the segfault. TableScanBindData doesn't exist when the table function is an arrow scan. I don't want to hardcode the string arrow_scan here, but it seems to be used in multiple places, so I imagine there will be a PR to fix that everywhere.

Do you know of any other table functions where this might break? I couldn't find a list of the existing ones in the codebase, so I wasn't sure.

Contributor

Maybe for now you can just check whether it's a base table scan or not? If not, then fall back to some default behaviour.
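
A minimal sketch of that guard, assuming the built-in table scan is identified by its function name ("seq_scan" here is an assumption, as is the fallback):

// Hypothetical guard: only cast to TableScanBindData for the built-in table
// scan; for any other table function, fall back to "no statistics available".
if (get->bind_data && get->function.name == "seq_scan") {
	auto &table_scan_bind_data = (TableScanBindData &)*get->bind_data;
	column_statistics = get->function.statistics(context, &table_scan_bind_data, it.first);
} else {
	column_statistics = nullptr;
}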

@hannes hannes requested review from lnkuiper and removed request for Mytherin August 24, 2022 14:46
Contributor

@lnkuiper lnkuiper left a comment

Good job on fixing the CI! I have a bunch of code-style nitpicks; I think this is good to go if we fix these (and run a make format-fix, of course).


class JoinOrderOptimizer;

class JoinNode {
Contributor

More of a code-style thing, but could you move the constructor to the implementation file, and separate the functions/fields into different private/public blocks?
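
Roughly this shape, as a sketch (the members shown are placeholders, not the PR's actual fields):

// join_node.hpp (sketch)
class JoinNode {
public:
	explicit JoinNode(JoinRelationSet *set); // body lives in the .cpp file
	double GetCardinality() const;
	double GetCost() const;

private:
	JoinRelationSet *set;
	double cardinality;
	double cost;
};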

@@ -184,39 +248,63 @@ bool JoinOrderOptimizer::ExtractJoinRelations(LogicalOperator &input_op, vector<
return false;
}

//! Update the exclusion set with all entries in the subgraph
Contributor

Why was the word "Update" removed here?

sort(neighbors.begin(), neighbors.end());

//! Neighbors should be reversed when iterating over them.
std::sort(neighbors.begin(), neighbors.end(), ReverseSort);
Contributor

nitpick: std::sort(neighbors.begin(), neighbors.end(), std::greater<idx_t>()); will do the trick (or std::greater_equal<idx_t>()), no need for ReverseSort.

@@ -338,8 +460,10 @@ bool JoinOrderOptimizer::EnumerateCSGRecursive(JoinRelationSet *node, unordered_
// recursively enumerate the sets
unordered_set<idx_t> new_exclusion_set = exclusion_set;
for (idx_t i = 0; i < neighbors.size(); i++) {
// updated the set of excluded entries with this neighbor
// this line is necessary, need to remember why.
Contributor

Did you remember why?

@@ -544,6 +770,10 @@ JoinOrderOptimizer::GenerateJoins(vector<unique_ptr<LogicalOperator>> &extracted
result_relation = node->set;
result_operator = move(extracted_relations[node->set->relations[0]]);
}
auto max_idx_t = NumericLimits<idx_t>::Maximum() - 10000;
result_operator->estimated_cardinality = (idx_t)MinValue(node->GetCardinality(), (double)max_idx_t);
Contributor

nitpick: use MinValue<idx_t> instead of (idx_t)MinValue

}

double CardinalityEstimator::ComputeCost(JoinNode *left, JoinNode *right, double expected_cardinality) {
double cost = expected_cardinality + left->GetCost() + right->GetCost();
Contributor

Nitpick: just return expected_cardinality + left->GetCost() + right->GetCost();

}

double CardinalityEstimator::EstimateCrossProduct(const JoinNode *left, const JoinNode *right) {
// need to explicity use double here, otherwise auto converts it to an int, then
Contributor

Nitpick: this can be simplified to:

return left->GetCardinality() >= (NumericLimits<double>::Maximum() / right->GetCardinality()) ?
       NumericLimits<double>::Maximum() : left->GetCardinality() * right->GetCardinality();

}

void UpdateDenom(Subgraph2Denominator *relation_2_denom, RelationsToTDom *relation_to_tdom) {
if (relation_to_tdom->has_tdom_hll) {
Contributor

same here, we can simplify to:

relation_2_denom->denom *= relation_to_tdom->has_tdom_hll ? relation_to_tdom->tdom_hll :
                           relation_to_tdom->tdom_no_hll;

// means that the filter joins relations in the given set, but there is no
// connection to any subgraph in subgraphs. Add a new subgraph, and maybe later there will be
// a connection.
if (!found_match) {
Contributor

Nitpick: Here you have less repetition and more readability if you do:

subgraphs.emplace_back(Subgraph2Denominator());
auto &subgraph = subgraphs.back();
subgraph.relations.insert(filter->left_binding.table_index);
subgraph.relations.insert(filter->right_binding.table_index);
UpdateDenom(&subgraph, &relation_2_tdom);

TableFilterSet *CardinalityEstimator::GetTableFilters(LogicalOperator *op) {
// First check table filters
auto get = GetLogicalGet(op);
if (get) {
Contributor

Nitpick: Again, we can simplify to:

return get ? &get->table_filters : nullptr;
