Skip to content

Conversation

@joshua-spacetime
Copy link
Collaborator

Closes #623.

Before this patch query optimization was entirely syntax driven. Now that we keep table size metrics we can be somewhat data driven.

This patch improves index joins,
by using row counts to determine the index side and the probe side.

Description of Changes

Please describe your change, mention any related tickets, and so on here.

API and ABI breaking changes

If this is an API or ABI breaking change, please apply the
corresponding GitHub label.

Expected complexity level and risk

How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.

This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.

If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.

@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/623/table-size-metrics-index-join branch from ce7e57b to 95c5eba Compare December 14, 2023 07:30
@joshua-spacetime joshua-spacetime force-pushed the joshua/test/index-to-inner-join branch from 3d88f58 to 9c22936 Compare December 14, 2023 15:33
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/623/table-size-metrics-index-join branch from 95c5eba to 743439b Compare December 14, 2023 15:42
@joshua-spacetime joshua-spacetime force-pushed the joshua/test/index-to-inner-join branch 2 times, most recently from 285bf54 to 89ec242 Compare December 14, 2023 20:20
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/623/table-size-metrics-index-join branch 2 times, most recently from 745efa7 to 1cd8d9d Compare December 14, 2023 20:46
fn test_eval_incr_for_index_join() -> ResultTest<()> {
let (db, _) = make_test_db()?;
run_eval_incr_for_index_join(&db)?;
run_eval_incr_for_index_join(&db.with_row_count(Arc::new(|_| 5)))?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new test case will run our incremental evaluation suite under the new query plan.

Comment on lines +457 to +460
// If the size of the indexed table is sufficiently large, do not reorder.
Table::DbTable(DbTable { table_id, .. }) if row_count(table_id) > 1000 => self,
// If this is a delta table, we must reorder.
// If this is a sufficiently small physical table, we should reorder.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sufficiently small means 1000 rows or less.

@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/623/table-size-metrics-index-join branch from 1cd8d9d to d4ad20d Compare December 14, 2023 20:56
Ok(db)
}

/// Returns an approximate row count for a particular table.
Copy link
Contributor

@mamcx mamcx Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a RowCount object:

#[derive(Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
pub struct RowCount {
    pub min: usize,
    pub max: Option<usize>,
}

on sats -> Relation

The trait RelOps use it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a TODO like you mentioned.

I wonder if we should move Relation out of sats, since it will need access to database statistics. Because right now RowCount only works on MemTables.

Copy link
Contributor

@mamcx mamcx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/623/table-size-metrics-index-join branch from d4ad20d to a1b50d8 Compare December 15, 2023 15:39
Base automatically changed from joshua/test/index-to-inner-join to master December 15, 2023 15:59
@joshua-spacetime joshua-spacetime added this pull request to the merge queue Dec 15, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 15, 2023
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/623/table-size-metrics-index-join branch from a1b50d8 to b8583e7 Compare December 15, 2023 16:20
Closes #623.

Before this patch query optimization was entirely syntax driven.
Now that we keep table size metrics we can be somewhat data driven.

This patch improves index joins,
by using row counts to determine the index side and the probe side.
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/623/table-size-metrics-index-join branch from b8583e7 to 8ee1346 Compare December 15, 2023 16:23
@joshua-spacetime joshua-spacetime added this pull request to the merge queue Dec 15, 2023
Merged via the queue into master with commit 69f819a Dec 15, 2023
@joshua-spacetime joshua-spacetime deleted the joshua/perf/623/table-size-metrics-index-join branch December 15, 2023 17:40
joshua-spacetime added a commit that referenced this pull request Jan 12, 2024
Table size metrics were previously moved out of core.
This was due to the query planner needing access to them.
However that dependency was ultimately managed differently via #677.
github-merge-queue bot pushed a commit that referenced this pull request Jan 12, 2024
* chore: Remove unused metrics

* chore: Move table size metrics back into core

Table size metrics were previously moved out of core.
This was due to the query planner needing access to them.
However that dependency was ultimately managed differently via #677.
@joshua-spacetime joshua-spacetime linked an issue Jan 23, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Better join ordering

3 participants