perf(623): Use row counts in query optimization #677
fn test_eval_incr_for_index_join() -> ResultTest<()> {
    let (db, _) = make_test_db()?;
    run_eval_incr_for_index_join(&db)?;
    run_eval_incr_for_index_join(&db.with_row_count(Arc::new(|_| 5)))?;
This new test case will run our incremental evaluation suite under the new query plan.
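The second call in the hunk above swaps in a stubbed row-count function, so the same suite also runs against the plan chosen for small tables. A minimal sketch of that injection pattern follows. The `Db` type, the `RowCountFn` alias, the `u32` table id, and the `i64` count are all assumptions made for illustration; only the `with_row_count(Arc::new(|_| 5))` call shape comes from the diff.

```rust
use std::sync::Arc;

/// Hypothetical alias: maps a table id to an approximate row count.
type RowCountFn = Arc<dyn Fn(u32) -> i64 + Send + Sync>;

/// Hypothetical stand-in for the database handle used in the test.
#[derive(Clone)]
pub struct Db {
    row_count: RowCountFn,
}

impl Db {
    /// Default handle: reports every table as empty.
    pub fn new() -> Self {
        Db { row_count: Arc::new(|_| 0) }
    }

    /// Returns a copy of this handle that answers row-count queries with `f`.
    /// The original handle is left unchanged.
    pub fn with_row_count(&self, f: RowCountFn) -> Self {
        let mut db = self.clone();
        db.row_count = f;
        db
    }

    /// Approximate row count for `table_id`, as reported by the injected function.
    pub fn row_count(&self, table_id: u32) -> i64 {
        (self.row_count)(table_id)
    }
}
```

Because `with_row_count` returns a new handle, a test can run the same assertions against both the default statistics and the stubbed ones without rebuilding the database.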
// If the size of the indexed table is sufficiently large, do not reorder.
Table::DbTable(DbTable { table_id, .. }) if row_count(table_id) > 1000 => self,
// If this is a delta table, we must reorder.
// If this is a sufficiently small physical table, we should reorder.
Sufficiently small means 1000 rows or fewer.
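The rule in this hunk can be sketched as a small predicate. This is only an illustration under stated assumptions: the simplified `Table` enum, the `u32` table id, the `i64` count, the `REORDER_THRESHOLD` constant, and `should_reorder` are hypothetical names. Only the 1000-row cutoff and the three cases (large physical table, small physical table, delta table) come from the diff.

```rust
/// Cutoff taken from the diff: indexed tables above this size are left alone.
const REORDER_THRESHOLD: i64 = 1000;

/// Hypothetical, simplified view of the table on the indexed side of a join.
pub enum Table {
    /// A physical table, identified by a (hypothetical) numeric id.
    Db { table_id: u32 },
    /// A delta table produced by incremental evaluation.
    Delta,
}

/// Returns true when the two sides of the index join should be swapped.
pub fn should_reorder(indexed: &Table, row_count: &dyn Fn(u32) -> i64) -> bool {
    match indexed {
        // Large physical table: keep probing its index, do not reorder.
        Table::Db { table_id } if row_count(*table_id) > REORDER_THRESHOLD => false,
        // Sufficiently small physical table (1000 rows or fewer): reorder.
        Table::Db { .. } => true,
        // Delta table: always reorder.
        Table::Delta => true,
    }
}
```

Note that the guard uses a strict `>`, so a table with exactly 1000 rows still counts as small and is reordered.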
    Ok(db)
}

/// Returns an approximate row count for a particular table.
There is already a RowCount object in sats -> Relation:
#[derive(Debug, Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
pub struct RowCount {
    pub min: usize,
    pub max: Option<usize>,
}
The RelOps trait uses it.
I will add a TODO like you mentioned.
I wonder if we should move Relation out of sats, since it will need access to database statistics. Right now RowCount only works on MemTables.
mamcx left a comment
LGTM
Closes #623. Before this patch, query optimization was entirely syntax-driven. Now that we keep table size metrics, it can be somewhat data-driven. This patch improves index joins by using row counts to determine the index side and the probe side.
* chore: Remove unused metrics
* chore: Move table size metrics back into core
Table size metrics were previously moved out of core because the query planner needed access to them. However, that dependency was ultimately managed differently via #677.