Conversation

@joshua-spacetime
Collaborator

@joshua-spacetime joshua-spacetime commented Feb 16, 2024

Closes #832.

The database already operates under set semantics, so unless multiple queries return rows from the same table, deduplication of the result set is not necessary.
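A minimal sketch of the idea, not the actual implementation: if each query result is already duplicate-free, results only need to be deduplicated for tables that more than one query reads from. The names `QueryResult`, `merge_results`, `TableId`, and `RowPk` are hypothetical and chosen for illustration only.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical stand-ins for the real table id and row primary key types.
type TableId = u32;
type RowPk = u64;

/// Result of one query: the table it reads and the rows it matched.
/// Assumption: each result is duplicate-free on its own (set semantics).
struct QueryResult {
    table: TableId,
    rows: Vec<RowPk>,
}

/// Merge query results per table, deduplicating only where two or more
/// queries returned rows from the same table.
fn merge_results(results: Vec<QueryResult>) -> BTreeMap<TableId, Vec<RowPk>> {
    // Group the per-query row lists by the table they came from.
    let mut by_table: BTreeMap<TableId, Vec<Vec<RowPk>>> = BTreeMap::new();
    for r in results {
        by_table.entry(r.table).or_default().push(r.rows);
    }

    by_table
        .into_iter()
        .map(|(table, mut groups)| {
            let rows: Vec<RowPk> = if groups.len() == 1 {
                // Only one query touched this table: skip deduplication.
                groups.pop().unwrap()
            } else {
                // Several queries touched this table: their rows may overlap.
                let mut seen = HashSet::new();
                let merged: Vec<RowPk> = groups
                    .into_iter()
                    .flatten()
                    .filter(|pk| seen.insert(*pk))
                    .collect();
                merged
            };
            (table, rows)
        })
        .collect()
}
```

In this sketch the `HashSet` is only built on the multi-query path, which is where the saving for a single full-scan query would come from.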

Performance numbers

> cargo bench --bench=subscription --profile=profiling -- full-scan --exact

full-scan               time:   [251.71 ms 251.99 ms 252.28 ms]
                        change: [-43.284% -43.172% -43.062%] (p = 0.00 < 0.05)
                        Performance has improved.

Before

[Image: Screenshot 2024-02-16 at 2 30 53 PM]

After

[Image: Screenshot 2024-02-16 at 2 31 05 PM]

@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/832/subscription/row-dedup branch from 40686e7 to 0b011b5 Compare February 17, 2024 00:19

/// A set of [supported][`SupportedQuery`] [`QueryExpr`]s.
#[derive(Debug, Deref, DerefMut, PartialEq, From, IntoIterator)]
pub struct QuerySet(BTreeSet<SupportedQuery>);
Collaborator Author

Removed QuerySet in favor of ExecutionSet

///
/// NOTE: The returned `rows` in [DatabaseUpdate] are **deduplicated** so if 2 queries match the same `row`, only one copy is returned.
#[tracing::instrument(skip_all)]
pub fn eval_incr(
Collaborator Author

I have split the logic for eval and eval_incr depending on whether we are executing a single query or a batch of queries that return rows from the same table.
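One way such a split could be planned, as a rough sketch: group the queries by the table they read, then treat groups of one as the single-query path and larger groups as the batch path. The real `ExecOne` and `ExecMany` definitions are not shown in this thread, so `Query`, `Partitioned`, and `partition_by_table` below are hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical table identifier.
type TableId = u32;

/// Hypothetical query description: the table it reads plus its SQL text.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Query {
    table: TableId,
    sql: String,
}

/// Queries split into those that are alone on their table and those that
/// share a table with at least one other query.
#[derive(Debug, Default, PartialEq, Eq)]
struct Partitioned {
    single: Vec<Query>,
    batched: Vec<Vec<Query>>,
}

fn partition_by_table(queries: impl IntoIterator<Item = Query>) -> Partitioned {
    // Group queries by the table they read from.
    let mut by_table: BTreeMap<TableId, Vec<Query>> = BTreeMap::new();
    for q in queries {
        by_table.entry(q.table).or_default().push(q);
    }

    let mut out = Partitioned::default();
    for (_table, mut group) in by_table {
        if group.len() == 1 {
            // Alone on its table: can be evaluated without deduplication.
            out.single.push(group.pop().unwrap());
        } else {
            // Shares a table with other queries: evaluate as a batch.
            out.batched.push(group);
        }
    }
    out
}
```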

type Error = DBError;

fn try_from(expr: QueryExpr) -> Result<Self, Self::Error> {
Ok(ExecutionSet::from_iter(vec![SupportedQuery::try_from(expr)?]))
Contributor

Suggested change
- Ok(ExecutionSet::from_iter(vec![SupportedQuery::try_from(expr)?]))
+ Ok(ExecutionSet::from_iter([SupportedQuery::try_from(expr)?]))
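For context, a small sketch of why passing an array works here: arrays implement `IntoIterator` by value (Rust 1.53 and later), so any `FromIterator` collection can be built from `[x]` without first allocating a temporary `Vec`. A `BTreeSet<u32>` stands in for `ExecutionSet`, and the function names are hypothetical.

```rust
use std::collections::BTreeSet;
// Already in the 2021 edition prelude; imported explicitly for older editions.
use std::iter::FromIterator;

/// Builds a one-element set via a temporary heap-allocated Vec.
fn singleton_from_vec(x: u32) -> BTreeSet<u32> {
    BTreeSet::from_iter(vec![x])
}

/// Builds the same set from a stack-allocated array, with no temporary Vec.
fn singleton_from_array(x: u32) -> BTreeSet<u32> {
    BTreeSet::from_iter([x])
}
```

Both functions return the same set; the array form just avoids the extra allocation.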

#[derive(Debug, PartialEq, Eq)]
pub struct ExecutionSet(BTreeSet<ExecOne>, BTreeSet<ExecMany>);

impl ExecutionSet {
Contributor

Docs on these methods?

@bfops bfops added performance release-any To be landed in any release window labels Feb 20, 2024
let row_pk = pk_for_row(&row);

// Skip duplicate rows
if dup.contains(&row_pk) {
Contributor

Wait, is it possible to get the same row_pk from different tables? The comment says it needs to dedup by table, not by row.

Collaborator Author

It is possible, but that's fine, because the same row in two different tables is considered two different rows.
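To illustrate the point, here is a sketch in which the deduplication key includes the table, so an equal primary key in two tables counts as two distinct rows. This is an assumption about one way to express that rule, not the code in this PR; `TableId`, `RowPk`, and `dedup_rows` are hypothetical names.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the real table id and row primary key types.
type TableId = u32;
type RowPk = u64;

/// Drops repeated (table, row) pairs while keeping first-seen order.
/// The same RowPk under a different TableId is kept as a separate row.
fn dedup_rows(rows: Vec<(TableId, RowPk)>) -> Vec<(TableId, RowPk)> {
    let mut seen: HashSet<(TableId, RowPk)> = HashSet::new();
    let deduped: Vec<(TableId, RowPk)> = rows
        .into_iter()
        // `insert` returns false when the key was already present.
        .filter(|key| seen.insert(*key))
        .collect();
    deduped
}
```

For example, `dedup_rows(vec![(1, 10), (2, 10), (1, 10)])` keeps `(1, 10)` and `(2, 10)` and drops only the repeated `(1, 10)`.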

Comment on lines 48 to 55
let mut queries = Vec::new();
for sql in subscription.query_strings {
let qset = compile_read_only_query(&self.relational_db, tx, &auth, &sql)?;
queries.extend(qset);
}

let n = queries.len();
let queries = ExecutionSet::from_iter(queries);
Collaborator

This could probably just be all one iter chain now.
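A rough sketch of what a single iterator chain could look like, assuming a fallible per-string compile step that may yield several queries. `compile` is a hypothetical stand-in for `compile_read_only_query`, and `Query`, `CompileError`, and `compile_all` are made up for the example.

```rust
#[derive(Debug, PartialEq)]
struct Query(String);

#[derive(Debug, PartialEq)]
struct CompileError(String);

/// Hypothetical compile step: one SQL string may yield several queries.
fn compile(sql: &str) -> Result<Vec<Query>, CompileError> {
    if sql.trim().is_empty() {
        return Err(CompileError("empty query".to_string()));
    }
    Ok(sql
        .split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| Query(s.to_string()))
        .collect())
}

/// Compiles every string and flattens the results, stopping at the first error.
fn compile_all(query_strings: &[&str]) -> Result<Vec<Query>, CompileError> {
    // Collecting into Result short-circuits on the first Err.
    let compiled: Vec<Vec<Query>> = query_strings
        .iter()
        .map(|sql| compile(sql))
        .collect::<Result<_, _>>()?;
    Ok(compiled.into_iter().flatten().collect())
}
```

The intermediate `Vec<Vec<Query>>` keeps the error handling simple at the cost of one extra allocation; the original loop with `extend` avoids that, so which form is preferable is a judgment call.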

@coolreader18
Collaborator

coolreader18 commented Feb 27, 2024

@joshua-spacetime it'd probably be best to merge this first now that #888 has been reverted

@coolreader18
Collaborator

I could clean this up if that'd be easiest?

@joshua-spacetime
Collaborator Author

@coolreader18 that would be great! You should be able to simplify this greatly now that we don't have to do any deduplication at all.

@coolreader18 coolreader18 force-pushed the joshua/perf/832/subscription/row-dedup branch 3 times, most recently from e5863b1 to d18afce Compare February 28, 2024 18:44
Collaborator Author

@joshua-spacetime joshua-spacetime left a comment

@coolreader18 LGTM!

There are two tests that are failing

  1. test_subscribe_dedup
  2. test_subscribe_private

For (1), just remove it since we're no longer deduping. In (2), we run two queries that overlap:

SELECT * FROM inventory
SELECT * FROM inventory WHERE inventory_id = 1

Just remove one of those queries from the test. Feel free to merge after that.

joshua-spacetime and others added 2 commits February 28, 2024 13:27
Closes #832.

The database already operates under set semantics,
so unless multiple queries return rows from the same table,
deduplication of the result set is not necessary.
@coolreader18 coolreader18 force-pushed the joshua/perf/832/subscription/row-dedup branch from d18afce to 45d9ba4 Compare February 28, 2024 19:30
@coolreader18 coolreader18 reopened this Feb 28, 2024
@coolreader18 coolreader18 added this pull request to the merge queue Feb 28, 2024
Merged via the queue into master with commit b2b8993 Feb 28, 2024
@joshua-spacetime joshua-spacetime deleted the joshua/perf/832/subscription/row-dedup branch February 28, 2024 20:58

Labels: release-any (To be landed in any release window)

Linked issue: Remove unnecessary row deduplication in subscription evaluation

6 participants