Skip to content

Conversation

@Shubham8287
Copy link
Contributor

Description of Changes

  • Removed mutex Locking on wrapper state object (Inner) and added more granual locks on indivisual states.
  • Introduced of RWlock on CommittedState.
  • Moved implemented methods from Inner to MutTxId struct.
  • Made bootstrap only dependent on CommittedState.

API and ABI breaking changes

If this is an API or ABI breaking change, please apply the
corresponding GitHub label.

Expected complexity level and risk

How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.

This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.

If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.

@joshua-spacetime
Copy link
Collaborator

Are any updates to the threading model needed here?

Copy link
Contributor

@gefjon gefjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave up commenting on the unwraps, but I'd like you to search through this change for every unwrap call, and:

  • If you believe it never panics based only on local reasoning, add a comment describing why.
  • If the caller must uphold some invariant to avoid a panic, add them as a doc comment to the function, and add a comment to the unwrap describing why those invariants are sufficient to avoid a panic.

Ok(())
}
//TODO(shubham): Need to confirm, if indexes exist during bootstrap to be used here.
// This iter has only been implemented to use during bootstrap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me why this is only used during bootstrap; could you add a comment describing who calls this, and what non-bootstrap paths do instead?

row.encode(&mut bytes);
let data_key = DataKey::from_data(&bytes);
let row_id = RowId(data_key);
let tx_state = self.tx_state_lock.as_mut().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, comment on enclosing function re: panic safety, or local justification why this will not panic.

table
} else {
let Some(committed_table) = self.committed_state.tables.get(&table_id) else {
let Some(committed_table) = self.committed_state_write_lock.as_ref().unwrap().tables.get(&table_id) else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panic safety.

Comment on lines 1821 to 1822
drop(sequence_state);
drop(commit_state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't these get dropped automatically? Why do we need to explicitly call drop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are looking to return datastore, which requires to drop all the borrowers first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect that to be the case - Rust is usually able to infer when borrows can/should/must end and insert drops automatically as appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right but unexpectly for some reason, Compiler is not able to infer sequence_state (type MutexGaurd) last usage and expecting it to drop only when scope ends.
Though, I am able to get rid of explicit drop by not binding sequence_state, but looks something interesting happening here.

@Shubham8287
Copy link
Contributor Author

Replied to few, I feel I need to add few more comments in code, will do that. Thanks to give a first look @gefjon.

@Shubham8287 Shubham8287 force-pushed the shub/rw-lock branch 2 times, most recently from 84cbe1a to 0cfb632 Compare November 29, 2023 12:17
Comment on lines 1730 to 1740
/// Struct contains various database states, each protected by
/// their own lock. To avoid deadlocks, it is crucial to acquire these locks
/// in a consistent order throughout the application.
///
/// Lock Acquisition Order:
/// 1. `memory`
/// 2. `committed_state`
/// 3. `tx_state`
/// 4. `sequence_state`
///
/// All locking mechanisms are encapsulated within the struct through local methods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this!

@Shubham8287
Copy link
Contributor Author

Shubham8287 commented Nov 29, 2023

I gave up commenting on the unwraps, but I'd like you to search through this change for every unwrap call, and:

  • If you believe it never panics based only on local reasoning, add a comment describing why.
  • If the caller must uphold some invariant to avoid a panic, add them as a doc comment to the function, and add a comment to the unwrap describing why those invariants are sufficient to avoid a panic.

@gefjon unwrap() has been used only for tx_state in MutTxId, and tx_state must not be None during whole transaction journey. So, It will never going to panic.

I feel it will be redundant to add comment for it on every method but also didn't find any good way to state this in comments, though I made an attempt here https://github.com/clockworklabs/SpacetimeDB/pull/599/files#diff-92219f25d92114287cc7ccb95ba7bae63f6172afd393de14df7a44968b7fa952R125 on MuTxId, Let me know if you have better Idea :)

@Shubham8287 Shubham8287 requested a review from gefjon November 29, 2023 18:41
@Shubham8287
Copy link
Contributor Author

Are any updates to the threading model needed here?

@joshua-spacetime, this PR doesn't make any funcational changes, so No. May be my next part of "read write lock" would require them, Can you point me to code to understand bit about threading model.

Copy link
Contributor

@gefjon gefjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It generally looks great. More or less exactly what I was looking for. I have a few comments inline. The most important one is that I think the Option can be removed around TxState now that it's no longer stored in the Inner struct.

Please see this PR where I pulled it out. I'm 90% sure that this should work, but the 10% is because there's one place in iter_by_col_range that actually cared whether it was None or not. Could you confirm that my PR makes sense or that it doesn't?

///
/// The initialization of this struct is sensitive because improper
/// handling can lead to deadlocks. Therefore, it is strongly recommended to use
///`Locking::begin_mut_tx()` for instantiation to ensure safe acquisition of locks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///`Locking::begin_mut_tx()` for instantiation to ensure safe acquisition of locks.
/// `Locking::begin_mut_tx()` for instantiation to ensure safe acquisition of locks.

/// The initialization of this struct is sensitive because improper
/// handling can lead to deadlocks. Therefore, it is strongly recommended to use
///`Locking::begin_mut_tx()` for instantiation to ensure safe acquisition of locks.
/// `tx_state_lock` should remain `Some` throughout the transaction's lifecycle, until `commit()` or `rollback()` is called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thanks for adding this comment.


// TODO(shubham): Need to confirm, if indexes exist during bootstrap to be used here.
/// Iter for`CommittedState`, Only to be used during bootstrap.
/// For transcation, consider using MutTxId::Iters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// For transcation, consider using MutTxId::Iters.
/// For transaction, consider using MutTxId::Iters.

Ok(())
}

// TODO(shubham): Need to confirm, if indexes exist during bootstrap to be used here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this comment means. I'm assuming that this will do the equality check by scanning rather than using an index, but we only use this during bootstrapping, so the question is is it worth it to add indexes here?

Copy link
Contributor Author

@Shubham8287 Shubham8287 Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My doubt is, If its possible to use Indexes during bootstrap we should do that, If we don't generate indexes by this time during bootstrap, we are fine with scanning. I just need to verify it, so added a TODO.

I have another thought back of mind that we may be using this iter for read only transactions as well, then we have to add the indexes but that depends on the approach, little early to think about that here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to read this code again after #596 lands which refactors bootstrapping and hopefully makes it clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is now more clear after my PR, bootstrapping is a list of steps:

pub(crate) fn system_tables() -> [TableSchema; 6] {
    [
        st_table_schema(),
        st_columns_schema(),
        st_indexes_schema(),
        st_constraints_schema(),
        st_module_schema(),
        // Is important this is always last, so the starting sequence for each
        // system table is correct.
        st_sequences_schema(),
    ]
}

... where each part of the system is progressively added to the DB environment. It means is partially "broken/unfinished" until the last step is executed.

Ideally, It should be PartialDB -> Db but this will duplicate some code.

About indexes, they are incomplete, so is bad idea to rely on them. They need a "dehydrate" step:

        // Create the system tables and insert information about themselves into
        datastore.bootstrap_system_tables()?;
        // The database tables are now initialized with the correct data.
        // Now we have to build our in-memory structures.
        datastore.build_sequence_state()?;
        datastore.build_indexes()?;

(Now I think this should be moved inside datastore.bootstrap_system_tables)

pub struct MutTxId {
lock: ArcMutexGuard<RawMutex, Inner>,
committed_state_write_lock: SharedWriteGuard<CommittedState>,
tx_state_lock: SharedMutexGuard<Option<TxState>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer needs to be Option now that it lives inside the MutTxId itself. It's not possible for there to be a MutTxId unless there's a transaction in process.

@Shubham8287
Copy link
Contributor Author

Shubham8287 commented Nov 30, 2023

I think the Option can be removed around TxState now that it's no longer stored in the Inner struct.

I had the same thoughts, which lead me to start discussion about dropping transaction object after commit or rollback to be sure about its lifecycle.

Please see this PR where I pulled it out. I'm 90% sure that this should work, but the 10% is because there's one place in iter_by_col_range that actually cared whether it was None or not. Could you confirm that my PR makes sense or that it doesn't?

None handling is only for bootstrapping, this branch implements CommittedState:schema_for_tablei to handle that, hence it should be fine.

let's move this conversaiton to your PR :)

Shubham8287 and others added 3 commits November 30, 2023 12:14
///
/// The initialization of this struct is sensitive because improper
/// handling can lead to deadlocks. Therefore, it is strongly recommended to use
/// `Locking::begin_mut_tx()` for instantiation to ensure safe acquisition of locks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this for the moment.

In the long run, presumably we have finer grained locks so as to not take them all, every time. I think it's worthwhile to write a proposal about what operations will operate on which bits of state, and what strategies we will use to ensure we don't take locks in the wrong order as things get more complicated, and what strategies we will use to detect when we mess it up.

I'm particularly interested in the controldb which is one plausible way to have "surprising" reentrancy.

Ok(())
}

// TODO(shubham): Need to confirm, if indexes exist during bootstrap to be used here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to read this code again after #596 lands which refactors bootstrapping and hopefully makes it clearer.

}

fn create_sequence(&mut self, seq: SequenceDef) -> super::Result<SequenceId> {
fn create_sequence(&mut self, seq: SequenceDef, database_address: Address) -> super::Result<SequenceId> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For next time: I think we could have factored this PR in two. Some of these changes are very mechanical, and IMO could have landed separately first. I'm always going to ask for more and smaller PRs.

@Shubham8287
Copy link
Contributor Author

Blocked on https://github.com/clockworklabs/SpacetimeDB/pull/596/files to get merge, to avoid conflicts.

Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now LGTM.

Copy link
Contributor

@mamcx mamcx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants