Skip to content

Conversation

@mamcx
Copy link
Contributor

@mamcx mamcx commented Aug 25, 2023

Description of Changes

Note: This is a multi-step set of PRs to bring constraints + multi-column indexes

Step:

IMPORTANT:

The use of multi-column indexes changes what is stored in the index according to how many fields are defined:

  • 1 Field: Single AlgebraicValue
  • 2 or more: AlgebraicValue::Product

If the API is breaking, please state below what will break

  • This is a breaking change to the module ABI
  • This is a breaking change to the module API
  • This is a breaking change to the ClientAPI
  • This is a breaking change to the SDK API
  • This is a breaking change to the database system tables

@mamcx mamcx requested a review from kulakowski August 25, 2023 21:07
@mamcx mamcx changed the title Mamcx/multi column index Adding multi-column indexes to the DB Aug 25, 2023
@mamcx mamcx self-assigned this Aug 25, 2023
@mamcx mamcx marked this pull request as ready for review August 25, 2023 21:09
@mamcx mamcx force-pushed the mamcx/multi-column-index branch from 9ba0b95 to 9282989 Compare August 30, 2023 14:09
Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally seems quite straightforward. Please see the comments.

}
}

pub(crate) fn get_fields(&self, row: &ProductValue) -> Result<AlgebraicValue, DBError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doing a project of a ProductValue right? Shouldn't it also be returning a product value? Why is it returning an AlgebraicValue?

pub(crate) cols: NonEmpty<u32>,
pub(crate) name: String,
pub(crate) is_unique: bool,
idx: BTreeSet<IndexKey>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the IndexKey have a ProductValue now rather than an AlgebraicValue given that it is indexing over multiple (but at least one) columns?

Copy link
Contributor Author

@mamcx mamcx Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the major reason for this is because "single columns" are the most common case for a index key, and "multiple columns" is relatively rare.

So, when storing the key, most of the time will be like:

CREATE INDEX a ON t1 (column_i32); -- Most common case index `BuiltIn`

CREATE INDEX b ON t1 (column_i32, column_i32); -- Sometimes we have more complex keys

Will be on memory:

[AlgebraicValue::I32(0)] = Row(ProductValue(...)) 
[AlgebraicValue::Product(AlgebraicValue::I32(0), AlgebraicValue::I32(1))] = Row(ProductValue(...)) 

The comment in product_value.rs explain the logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this handle the difference between a field with a product value or several fields as a product value? Otherwise we could do something like enum OneOrMore<T> { One(T), More(Vec<T>) } for a tighter representation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you go into how that is achieved / where in the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub(crate) fn get_rows_that_violate_unique_constraint<'a>(
&'a self,
row: &'a ProductValue,
row: &'a AlgebraicValue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are rows represented as AlgebraicValues now? Can a row be a SumType or a Builtin type?

table.indexes.remove(&ColId(col));
table.schema.indexes.retain(|x| x.col_id != col);
table.indexes.remove(&col.clone().map(ColId));
table.schema.indexes.retain(|x| x.cols != col);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second part could be done by index_id instead of by cols, but I suppose this works as well.

for index in table.indexes.iter() {
let [index_col_id] = *index.col_ids else {
anyhow::bail!("multi-column indexes not yet supported")
//Ignore multi-column indexes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we ignoring multi-column indexes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs constraint to fully resolve this info, and that comes in the next PR.

@cloutiertyler
Copy link
Contributor

Looks like you should also give this PR a rebase

@mamcx mamcx force-pushed the mamcx/multi-column-index branch from e10d191 to 410a125 Compare September 11, 2023 20:55
@mamcx mamcx mentioned this pull request Sep 11, 2023
3 tasks
pub(crate) index_id: IndexId,
pub(crate) table_id: u32,
pub(crate) col_id: u32,
pub(crate) cols: NonEmpty<u32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub(crate) cols: NonEmpty<u32>,
pub(crate) cols: NonEmpty<ColId>,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this suggestions are already applied in the next PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOH! xD


impl BTreeIndex {
pub(crate) fn new(index_id: IndexId, table_id: u32, col_id: u32, name: String, is_unique: bool) -> Self {
pub(crate) fn new(index_id: IndexId, table_id: u32, cols: NonEmpty<u32>, name: String, is_unique: bool) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub(crate) fn new(index_id: IndexId, table_id: u32, cols: NonEmpty<u32>, name: String, is_unique: bool) -> Self {
pub(crate) fn new(index_id: IndexId, table_id: u32, cols: NonEmpty<ColId>, name: String, is_unique: bool) -> Self {

&'a self,
table_id: &TableId,
col_id: &ColId,
cols: NonEmpty<u32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cols: NonEmpty<u32>,
cols: NonEmpty<ColId>,

Comment on lines 1082 to 1101
let mut cols = vec![];
for index in table.indexes.values_mut() {
if index.index_id == *index_id {
cols.push(index.col_id);
cols.push(index.cols.clone());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler: table.indexes.values().filter(|i| i.index_id == index_id).map(|i| i.cols.clone()).collect();

Comment on lines +1312 to +1330
col_names: index
.cols
.iter()
.map(|&x| insert_table.schema.columns[x as usize].col_name.clone())
.collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's dedup the code here

Comment on lines +1322 to 1347
let value = row.project_not_empty(&index.cols)?;
return Err(IndexError::UniqueConstraintViolation {
constraint_name: index.name.clone(),
table_name: table.schema.table_name.clone(),
col_name: table.schema.columns[index.col_id as usize].col_name.clone(),
value: value.clone(),
col_names: index
.cols
.iter()
.map(|&x| insert_table.schema.columns[x as usize].col_name.clone())
.collect(),
value,
}
.into());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

pub(crate) cols: NonEmpty<u32>,
pub(crate) name: String,
pub(crate) is_unique: bool,
idx: BTreeSet<IndexKey>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this handle the difference between a field with a product value or several fields as a product value? Otherwise we could do something like enum OneOrMore<T> { One(T), More(Vec<T>) } for a tighter representation

@mamcx mamcx force-pushed the mamcx/multi-column-index branch from 410a125 to 5865a29 Compare September 12, 2023 16:18
@mamcx mamcx force-pushed the mamcx/multi-column-index branch 2 times, most recently from ee4ca12 to 8be67fd Compare September 14, 2023 18:49
@mamcx mamcx force-pushed the mamcx/multi-column-index branch from 8be67fd to c2e7a71 Compare September 14, 2023 18:51
Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants