feat: Add Catalog API #54

Merged 15 commits on Sep 21, 2023
1 change: 1 addition & 0 deletions crates/iceberg/Cargo.toml
@@ -29,6 +29,7 @@ keywords = ["iceberg"]
[dependencies]
anyhow = "1.0.72"
apache-avro = "0.15"
async-trait = "0.1"
bimap = "0.6"
bitvec = "1.0.1"
chrono = "0.4"
149 changes: 149 additions & 0 deletions crates/iceberg/src/catalog.rs
@@ -0,0 +1,149 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Catalog API for Apache Iceberg

use crate::spec::{PartitionSpec, Schema, SortOrder};
use crate::table::Table;
use crate::Result;
use async_trait::async_trait;
use std::collections::HashMap;

/// The catalog API for Iceberg Rust.
#[async_trait]
pub trait Catalog {
    /// List namespaces from the catalog.
    async fn list_namespaces(&self, parent: Option<&NamespaceIdent>)
        -> Result<Vec<NamespaceIdent>>;

    /// Create a new namespace inside the catalog.
    async fn create_namespace(
        &self,
        namespace: &NamespaceIdent,
        properties: HashMap<String, String>,
    ) -> Result<Namespace>;

    /// Get a namespace's information from the catalog.
    async fn get_namespace(&self, namespace: &NamespaceIdent) -> Result<Namespace>;

    /// Update a namespace inside the catalog.
    ///
    /// # Behavior
    ///
    /// The properties must be the full set of properties for the namespace.
    async fn update_namespace(
        &self,
        namespace: &NamespaceIdent,
        properties: HashMap<String, String>,
    ) -> Result<()>;

    /// Drop a namespace from the catalog.
    async fn drop_namespace(&self, namespace: &NamespaceIdent) -> Result<()>;

    /// List tables from namespace.
    async fn list_tables(&self, namespace: &NamespaceIdent) -> Result<Vec<TableIdent>>;

    /// Create a new table inside the namespace.
    async fn create_table(
        &self,
        namespace: &NamespaceIdent,
        creation: TableCreation,
    ) -> Result<Table>;

    /// Load table from the catalog.
    async fn load_table(&self, table: &TableIdent) -> Result<Table>;

    /// Drop a table from the catalog.
    async fn drop_table(&self, table: &TableIdent) -> Result<()>;

    /// Check if a table exists in the catalog.
    async fn stat_table(&self, table: &TableIdent) -> Result<bool>;

    /// Rename a table in the catalog.
    async fn rename_table(&self, src: &TableIdent, dest: &TableIdent) -> Result<()>;

    /// Update a table in the catalog.
    async fn update_table(&self, table: &TableIdent, commit: TableCommit) -> Result<Table>;

    /// Update multiple tables in the catalog as an atomic operation.
    async fn update_tables(&self, tables: &[(TableIdent, TableCommit)]) -> Result<()>;
}
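
The PR ships the trait without an implementation. As a rough illustration of how the namespace methods might behave, here is a simplified in-memory sketch. It is not part of the PR: `MemoryCatalog` is a hypothetical name, the methods are synchronous rather than `async`, errors are plain `String`s instead of the crate's `Result`, and `NamespaceIdent` is redefined locally with a public field and extra derives so that the block stands alone.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Local stand-in for the PR's `NamespaceIdent` (public field and derives are assumptions).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NamespaceIdent(pub Vec<String>);

/// Hypothetical in-memory catalog that covers only the namespace operations.
#[derive(Default)]
pub struct MemoryCatalog {
    namespaces: Mutex<HashMap<NamespaceIdent, HashMap<String, String>>>,
}

impl MemoryCatalog {
    /// Create a namespace, failing if it already exists.
    pub fn create_namespace(
        &self,
        ident: &NamespaceIdent,
        properties: HashMap<String, String>,
    ) -> Result<(), String> {
        let mut guard = self.namespaces.lock().unwrap();
        if guard.contains_key(ident) {
            return Err(format!("namespace {:?} already exists", ident.0));
        }
        guard.insert(ident.clone(), properties);
        Ok(())
    }

    /// List the direct children of `parent`, or the top-level namespaces if `parent` is `None`.
    pub fn list_namespaces(&self, parent: Option<&NamespaceIdent>) -> Vec<NamespaceIdent> {
        let guard = self.namespaces.lock().unwrap();
        let prefix: &[String] = parent.map(|p| p.0.as_slice()).unwrap_or(&[]);
        let mut found: Vec<NamespaceIdent> = guard
            .keys()
            .filter(|ns| ns.0.len() == prefix.len() + 1 && ns.0.starts_with(prefix))
            .cloned()
            .collect();
        // Sort so the result does not depend on hash map iteration order.
        found.sort_by(|a, b| a.0.cmp(&b.0));
        found
    }

    /// Drop a namespace, failing if it does not exist.
    pub fn drop_namespace(&self, ident: &NamespaceIdent) -> Result<(), String> {
        let mut guard = self.namespaces.lock().unwrap();
        match guard.remove(ident) {
            Some(_) => Ok(()),
            None => Err(format!("namespace {:?} does not exist", ident.0)),
        }
    }
}
```

A real implementation would presumably wrap this logic behind `#[async_trait] impl Catalog for ...` and return the crate's error type; those details are left out here.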

/// NamespaceIdent represents the identifier of a namespace in the catalog.
pub struct NamespaceIdent(Vec<String>);

/// Namespace represents a namespace in the catalog.
pub struct Namespace {
    name: NamespaceIdent,
    properties: HashMap<String, String>,
}

/// TableIdent represents the identifier of a table in the catalog.
pub struct TableIdent {
    namespace: NamespaceIdent,
    name: String,
}

/// TableCreation represents the creation of a table in the catalog.
pub struct TableCreation {
Contributor

I'm not a rustee 🦀, so forgive me if this is a silly question. Why would you create a struct for this, rather than giving create_table one argument per field?

Member Author

It's more of a convention in Rust to pass values via structs.

First, structs in Rust are a zero-cost abstraction, so passing values via op.do(struct Abc {a, b}) costs the same as op.do(a, b). Unpacking a struct is also zero cost, so implementors can destructure the value when needed.

Considering all these reasons, I prefer to pass arguments in a struct to make it more readable and maintainable. This implementation will align with our design ideas:

  • Clean
  • Easy to understand (both for using and implementing)
  • Optimized for Rust developers
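
A minimal sketch of the point above, assuming a hypothetical `CreateArgs` struct and two illustrative functions (none of these names are in the PR). Both forms move the same two `String`s into the callee, and the struct form is unpacked with a destructuring `let`. The sketch only shows that the two call styles are equivalent in behavior; it does not measure generated code.

```rust
/// Hypothetical argument struct, loosely modelled on `TableCreation`.
#[derive(Debug, Clone, PartialEq)]
pub struct CreateArgs {
    pub name: String,
    pub location: String,
}

/// Positional form: one parameter per value.
pub fn describe_positional(name: String, location: String) -> String {
    format!("{name} @ {location}")
}

/// Struct form: the implementor destructures the struct when it needs the fields.
pub fn describe_struct(args: CreateArgs) -> String {
    let CreateArgs { name, location } = args;
    format!("{name} @ {location}")
}
```

At the call site the struct form names every value (`CreateArgs { name: ..., location: ... }`), which is where the readability argument comes from: two `String` arguments cannot be swapped by accident.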

Member Author

By the way, adding a new argument to a trait function is a breaking change. However, adding a new field to a struct can be compatible if it is given a default value.
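
To illustrate the compatibility point, here is a sketch that assumes a hypothetical `CreateOptions` struct deriving `Default` (the name and fields are illustrative, not from the PR). A call site that uses struct update syntax keeps compiling when a field is added later. A call site that lists every field exhaustively would not, which is one reason the comment says "can be compatible" rather than "is compatible".

```rust
use std::collections::HashMap;

/// Hypothetical options struct; `Default` supplies values for fields the caller omits.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CreateOptions {
    pub name: String,
    pub location: String,
    /// Imagine this field was added in a later release.
    pub properties: HashMap<String, String>,
}

/// A call site written before `properties` existed. It still compiles,
/// because `..Default::default()` fills in every field it does not mention.
pub fn older_call_site() -> CreateOptions {
    CreateOptions {
        name: "orders".to_string(),
        location: "s3://bucket/orders".to_string(),
        ..Default::default()
    }
}
```

Marking the struct `#[non_exhaustive]`, or keeping its fields private behind a constructor or builder, are common ways to force downstream callers into a pattern that tolerates new fields.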

Contributor

I also agree that we should pass a struct as the method argument rather than several separate fields, to avoid breaking changes when we need to add more arguments.

Contributor

But wouldn't it be better to provide a builder for the arguments?

Member Author @Xuanwo, Sep 20, 2023

But wouldn't it be better to provide a builder for the arguments?

Yep, I plan to leave builders for follow-up PRs. I believe they do not conflict with this PR.
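
For context, the builder discussed here could look roughly like the sketch below. It is purely illustrative and not the follow-up PR's design: `TableCreationBuilder` and its methods are hypothetical, `TableCreation` is reduced to three public fields so the block is self-contained (the real struct also carries a schema, partition spec and sort order), and errors are plain `String`s.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the PR's `TableCreation` (field subset is an assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct TableCreation {
    pub name: String,
    pub location: String,
    pub properties: HashMap<String, String>,
}

/// Hypothetical builder: required fields start as `None`, optional ones start empty.
#[derive(Debug, Default)]
pub struct TableCreationBuilder {
    name: Option<String>,
    location: Option<String>,
    properties: HashMap<String, String>,
}

impl TableCreationBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Set the required table name.
    pub fn name(mut self, name: impl Into<String>) -> Self {
        self.name = Some(name.into());
        self
    }

    /// Set the required table location.
    pub fn location(mut self, location: impl Into<String>) -> Self {
        self.location = Some(location.into());
        self
    }

    /// Add one optional property.
    pub fn property(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.properties.insert(key.into(), value.into());
        self
    }

    /// Validate that the required fields were set and produce the argument struct.
    pub fn build(self) -> Result<TableCreation, String> {
        Ok(TableCreation {
            name: self.name.ok_or("name is required")?,
            location: self.location.ok_or("location is required")?,
            properties: self.properties,
        })
    }
}
```

A builder like this sits on top of the argument struct, so it can be added later without changing the `create_table` signature.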

    name: String,
    location: String,
    schema: Schema,
    partition_spec: Option<PartitionSpec>,
    sort_order: SortOrder,
    properties: HashMap<String, String>,
}

/// TableCommit represents the commit of a table in the catalog.
pub struct TableCommit {
    ident: TableIdent,
    requirements: Vec<TableRequirement>,
    updates: Vec<TableUpdate>,
}

/// TableRequirement represents a requirement for a table in the catalog.
pub enum TableRequirement {
    /// The table must not already exist; used for create transactions
    NotExist,
    /// The table UUID must match the requirement.
    UuidMatch(String),
    /// The table branch or tag identified by the requirement's `reference` must
    /// reference the requirement's `snapshot-id`.
    RefSnapshotIdMatch {
        /// The reference of the table to assert.
        reference: String,
        /// The snapshot id of the table to assert.
        /// If the id is `None`, the ref must not already exist.
        snapshot_id: Option<i64>,
    },
    /// The table's last assigned column id must match the requirement.
    LastAssignedFieldIdMatch(i64),
    /// The table's current schema id must match the requirement.
    CurrentSchemaIdMatch(i64),
    /// The table's last assigned partition id must match the
    /// requirement.
    LastAssignedPartitionIdMatch(i64),
    /// The table's default spec id must match the requirement.
    DefaultSpecIdMatch(i64),
    /// The table's default sort order id must match the requirement.
    DefaultSortOrderIdMatch(i64),
}

/// TableUpdate represents an update to a table in the catalog.
///
/// TODO: we should fill with UpgradeFormatVersionUpdate, AddSchemaUpdate and so on.
pub enum TableUpdate {}
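
The doc comments on `TableRequirement` describe assertions that a catalog would evaluate before applying a commit. The sketch below shows one way such a check might be written. It is an assumption-laden illustration, not code from the PR: only a subset of the variants is reproduced, `TableState` and `check` are hypothetical names, and errors are plain `String`s.

```rust
use std::collections::HashMap;

/// Subset of the PR's `TableRequirement` variants, redefined so the block stands alone.
pub enum TableRequirement {
    NotExist,
    UuidMatch(String),
    RefSnapshotIdMatch {
        reference: String,
        snapshot_id: Option<i64>,
    },
    CurrentSchemaIdMatch(i64),
}

/// Hypothetical view of the table state that the requirements are checked against.
pub struct TableState {
    pub uuid: String,
    pub current_schema_id: i64,
    /// Branch or tag name mapped to the snapshot id it points at.
    pub refs: HashMap<String, i64>,
}

/// Check one requirement; `table` is `None` when the table does not exist yet.
pub fn check(req: &TableRequirement, table: Option<&TableState>) -> Result<(), String> {
    match (req, table) {
        (TableRequirement::NotExist, None) => Ok(()),
        (TableRequirement::NotExist, Some(_)) => Err("table already exists".to_string()),
        // Every other requirement needs an existing table to compare against.
        (_, None) => Err("table does not exist".to_string()),
        (TableRequirement::UuidMatch(expected), Some(t)) => {
            if &t.uuid == expected {
                Ok(())
            } else {
                Err(format!("uuid mismatch: expected {}, found {}", expected, t.uuid))
            }
        }
        (TableRequirement::RefSnapshotIdMatch { reference, snapshot_id }, Some(t)) => {
            // A `None` snapshot id asserts that the ref must not already exist.
            if t.refs.get(reference).copied() == *snapshot_id {
                Ok(())
            } else {
                Err(format!("ref {} does not match the required snapshot", reference))
            }
        }
        (TableRequirement::CurrentSchemaIdMatch(expected), Some(t)) => {
            if t.current_schema_id == *expected {
                Ok(())
            } else {
                Err(format!(
                    "schema id mismatch: expected {}, found {}",
                    expected, t.current_schema_id
                ))
            }
        }
    }
}
```

A commit would presumably be rejected as soon as any requirement in `TableCommit::requirements` fails; how the real catalog reports that failure is not specified in this PR.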
7 changes: 7 additions & 0 deletions crates/iceberg/src/lib.rs
@@ -27,6 +27,13 @@ pub use error::Error;
pub use error::ErrorKind;
pub use error::Result;

/// There is no implementation for this trait yet, so dead code is allowed for now.
/// This should be removed once we have one.
#[allow(dead_code)]
pub mod catalog;
#[allow(dead_code)]
pub mod table;

mod avro;
pub mod io;
pub mod spec;
2 changes: 1 addition & 1 deletion crates/iceberg/src/spec/schema.rs
@@ -60,7 +60,7 @@ pub struct SchemaBuilder {
}

impl SchemaBuilder {
-    /// Add fields to schem builder.
+    /// Add fields to schema builder.
    pub fn with_fields(mut self, fields: impl IntoIterator<Item = NestedFieldRef>) -> Self {
        self.fields.extend(fields);
        self
26 changes: 26 additions & 0 deletions crates/iceberg/src/table.rs
@@ -0,0 +1,26 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Table API for Apache Iceberg

use crate::spec::TableMetadata;

/// Table represents a table in the catalog.
pub struct Table {
    metadata_location: String,
    metadata: TableMetadata,
}