Update docs and JSON schema with "query-only" encryption changes #18
Merged
6 commits
3a47194 Update query docs to use the new "for query" field (CDThomas)
8c00927 Add variants for encrypted queries to JSON schema (CDThomas)
07e0e92 Update match filter in select diagram (CDThomas)
eef888e Remove STE term and websearch_to_match variants in JSON schema (CDThomas)
18825fe Fix up "k" (kind) property in match query example (CDThomas)
6122f41 Revert to previous example in "Reading data" section (CDThomas)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -40,14 +40,14 @@ Once the custom types and functions are installed, you can start using EQL in yo | |
|
|
||
| 1. Create a table with a column of type `cs_encrypted_v1` which will store your encrypted data. | ||
| 1. Use EQL functions to add indexes for the columns you want to encrypt. | ||
| - Indexes are used by Cipherstash Proxy to understand what cryptography schemes are required for your use case. | ||
| - Indexes are used by Cipherstash Proxy to understand what cryptography schemes are required for your use case. | ||
| 1. Initialize Cipherstash Proxy for cryptographic operations. | ||
| - The Proxy will dynamically encrypt data on the way in and decrypt data on the way out based on the indexes you have defined. | ||
| - The Proxy will dynamically encrypt data on the way in and decrypt data on the way out based on the indexes you have defined. | ||
| 1. Insert data into the defined columns using a specific payload format. | ||
| - The payload format is defined in the [data format](#data-format) section. | ||
| - The payload format is defined in the [data format](#data-format) section. | ||
| 1. Query the data using the EQL functions defined in the [querying data with EQL](#querying-data-with-eql) section. | ||
| - No modifications are required to simply `SELECT` data from your encrypted columns. | ||
| - In order to perform `WHERE` and `ORDER BY` queries, you must wrap the queries in the EQL functions defined in the [querying data with EQL](#querying-data-with-eql) section. | ||
| - No modifications are required to simply `SELECT` data from your encrypted columns. | ||
| - In order to perform `WHERE` and `ORDER BY` queries, you must wrap the queries in the EQL functions defined in the [querying data with EQL](#querying-data-with-eql) section. | ||
| 1. Integrate with your application via the [helper packages](#helper-packages) to interact with the encrypted data. | ||
|
|
||
| You can find a full getting started guide in the [GETTINGSTARTED.md](GETTINGSTARTED.md) file. | ||
|
|
@@ -150,13 +150,13 @@ Which will execute on the server as: | |
| SELECT encrypted_email FROM users; | ||
| ``` | ||
|
|
||
| And is the EQL equivalent of the following plaintext query. | ||
| And is the EQL equivalent of the following plaintext query: | ||
|
|
||
| ```sql | ||
| SELECT email FROM users; | ||
| ``` | ||
|
|
||
| All the data returned from the database is fully decrypted and an audit trail is generated. | ||
| All the data returned from the database is fully decrypted. | ||
|
|
||
| ## Querying data with EQL | ||
|
|
||
|
|
@@ -170,15 +170,15 @@ Enables basic full-text search. | |
|
|
||
| ```rb | ||
| # Create the EQL payload using helper functions | ||
| payload = eqlPayload("users", "encrpyted_field", "plaintext value") | ||
| payload = EQL.for_match("users", "encrypted_field", "plaintext value") | ||
|
|
||
| Users.where("cs_match_v1(field) @> cs_match_v1(?)", payload) | ||
| ``` | ||
|
|
||
| Which will execute on the server as: | ||
|
|
||
| ```sql | ||
| SELECT * FROM users WHERE cs_match_v1(field) @> cs_match_v1('{"v":1,"k":"pt","p":"plaintext value","i":{"t":"users","c":"encrpyted_field"}}'); | ||
| SELECT * FROM users WHERE cs_match_v1(field) @> cs_match_v1('{"v":1,"k":"pt","p":"plaintext value","i":{"t":"users","c":"encrypted_field"},"q":"match"}'); | ||
| ``` | ||
|
|
||
| And is the EQL equivalent of the following plaintext query. | ||
|
|
@@ -195,15 +195,15 @@ Retrieves the unique index for enforcing uniqueness. | |
|
|
||
| ```rb | ||
| # Create the EQL payload using helper functions | ||
| payload = eqlPayload("users", "encrpyted_field", "plaintext value") | ||
| payload = EQL.for_unique("users", "encrypted_field", "plaintext value") | ||
|
|
||
| Users.where("cs_unique_v1(field) = cs_unique_v1(?)", payload) | ||
| ``` | ||
|
|
||
| Which will execute on the server as: | ||
|
|
||
| ```sql | ||
| SELECT * FROM users WHERE cs_unique_v1(field) = cs_unique_v1('{"v":1,"k":"pt","p":"plaintext value","i":{"t":"users","c":"encrpyted_field"}}'); | ||
| SELECT * FROM users WHERE cs_unique_v1(field) = cs_unique_v1('{"v":1,"k":"pt","p":"plaintext value","i":{"t":"users","c":"encrypted_field"},"q":"unique"}'); | ||
| ``` | ||
|
|
||
| And is the EQL equivalent of the following plaintext query. | ||
|
|
@@ -220,7 +220,7 @@ Retrieves the Order-Revealing Encryption index for range queries. | |
|
|
||
| ```rb | ||
| # Create the EQL payload using helper functions | ||
| eqlPayload("users", "encrypted_date", Time.now) | ||
| date = EQL.for_ore("users", "encrypted_date", Time.now) | ||
|
|
||
| User.where("cs_ore_64_8_v1(encrypted_date) < cs_ore_64_8_v1(?)", date) | ||
| ``` | ||
|
|
@@ -246,7 +246,7 @@ User.order("cs_ore_64_8_v1(encrypted_field)").all().map(&:id) | |
| Which will execute on the server as: | ||
|
|
||
| ```sql | ||
| SELECT id FROM examples ORDER BY cs_ore_64_8_v1(feild) DESC; | ||
| SELECT id FROM examples ORDER BY cs_ore_64_8_v1(encrypted_field) DESC; | ||
| ``` | ||
|
|
||
| And is the EQL equivalent of the following plaintext query. | ||
|
|
@@ -259,7 +259,7 @@ SELECT id FROM examples ORDER BY field DESC; | |
|
|
||
| ### `cs_ste_term_v1(val JSONB, epath TEXT)` | ||
|
|
||
| Retrieves the encrypted *term* associated with the encrypted JSON path, `epath`. | ||
| Retrieves the encrypted _term_ associated with the encrypted JSON path, `epath`. | ||
|
|
||
| ### `cs_ste_vec_v1(val JSONB)` | ||
|
|
||
|
|
@@ -269,7 +269,7 @@ Retrieves the Structured Encryption Vector for containment queries. | |
|
|
||
| ```rb | ||
| # Serialize a JSONB value bound to the users table column | ||
| term = User::ENCRYPTED_JSONB.serialize({field: "value"}) | ||
| term = EQL.for_ste_vec("users", "attrs", {field: "value"}) | ||
| User.where("cs_ste_vec_v1(attrs) @> cs_ste_vec_v1(?)", term) | ||
| ``` | ||
|
|
||
|
|
@@ -295,8 +295,8 @@ This is useful for sorting or filtering on integers in encrypted JSON objects. | |
|
|
||
| ```rb | ||
| # Serialize a JSONB value bound to the users table column | ||
| path = EJSON_PATH.serialize("$.login_count") | ||
| term = User::ENCRYPTED_INT.serialize(100) | ||
| path = EQL.for_ejson_path("users", "attrs", "$.login_count") | ||
| term = EQL.for_ore("users", "attrs", 100) | ||
| User.where("cs_ste_term_v1(attrs, ?) > cs_ore_64_8_v1(?)", path, term) | ||
| ``` | ||
|
|
||
|
|
@@ -309,18 +309,18 @@ SELECT * FROM users WHERE cs_ste_term_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') > 'QAJ | |
| And is the EQL equivalent of the following plaintext query. | ||
|
|
||
| ```sql | ||
| SELECT * FROM users WHERE attrs->'login_count' > 10; | ||
| SELECT * FROM users WHERE attrs->'login_count' > 10; | ||
| ``` | ||
|
|
||
| ### `cs_ste_value_v1(val JSONB, epath TEXT)` | ||
|
|
||
| Retrieves the encrypted *value* associated with the encrypted JSON path, `epath`. | ||
| Retrieves the encrypted _value_ associated with the encrypted JSON path, `epath`. | ||
|
|
||
| **Example:** | ||
|
|
||
| ```rb | ||
| # Serialize a JSONB value bound to the users table column | ||
| path = EJSON_PATH.serialize("$.login_count") | ||
| path = EQL.for_ejson_path("users", "attrs", "$.login_count") | ||
| User.find_by_sql(["SELECT cs_ste_value_v1(attrs, ?) FROM users", path]) | ||
| ``` | ||
|
|
||
|
|
@@ -333,7 +333,7 @@ SELECT cs_ste_value_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') FROM users; | |
| And is the EQL equivalent of the following plaintext query. | ||
|
|
||
| ```sql | ||
| SELECT attrs->'login_count' FROM users; | ||
| SELECT attrs->'login_count' FROM users; | ||
| ``` | ||
|
|
||
| ## Managing indexes with EQL | ||
|
|
@@ -346,24 +346,25 @@ These functions expect a `jsonb` value that conforms to the storage schema. | |
| cs_add_index(table_name text, column_name text, index_name text, cast_as text, opts jsonb) | ||
| ``` | ||
|
|
||
| | Parameter | Description | Notes | ||
| | ------------- | -------------------------------------------------- | ------------------------------------ | ||
| | `table_name` | Name of target table | Required | ||
| | `column_name` | Name of target column | Required | ||
| | `index_name` | The index kind | Required. | ||
| | `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text` | ||
| | `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) | ||
| | Parameter | Description | Notes | | ||
| | ------------- | -------------------------------------------------- | ------------------------------------------------------------------------ | | ||
| | `table_name` | Name of target table | Required | | ||
| | `column_name` | Name of target column | Required | | ||
| | `index_name` | The index kind | Required. | | ||
| | `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text` | | ||
| | `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) | | ||
|
|
||
| #### cast_as | ||
|
|
||
| Supported types: | ||
| - `text` | ||
| - `int` | ||
| - `small_int` | ||
| - `big_int` | ||
| - `boolean` | ||
| - `date` | ||
| - `jsonb` | ||
|
|
||
| - `text` | ||
| - `int` | ||
| - `small_int` | ||
| - `big_int` | ||
| - `boolean` | ||
| - `date` | ||
| - `jsonb` | ||
|
|
||
| #### match opts | ||
|
|
||
|
|
@@ -428,13 +429,13 @@ An ste_vec index requires one piece of configuration: the `context` (a string) w | |
| This ensures that all of the encrypted values are unique to that context. | ||
| It is generally recommended to use the table and column name as the context (e.g. `users/name`). | ||
|
|
||
| Within a dataset, encrypted columns indexed using an `ste_vec` that use different contexts cannot be compared. | ||
| Containment queries that manage to mix index terms from multiple columns will never return a positive result. | ||
| Within a dataset, encrypted columns indexed using an `ste_vec` that use different contexts cannot be compared. | ||
| Containment queries that manage to mix index terms from multiple columns will never return a positive result. | ||
| This is by design. | ||
|
|
||
| The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node. | ||
|
|
||
| The complete set of JSON types is supported by the indexer. | ||
| The complete set of JSON types is supported by the indexer. | ||
| Null values are ignored by the indexer. | ||
|
|
||
| - Object `{ ... }` | ||
|
|
@@ -451,12 +452,9 @@ For a document like this: | |
| "email": "[email protected]", | ||
| "name": { | ||
| "first_name": "Alice", | ||
| "last_name": "McCrypto", | ||
| "last_name": "McCrypto" | ||
| }, | ||
| "roles": [ | ||
| "admin", | ||
| "owner", | ||
| ] | ||
| "roles": ["admin", "owner"] | ||
| } | ||
| } | ||
| ``` | ||
|
|
@@ -466,17 +464,33 @@ Hashes would be produced from the following list of entries: | |
| ```js | ||
| [ | ||
| [Obj, Key("account"), Obj, Key("email"), String("[email protected]")], | ||
| [Obj, Key("account"), Obj, Key("name"), Obj, Key("first_name"), String("Alice")], | ||
| [Obj, Key("account"), Obj, Key("name"), Obj, Key("last_name"), String("McCrypto")], | ||
| [ | ||
| Obj, | ||
| Key("account"), | ||
| Obj, | ||
| Key("name"), | ||
| Obj, | ||
| Key("first_name"), | ||
| String("Alice"), | ||
| ], | ||
| [ | ||
| Obj, | ||
| Key("account"), | ||
| Obj, | ||
| Key("name"), | ||
| Obj, | ||
| Key("last_name"), | ||
| String("McCrypto"), | ||
| ], | ||
| [Obj, Key("account"), Obj, Key("roles"), Array, String("admin")], | ||
| [Obj, Key("account"), Obj, Key("roles"), Array, String("owner")], | ||
| ] | ||
| ]; | ||
| ``` | ||
|
|
||
| Using the first entry to illustrate how an entry is converted to hashes: | ||
|
|
||
| ```js | ||
| [Obj, Key("account"), Obj, Key("email"), String("[email protected]")] | ||
| [Obj, Key("account"), Obj, Key("email"), String("[email protected]")]; | ||
| ``` | ||
|
|
||
| The hashes would be generated for all prefixes of the full path to the leaf node. | ||
|
|
@@ -489,15 +503,15 @@ The hashes would be generated for all prefixes of the full path to the leaf node | |
| [Obj, Key("account"), Obj, Key("email")], | ||
| [Obj, Key("account"), Obj, Key("email"), String("[email protected]")], | ||
| // (remaining leaf nodes omitted) | ||
| ] | ||
| ]; | ||
| ``` | ||
|
|
||
| Query terms are processed in the same manner as the input document. | ||
|
|
||
| A query prior to encrypting & indexing looks like a structurally similar subset of the encrypted document, for example: | ||
|
|
||
| ```json | ||
| { "account": { "email": "[email protected]", "roles": "admin" }} | ||
| { "account": { "email": "[email protected]", "roles": "admin" } } | ||
| ``` | ||
|
|
||
| The expression `cs_ste_vec_v1(encrypted_account) @> cs_ste_vec_v1($query)` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "[email protected]". | ||
|
|
@@ -510,11 +524,12 @@ When reduced to a prefix list, it would look like this: | |
| [Obj, Key("account")], | ||
| [Obj, Key("account"), Obj], | ||
| [Obj, Key("account"), Obj, Key("email")], | ||
| [Obj, Key("account"), Obj, Key("email"), String("[email protected]")] | ||
| [Obj, Key("account"), Obj, Key("roles")], | ||
| [Obj, Key("account"), Obj, Key("email"), String("[email protected]")], | ||
| [Obj, Key("account"), Obj, Key("roles")], | ||
| [Obj, Key("account"), Obj, Key("roles"), Array], | ||
| [Obj, Key("account"), Obj, Key("roles"), Array, String("admin")] | ||
| ] | ||
| [Obj, Key("account"), Obj, Key("roles"), Array, String("admin")], | ||
| ]; | ||
| ``` | ||
|
|
||
| Which is then turned into an ste_vec of hashes which can be directly queried against the index. | ||
|
|
@@ -573,19 +588,20 @@ The format is defined as a [JSON Schema](src/cs_encrypted_v1.schema.json). | |
| It should never be necessary to directly interact with the stored `jsonb`. | ||
| Cipherstash proxy handles the encoding, and EQL provides the functions. | ||
|
|
||
| | Field | Name | Description | ||
| | -------- | ------------------ | ------------------------------------------------------------ | ||
| | s | Schema version | JSON Schema version of this json document. | ||
| | v | Version | The configuration version that generated this stored value. | ||
| | k | Kind | The kind of the data (plaintext/pt, ciphertext/ct, encrypting/et). | ||
| | i.t | Table identifier | Name of the table containing encrypted column. | ||
| | i.c | Column identifier | Name of the encrypted column. | ||
| | p | Plaintext | Plaintext value sent by database client. Required if kind is plaintext/pt or encrypting/et. | ||
| | c | Ciphertext | Ciphertext value. Encrypted by proxy. Required if kind is plaintext/pt or encrypting/et. | ||
| | m.1 | Match index | Ciphertext index value. Encrypted by proxy. | ||
| | o.1 | ORE index | Ciphertext index value. Encrypted by proxy. | ||
| | u.1 | Unique index | Ciphertext index value. Encrypted by proxy. | ||
| | sv.1 | STE vector index | Ciphertext index value. Encrypted by proxy. | ||
| | Field | Name | Description | | ||
| | ----- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | s | Schema version | JSON Schema version of this json document. | | ||
| | v | Version | The configuration version that generated this stored value. | | ||
| | k | Kind | The kind of the data (plaintext/pt, ciphertext/ct, encrypting/et). | | ||
| | i.t | Table identifier | Name of the table containing encrypted column. | | ||
| | i.c | Column identifier | Name of the encrypted column. | | ||
| | p | Plaintext | Plaintext value sent by database client. Required if kind is plaintext/pt or encrypting/et. | | ||
| | q | For query | Specifies that the plaintext should be encrypted for a specific query operation. If `null`, source encryption and encryption for all indexes will be performed. Valid values are `"match"`, `"ore"`, `"unique"`, `"ste_vec"`, `"ejson_path"`, and `"websearch_to_match"`. | | ||
| | c | Ciphertext | Ciphertext value. Encrypted by proxy. Required if kind is plaintext/pt or encrypting/et. | | ||
| | m | Match index | Ciphertext index value. Encrypted by proxy. | | ||
| | o | ORE index | Ciphertext index value. Encrypted by proxy. | | ||
| | u | Unique index | Ciphertext index value. Encrypted by proxy. | | ||
| | sv | STE vector index | Ciphertext index value. Encrypted by proxy. | | ||
|
|
||
| ## Helper packages | ||
|
|
||
|
|
||
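The `ste_vec` walkthrough in the diff above (flatten the document into per-leaf entries, expand each entry into all of its path prefixes, hash each prefix) can be sketched in Ruby as follows. This is a minimal illustration, not CipherStash's implementation: the function names, the symbol-based tokens, the sample email address, and the use of plain SHA-256 over the context string as a stand-in for the real encrypted hash are all assumptions.

```ruby
require "digest"
require "json"

# Tag a scalar leaf with its JSON type, mirroring String("...") in the entries above.
def leaf_token(value)
  case value
  when String then [:string, value]
  when Numeric then [:number, value]
  when true, false then [:bool, value]
  end
end

# One entry per leaf: the list of tokens from the document root down to that leaf.
def leaf_entries(node, path = [])
  case node
  when Hash
    node.flat_map { |key, child| leaf_entries(child, path + [:obj, [:key, key.to_s]]) }
  when Array
    node.flat_map { |child| leaf_entries(child, path + [:array]) }
  when nil
    [] # null values are ignored by the indexer
  else
    [path + [leaf_token(node)]]
  end
end

# Every prefix of an entry, shortest first.
def prefixes(entry)
  (1..entry.length).map { |n| entry.first(n) }
end

# Stand-in for the real hash (assumption): SHA-256 over the context and the prefix.
def term_hash(context, prefix)
  Digest::SHA256.hexdigest("#{context}:#{JSON.generate(prefix)}")
end

# Hash each unique path prefix of the document under the given context.
def ste_vec_sketch(context, document)
  unique_prefixes = leaf_entries(document).flat_map { |entry| prefixes(entry) }.uniq
  unique_prefixes.map { |prefix| term_hash(context, prefix) }
end

# Containment: every query term must also be a term of the document.
def contains?(context, document, query)
  (ste_vec_sketch(context, query) - ste_vec_sketch(context, document)).empty?
end

doc = {
  "account" => {
    "email" => "alice@example.com", # hypothetical address
    "name" => { "first_name" => "Alice", "last_name" => "McCrypto" },
    "roles" => ["admin", "owner"]
  }
}
query = { "account" => { "email" => "alice@example.com", "roles" => ["admin"] } }

puts contains?("users/attrs", doc, query)                                        # true
puts contains?("users/attrs", doc, { "account" => { "email" => "bob@example.com" } }) # false
```

Because every prefix is indexed, a containment check reduces to a subset test over hashes, and terms built under different contexts never overlap.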
There are probably more idiomatic ways to do this in Ruby but not something we need to worry about now. Just sharing thoughts while they're fresh.
You could do:
If you made `EQL.table("users")` a class method:
And used `method_missing` to handle the column name, you could do something like:
Just ideas for later.
Very fair. At the moment I'm mostly aiming for examples that get the point across and would be easy to port to other languages. Agreed that it'll make sense to implement a more idiomatic/ergonomic interface for Ruby at some point.
Worth just writing all examples in pseudocode?
Possibly! Or maybe we can go with one of the languages included in this repo (Go or JS).
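As a reference point for the discussion above, here is a minimal sketch of how the plain-function `EQL.for_*` helpers used in the diff could assemble the query payload with the new `q` ("for query") field. The module body, the `payload` method name, and the choice to omit `q` entirely when no query kind is given are assumptions for illustration, not the project's actual helper package.

```ruby
require "json"

# Hypothetical helper module; the real helper packages may differ.
module EQL
  # Query kinds that appear in the diff's examples.
  QUERY_KINDS = %w[match ore unique ste_vec ejson_path].freeze

  # Build a plaintext ("pt") payload for the given table and column.
  # With for_query nil the "q" key is left out (assumption), which corresponds
  # to encrypting for storage and all indexes rather than for a single query.
  def self.payload(table, column, plaintext, for_query: nil)
    unless for_query.nil? || QUERY_KINDS.include?(for_query)
      raise ArgumentError, "unknown query kind: #{for_query.inspect}"
    end

    {
      "v" => 1,
      "k" => "pt",
      "p" => plaintext,
      "i" => { "t" => table, "c" => column },
      "q" => for_query
    }.compact.to_json
  end

  # Define EQL.for_match, EQL.for_ore, EQL.for_unique, and so on.
  QUERY_KINDS.each do |kind|
    define_singleton_method("for_#{kind}") do |table, column, plaintext|
      payload(table, column, plaintext, for_query: kind)
    end
  end
end

puts EQL.for_match("users", "encrypted_field", "plaintext value")
```

Keeping the helpers as plain functions over a table name, column name, and value keeps the sketch easy to port to Go or JS, at the cost of the more ergonomic Ruby interface suggested earlier in the thread.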