diff --git a/JSON.md b/JSON.md
new file mode 100644
index 00000000..c9e22aff
--- /dev/null
+++ b/JSON.md
@@ -0,0 +1,132 @@
+# JSON Encrypted Indexing
+
+> [!NOTE]
+> This section is under construction.
+
+
+JSONB objects can be encrypted in EQL using a Structured Encryption Vector (`ste_vec`)
+or a Structured Encryption Map, `ste_map`.
+
+```json
+{
+  :
+}
+```
+
+
+## Simplified JSON Path
+
+CipherStash EQL supports a simplified JSONPath syntax as follows:
+
+| Expression | Description |
+|------------|-------------|
+| `$` | The root object or array. |
+| `.property` | Selects the specified property in a parent object. |
+| `[n]` | Selects the n-th element from an array. Indexes are 0-based. |
+| `[*]` | Matches any array element. |
+
+### Examples
+
+Given the following JSON:
+
+```json
+{
+  "firstName": "John",
+  "lastName": "doe",
+  "scores": [1, 2, 3]
+}
+```
+
+`$.firstName` returns `[John]`
+`$.scores` returns `[[1, 2, 3]]`
+`$[0]` returns nothing
+`$.scores[0]` returns `[1]`
+`$.scores[*]` returns `[1, 2, 3]`
+`$.` returns the entire object
+
+### Path Segments
+
+A Simplified JSON Path can be tokenized into segments where each segment is one of:
+
+* `.`
+* A property
+* `[*]`
+* `[n]`
+
+Below are some paths along with their segment tokenizations:
+
+* `$.firstName` -> `[".", "firstName"]`
+* `$.scores[0]` -> `[".", "scores", "[0]"]`
+* `$.` -> `["."]`
+* `$` -> `["."]`
+
+
+## Selectors
+
+A selector represents an encryption of a Simplified JSON Path for a leaf node in the JSON tree (*including* the leaf node itself),
+along with information about what type it selects (i.e. a `term` or a `ciphertext`).
+
+Given:
+
+* An `INFO` string representing storage context (e.g. the table and column name)
+* A `TYPE` - either `T` (term) or `C` (ciphertext)
+* A sub-type, `t`, comprising *exactly* 1 byte (set to 0 for the default sub-type)
+* A path `P` made up of segments `P(0)..P(N)`
+* The length (in bytes) of `x` defined by `len(x)`
+* A secure Message Authentication Code function, `MAC` (such as keyed Blake3 or HMAC with SHA2-512)
+* A length parameter `L` which, when passed to `TRUNCATE(x, L)`, truncates `x` to `L` bytes
+* `+` means string concatenation
+
+The selector is defined as:
+
+```
+TRUNCATE(MAC(TYPE + t + INFO + len(INFO) + {P(0) + len(P(0))} + ... + {P(N) + len(P(N))}), L)
+```
+
+### Examples
+
+* `INFO`: `customers/attrs`
+* `TYPE`: `T`
+* `t`: `0`
+* `L`: `16`
+
+A given input:
+
+```json
+{
+  "firstName": "John",
+  "lastName": "doe",
+  "scores": []
+}
+```
+
+The selector `S1` for the path `$.firstName` is:
+
+```
+S1 = TRUNCATE(MAC("T" + 0 + "customers/attrs" + 15 + "." + 1 + "firstName" + 9), 16)
+```
+
+The selector `S2` for the path `$.scores[*]` is:
+
+```
+S2 = TRUNCATE(MAC("T" + 0 + "customers/attrs" + 15 + "." + 1 + "scores" + 6 + "[*]" + 3), 16)
+```
+
+
+## Terms
+
+
+
+
+
+
+For arrays we could do:
+```
+$.scores[0]
+```
+
+Or (if position is not important):
+```
+$.scores[]
+```
+
diff --git a/README.md b/README.md
index 604bfa63..eaf53ef1 100644
--- a/README.md
+++ b/README.md
@@ -222,17 +222,94 @@ CREATE TABLE users (
 
 ### EQL functions
 
-EQL provides specialized functions to interact with encrypted data:
+EQL provides specialized functions to interact with encrypted data.
+These functions expect an encrypted domain type (which is effectively just JSONB).
 
 - **`cs_ciphertext_v1(val JSONB)`**: Extracts the ciphertext for decryption by CipherStash Proxy.
 - **`cs_match_v1(val JSONB)`**: Enables basic full-text search.
 - **`cs_unique_v1(val JSONB)`**: Retrieves the unique index for enforcing uniqueness.
 - **`cs_ore_v1(val JSONB)`**: Retrieves the Order-Revealing Encryption index for range queries.
-- **`cs_ste_vec_v1(val JSONB)`**: Retrieves the Structured Encryption Vector for containment queries.
+
+
+#### `cs_ste_vec_v1(val JSONB)`
+
+Retrieves the Structured Encryption Vector for containment queries.
+
+**Example:**
+
+```rb
+# Serialize a JSONB value bound to the users table column
+term = User::ENCRYPTED_JSONB.serialize({field: "value"})
+User.where("cs_ste_vec_v1(attrs) @> cs_ste_vec_v1(?)", term)
+```
+
+This executes on the server as:
+
+```sql
+SELECT * FROM users WHERE cs_ste_vec_v1(attrs) @> '53T8dtvW4HhofDp9BJnUkw';
+```
+
+It is the EQL equivalent of the following plaintext query:
+
+```sql
+SELECT * FROM users WHERE attrs @> '{"field": "value"}';
+```
+
+#### `cs_ste_term_v1(val JSONB, epath TEXT)`
+
+Retrieves the encrypted index term associated with the encrypted JSON path, `epath`.
+
+This is useful for sorting or filtering on integers in encrypted JSON objects.
+
+**Example:**
+
+```rb
+# Serialize the JSON path and the integer to compare against
+path = EJSON_PATH.serialize("$.login_count")
+term = User::ENCRYPTED_INT.serialize(100)
+User.where("cs_ste_term_v1(attrs, ?) > cs_ore_64_8_v1(?)", path, term)
+```
+
+This executes on the server as:
+
+```sql
+SELECT * FROM users WHERE cs_ste_term_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') > 'QAJ3HezijfTHaKrhdKxUEg';
+```
+
+It is the EQL equivalent of the following plaintext query:
+
+```sql
+SELECT * FROM users WHERE (attrs->>'login_count')::int > 100;
+```
+
+#### `cs_ste_value_v1(val JSONB, epath TEXT)`
+
+Retrieves the encrypted *value* associated with the encrypted JSON path, `epath`.
+
+**Example:**
+
+```rb
+# Serialize the JSON path whose value should be returned
+path = EJSON_PATH.serialize("$.login_count")
+User.find_by_sql(["SELECT cs_ste_value_v1(attrs, ?) FROM users", path])
+```
+
+This executes on the server as:
+
+```sql
+SELECT cs_ste_value_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') FROM users;
+```
+
+It is the EQL equivalent of the following plaintext query:
+
+```sql
+SELECT attrs->'login_count' FROM users;
+```
+
 
 ### Index functions
 
-These Functions expect a `jsonb` value that conforms to the storage schema.
+These functions expect a `jsonb` value that conforms to the storage schema.
 
 #### cs_add_index
 
@@ -242,22 +319,22 @@ cs_add_index(table_name text, column_name text, index_name text, cast_as text, o
 
 | Parameter | Description | Notes
 | ------------- | -------------------------------------------------- | ------------------------------------
-| table_name | Name of target table | Required
-| column_name | Name of target column | Required
-| index_name | The index kind | Required.
-| cast_as | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text`
-| opts | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below)
+| `table_name` | Name of target table | Required
+| `column_name` | Name of target column | Required
+| `index_name` | The index kind | Required.
+| `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text`
+| `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below)
 
 ##### cast_as
 
 Supported types:
-  - text
-  - int
-  - small_int
-  - big_int
-  - boolean
-  - date
-  - jsonb
+  - `text`
+  - `int`
+  - `small_int`
+  - `big_int`
+  - `boolean`
+  - `date`
+  - `jsonb`
 
 ##### match opts