A variety of searchable encryption techniques are available, including:
- **Matching** - Equality or partial matches
- **Ordering** - comparison operations using order-revealing encryption
- **Uniqueness** - enforcing unique constraints
- **Containment** - containment queries using structured encryption

### 1.1 What is encryption in use?

EQL provides specialized functions to interact with encrypted data:
- **`cs_match_v1(val JSONB)`**: Enables basic full-text search.
- **`cs_unique_v1(val JSONB)`**: Retrieves the unique index for enforcing uniqueness.
- **`cs_ore_v1(val JSONB)`**: Retrieves the Order-Revealing Encryption index for range queries.
- **`cs_ste_vec_v1(val JSONB)`**: Retrieves the Structured Encryption Vector for containment queries.

### 3.3 Index functions

| column_name | Name of target column | Required
| index_name | The index kind | Required
| cast_as | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text`
| opts | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below)

###### cast_as

Supported types:
- big_int
- boolean
- date
- jsonb

###### match opts

Expand Down Expand Up @@ -205,6 +207,100 @@ If you're using n-gram as a token filter, then a token that is already shorter t
However, if that same short string only appears as a part of a larger token, then it will not match that record.
In general, therefore, you should try to ensure that the string you search for is at least as long as the `tokenLength` of the index, except in the specific case where you know that there are shorter tokens to match, _and_ you are explicitly OK with not returning records that have that short string as part of a larger token.
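The following minimal Python sketch makes the `tokenLength` caveat concrete by modelling an n-gram token filter. It rests on several assumptions:

- It uses a simple whitespace tokenizer.
- It keeps words shorter than `token_length` whole.
- It treats a match as "every query token appears in the record's token set".

The names `ngram_tokens` and `matches` are hypothetical. The set comparison is a simplification of how the real match index is stored and queried.

```python
def ngram_tokens(text: str, token_length: int) -> set:
    """Whitespace-tokenize `text`, then split each word into n-grams.

    Words shorter than `token_length` are kept whole (assumed behaviour).
    """
    tokens = set()
    for word in text.split():
        if len(word) < token_length:
            tokens.add(word)
        else:
            for start in range(len(word) - token_length + 1):
                tokens.add(word[start:start + token_length])
    return tokens


def matches(record: str, query: str, token_length: int) -> bool:
    """A record matches when every query token is among the record's tokens."""
    return ngram_tokens(query, token_length) <= ngram_tokens(record, token_length)


# "ab" is shorter than token_length=3, so it is indexed whole.
print(matches("ab xyz", "ab", 3))   # True: "ab" is a short token of its own
print(matches("abc xyz", "ab", 3))  # False: "ab" only occurs inside "abc"
print(matches("abcdef", "bcd", 3))  # True: the query is at least token_length long
```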

###### ste_vec opts

An ste_vec index on an encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).

An ste_vec index requires one piece of configuration: the `prefix` (a string), which is functionally similar to a salt for the hashing process.

Within a dataset, encrypted columns whose ste_vec indexes use different prefixes can never compare as equal, and containment queries that mix index terms from multiple columns will never return a positive result. This is by design.
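The effect of the prefix can be sketched in Python. The sketch assumes the prefix and the path are hashed together. Plain SHA-256 stands in for whatever keyed hashing the real indexer uses. `term_hash` and the prefix strings are hypothetical.

Because the prefix is part of every hashed message, the same path produces unrelated hashes under different prefixes. Terms from two differently prefixed columns therefore never coincide.

```python
import hashlib
import json


def term_hash(prefix: str, path: list) -> str:
    """Hash one path under a column's prefix (hypothetical scheme).

    Encoding the pair as a JSON array keeps the boundary between the
    prefix and the path unambiguous.
    """
    message = json.dumps([prefix, path])
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


path = ["Obj", 'Key("account")']
users_term = term_hash("users/encrypted_account", path)
orders_term = term_hash("orders/encrypted_account", path)

print(users_term == term_hash("users/encrypted_account", path))  # True: deterministic
print(users_term == orders_term)  # False: different prefix, different hash
```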

The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node.

The indexer accepts the complete set of JSON types, but ignores `null` values. The following types are indexed:

- Object `{ ... }`
- Array `[ ... ]`
- String `"abc"`
- Boolean `true`
- Number `123.45`

For a document like this:

```json
{
  "account": {
    "email": "[email protected]",
    "name": {
      "first_name": "Alice",
      "last_name": "McCrypto"
    },
    "roles": [
      "admin",
      "owner"
    ]
  }
}
```

Hashes would be produced from the following list of entries:

```text
[
  [Obj, Key("account"), Obj, Key("email"), String("[email protected]")],
  [Obj, Key("account"), Obj, Key("name"), Obj, Key("first_name"), String("Alice")],
  [Obj, Key("account"), Obj, Key("name"), Obj, Key("last_name"), String("McCrypto")],
  [Obj, Key("account"), Obj, Key("roles"), Array, String("admin")],
  [Obj, Key("account"), Obj, Key("roles"), Array, String("owner")]
]
```
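The flattening step can be sketched in Python as a recursive walk over the document. The walk records:

- an `Obj`/`Key(...)` pair for each object member,
- an `Array` step for each array element,
- a typed leaf for each scalar, skipping `null`.

The string encoding of each step (`Key("...")`, `String("...")`, `Number(...)`, `Boolean(...)`) is an assumption chosen to mirror the notation above. The example uses a stand-in email address.

```python
import json


def flatten(value, path=()):
    """Return one tuple per leaf: the steps from the root down to that leaf."""
    if value is None:
        return []  # nulls are ignored by the indexer
    # Empty objects and arrays contribute no entries in this sketch.
    if isinstance(value, dict):
        entries = []
        for key, child in value.items():
            entries.extend(flatten(child, path + ("Obj", f"Key({json.dumps(key)})")))
        return entries
    if isinstance(value, list):
        entries = []
        for child in value:
            entries.extend(flatten(child, path + ("Array",)))
        return entries
    # bool must be tested before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        leaf = f"Boolean({json.dumps(value)})"
    elif isinstance(value, (int, float)):
        leaf = f"Number({json.dumps(value)})"
    elif isinstance(value, str):
        leaf = f"String({json.dumps(value)})"
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    return [path + (leaf,)]


document = {
    "account": {
        "email": "alice@example.com",
        "name": {"first_name": "Alice", "last_name": "McCrypto"},
        "roles": ["admin", "owner"],
    }
}

for entry in flatten(document):
    print(", ".join(entry))
```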

Using the first entry to illustrate how an entry is converted to hashes:

```text
[Obj, Key("account"), Obj, Key("email"), String("[email protected]")]
```

Hashes would be generated for every prefix of the full path to the leaf node:

```text
[
  [Obj],
  [Obj, Key("account")],
  [Obj, Key("account"), Obj],
  [Obj, Key("account"), Obj, Key("email")],
  [Obj, Key("account"), Obj, Key("email"), String("[email protected]")],
  // (remaining leaf nodes omitted)
]
```
```
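Generating the prefixes of a single entry means taking every leading slice of its steps. Here is a minimal Python sketch. Tuples of strings stand in for the steps, and the function name `path_prefixes` is hypothetical.

```python
def path_prefixes(entry: tuple) -> list:
    """Return every non-empty leading slice of `entry`, shortest first."""
    return [entry[:end] for end in range(1, len(entry) + 1)]


email_entry = ("Obj", 'Key("account")', "Obj", 'Key("email")', 'String("alice@example.com")')
for prefix in path_prefixes(email_entry):
    print(prefix)
```

Entries that share a leading path, such as every entry starting with `[Obj, Key("account")]`, yield the same short prefixes. Hashing only the unique prefixes means each shared prefix is counted once per document.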

Query terms are processed in the same manner as the input document.

Before encryption and indexing, a query looks like a structurally similar subset of the encrypted document, for example:

```json
{ "account": { "email": "[email protected]", "roles": ["admin"] } }
```

The expression `cs_ste_vec_v1(encrypted_account) @> cs_ste_vec_v1($query)` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key. The value of that key must be an object with an "email" key whose value is the string "[email protected]", and a "roles" key whose array includes "admin".

When reduced to a prefix list, the query would look like this:

```text
[
  [Obj],
  [Obj, Key("account")],
  [Obj, Key("account"), Obj],
  [Obj, Key("account"), Obj, Key("email")],
  [Obj, Key("account"), Obj, Key("email"), String("[email protected]")],
  [Obj, Key("account"), Obj, Key("roles")],
  [Obj, Key("account"), Obj, Key("roles"), Array],
  [Obj, Key("account"), Obj, Key("roles"), Array, String("admin")]
]
```

This list is then turned into an ste_vec of hashes, which can be queried directly against the index.
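Putting the steps together, the containment check can be sketched in Python:

1. Flatten a document.
2. Collect each distinct path prefix.
3. Hash each prefix under the column's prefix.
4. Treat `@>` as "every query hash is present in the document's hash set".

This is a hedged model, not the extension's implementation. SHA-256 over a JSON-encoded `[prefix, path]` pair stands in for the real hashing, and `flatten`, `index_terms` and `contains` are hypothetical names. The sketch hashes each leaf path independently, so it does not record which array element a leaf came from.

```python
import hashlib
import json


def flatten(value, path=()):
    """One tuple of steps per non-null leaf (nulls are skipped)."""
    if value is None:
        return []
    if isinstance(value, dict):
        return [e for k, v in value.items()
                for e in flatten(v, path + ("Obj", f"Key({json.dumps(k)})"))]
    if isinstance(value, list):
        return [e for v in value for e in flatten(v, path + ("Array",))]
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return [path + (f"Boolean({json.dumps(value)})",)]
    if isinstance(value, (int, float)):
        return [path + (f"Number({json.dumps(value)})",)]
    if isinstance(value, str):
        return [path + (f"String({json.dumps(value)})",)]
    raise TypeError(f"not a JSON value: {value!r}")


def index_terms(document, prefix: str) -> set:
    """Hash every distinct path prefix of `document` under a column prefix."""
    paths = {entry[:end] for entry in flatten(document) for end in range(1, len(entry) + 1)}
    return {
        hashlib.sha256(json.dumps([prefix, list(p)]).encode("utf-8")).hexdigest()
        for p in paths
    }


def contains(document_terms: set, query_terms: set) -> bool:
    """Model of `@>`: every query term must appear among the document's terms."""
    return query_terms <= document_terms


column_prefix = "users/encrypted_account"  # hypothetical prefix
stored = index_terms(
    {"account": {"email": "alice@example.com",
                 "name": {"first_name": "Alice", "last_name": "McCrypto"},
                 "roles": ["admin", "owner"]}},
    column_prefix,
)
query = index_terms({"account": {"email": "alice@example.com", "roles": ["admin"]}}, column_prefix)

print(len(query))               # 8 distinct prefixes
print(contains(stored, query))  # True
```

The query's eight distinct prefixes correspond one-to-one with the eight entries in the prefix list above.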

#### 3.3.2 cs_modify_index

```sql
…
```

…

```sql
cs_ore_v1(val jsonb)
```

Extracts an ore index from the `jsonb` value.
Returns `null` if no ore index is present.

#### 3.3.5 cs_ste_vec_v1

```sql
cs_ste_vec_v1(val jsonb)
```

Extracts an ste_vec index from the `jsonb` value.
Returns `null` if no ste_vec index is present.

### 3.4 Data Format

Encrypted data is stored as `jsonb` with a specific schema:
Cipherstash proxy handles the encoding, and EQL provides the functions.

| c | Ciphertext | Ciphertext value. Encrypted by proxy. Required if kind is plaintext/pt or encrypting/et.
| m.1 | Match index | Ciphertext index value. Encrypted by proxy.
| o.1 | ORE index | Ciphertext index value. Encrypted by proxy.
| u.1 | Unique index | Ciphertext index value. Encrypted by proxy.
| sv.1 | STE vector index | Ciphertext index value. Encrypted by proxy.

#### 3.4.1 Helper packages
