README.md (merged): 148 changes, 82 additions and 66 deletions
@@ -40,14 +40,14 @@ Once the custom types and functions are installed, you can start using EQL in yo

1. Create a table with a column of type `cs_encrypted_v1` which will store your encrypted data.
1. Use EQL functions to add indexes for the columns you want to encrypt.
   - Indexes are used by CipherStash Proxy to understand what cryptography schemes are required for your use case.
1. Initialize CipherStash Proxy for cryptographic operations.
   - The Proxy will dynamically encrypt data on the way in and decrypt data on the way out, based on the indexes you have defined.
1. Insert data into the defined columns using a specific payload format.
   - The payload format is defined in the [data format](#data-format) section.
1. Query the data using the EQL functions defined in the [querying data with EQL](#querying-data-with-eql) section.
   - No modifications are required to simply `SELECT` data from your encrypted columns.
   - To perform `WHERE` and `ORDER BY` queries, you must wrap the encrypted columns and query values in the EQL functions defined in the [querying data with EQL](#querying-data-with-eql) section.
1. Integrate with your application via the [helper packages](#helper-packages) to interact with the encrypted data.

You can find a full getting started guide in the [GETTINGSTARTED.md](GETTINGSTARTED.md) file.
@@ -150,13 +150,13 @@ Which will execute on the server as:
SELECT encrypted_email FROM users;
```

And is the EQL equivalent of the following plaintext query:

```sql
SELECT email FROM users;
```

All the data returned from the database is fully decrypted.

## Querying data with EQL

@@ -170,15 +170,15 @@ Enables basic full-text search.

```rb
# Create the EQL payload using helper functions
payload = EQL.for_match("users", "encrypted_field", "plaintext value")

User.where("cs_match_v1(encrypted_field) @> cs_match_v1(?)", payload)
```

> **Contributor:** There are probably more idiomatic ways to do this in Ruby, but it's not something we need to worry about now. Just sharing thoughts while they're fresh.
>
> You could do:
>
>     EQL.table("users").column("encrypted_field").match("plaintext value")
>
> If you made `EQL.table("users")` a class method:
>
>     class User
>       def self.eql
>         EQL.table("users")
>       end
>     end
>
> And used `method_missing` to handle the column name, you could do something like:
>
>     User.eql.encrypted_field = "foo"
>     User.eql.encrypted_field > 10
>     # etc
>
> Just ideas for later.

> **Contributor Author:** Very fair. At the moment I'm mostly aiming for examples that get the point across and would be easy to port to other languages. Agreed that it'll make sense to implement a more idiomatic/ergonomic interface for Ruby at some point.

> **Contributor:** Worth just writing all examples in pseudocode?

> **Contributor Author:** Possibly! Or maybe we can go with one of the languages included in this repo (Go or JS).

Which will execute on the server as:

```sql
SELECT * FROM users WHERE cs_match_v1(encrypted_field) @> cs_match_v1('{"v":1,"k":"pt","p":"plaintext value","i":{"t":"users","c":"encrypted_field"},"q":"match"}');
```

And is the EQL equivalent of the following plaintext query:
@@ -195,15 +195,15 @@ Retrieves the unique index for enforcing uniqueness.

```rb
# Create the EQL payload using helper functions
payload = EQL.for_unique("users", "encrypted_field", "plaintext value")

User.where("cs_unique_v1(encrypted_field) = cs_unique_v1(?)", payload)
```

Which will execute on the server as:

```sql
SELECT * FROM users WHERE cs_unique_v1(encrypted_field) = cs_unique_v1('{"v":1,"k":"pt","p":"plaintext value","i":{"t":"users","c":"encrypted_field"},"q":"unique"}');
```

And is the EQL equivalent of the following plaintext query:
@@ -220,7 +220,7 @@ Retrieves the Order-Revealing Encryption index for range queries.

```rb
# Create the EQL payload using helper functions
date = EQL.for_ore("users", "encrypted_date", Time.now)

User.where("cs_ore_64_8_v1(encrypted_date) < cs_ore_64_8_v1(?)", date)
```
@@ -246,7 +246,7 @@ User.order("cs_ore_64_8_v1(encrypted_field)").all().map(&:id)
Which will execute on the server as:

```sql
SELECT id FROM examples ORDER BY cs_ore_64_8_v1(encrypted_field) DESC;
```

And is the EQL equivalent of the following plaintext query:
@@ -259,7 +259,7 @@ SELECT id FROM examples ORDER BY field DESC;

### `cs_ste_term_v1(val JSONB, epath TEXT)`

Retrieves the encrypted _term_ associated with the encrypted JSON path, `epath`.

### `cs_ste_vec_v1(val JSONB)`

@@ -269,7 +269,7 @@ Retrieves the Structured Encryption Vector for containment queries.

```rb
# Serialize a JSONB value bound to the users table column
term = EQL.for_ste_vec("users", "attrs", {field: "value"})
User.where("cs_ste_vec_v1(attrs) @> cs_ste_vec_v1(?)", term)
```

@@ -295,8 +295,8 @@ This is useful for sorting or filtering on integers in encrypted JSON objects.

```rb
# Create the EQL payloads for the JSON path and the comparison value
path = EQL.for_ejson_path("users", "attrs", "$.login_count")
term = EQL.for_ore("users", "attrs", 100)
User.where("cs_ste_term_v1(attrs, ?) > cs_ore_64_8_v1(?)", path, term)
```

@@ -309,18 +309,18 @@ SELECT * FROM users WHERE cs_ste_term_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') > 'QAJ
And is the EQL equivalent of the following plaintext query:

```sql
SELECT * FROM users WHERE (attrs->>'login_count')::int > 100;
```

### `cs_ste_value_v1(val JSONB, epath TEXT)`

Retrieves the encrypted _value_ associated with the encrypted JSON path, `epath`.

**Example:**

```rb
# Create the EQL payload for the JSON path
path = EQL.for_ejson_path("users", "attrs", "$.login_count")
User.find_by_sql(["SELECT cs_ste_value_v1(attrs, ?) FROM users", path])
```

@@ -333,7 +333,7 @@ SELECT cs_ste_value_v1(attrs, 'DQ1rbhWJXmmqi/+niUG6qw') FROM users;
And is the EQL equivalent of the following plaintext query:

```sql
SELECT attrs->'login_count' FROM users;
```

## Managing indexes with EQL
@@ -346,24 +346,25 @@ These functions expect a `jsonb` value that conforms to the storage schema.
cs_add_index(table_name text, column_name text, index_name text, cast_as text, opts jsonb)
```

| Parameter     | Description                                               | Notes                                                                    |
| ------------- | --------------------------------------------------------- | ------------------------------------------------------------------------ |
| `table_name`  | Name of target table                                      | Required                                                                 |
| `column_name` | Name of target column                                     | Required                                                                 |
| `index_name`  | The index kind (e.g. `match`, `ore`, `unique`, `ste_vec`) | Required                                                                 |
| `cast_as`     | The PostgreSQL type decrypted data will be cast to        | Optional. Defaults to `text`                                             |
| `opts`        | Index options                                             | Optional for `match` indexes, required for `ste_vec` indexes (see below) |

#### cast_as

Supported types:

- `text`
- `int`
- `small_int`
- `big_int`
- `boolean`
- `date`
- `jsonb`
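The parameters above map onto a single SQL call. The hypothetical Ruby helpers `add_index_sql` and `quote_literal` below are a minimal sketch, assuming an application wants to build that call itself. They check `cast_as` against the supported types, require a `context` option for `ste_vec` indexes, and emit a `cs_add_index` statement. Two details are assumptions, not documented behavior: passing `'{}'::jsonb` when no options are given, and putting `context` at the top level of `opts`.

```rb
require "json"

# Supported cast_as values, copied from the list above.
SUPPORTED_CAST_TYPES = %w[text int small_int big_int boolean date jsonb].freeze

# Hypothetical helper: quote a value as a PostgreSQL string literal
# by doubling any embedded single quotes.
def quote_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper (not part of EQL): builds a cs_add_index call.
# Assumes an empty JSON object is acceptable when no options are needed.
def add_index_sql(table, column, index_name, cast_as: "text", opts: {})
  unless SUPPORTED_CAST_TYPES.include?(cast_as)
    raise ArgumentError, "unsupported cast_as: #{cast_as}"
  end
  if index_name == "ste_vec" && !opts.key?("context")
    raise ArgumentError, "ste_vec indexes require a \"context\" option"
  end

  args = [table, column, index_name, cast_as].map { |arg| quote_literal(arg) }
  args << "#{quote_literal(JSON.generate(opts))}::jsonb"
  "SELECT cs_add_index(#{args.join(', ')});"
end

puts add_index_sql("users", "encrypted_email", "unique")
# prints SELECT cs_add_index('users', 'encrypted_email', 'unique', 'text', '{}'::jsonb);
puts add_index_sql("users", "attrs", "ste_vec", cast_as: "jsonb", opts: { "context" => "users/attrs" })
# prints SELECT cs_add_index('users', 'attrs', 'ste_vec', 'jsonb', '{"context":"users/attrs"}'::jsonb);
```

In real code, prefer your database driver's bound parameters (for example `PG::Connection#exec_params` in the `pg` gem) over building SQL strings.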

#### match opts

@@ -428,13 +429,13 @@ An ste_vec index requires one piece of configuration: the `context` (a string) w
This ensures that all of the encrypted values are unique to that context.
It is generally recommended to use the table and column name as the context (e.g. `users/name`).

Within a dataset, encrypted columns indexed using an `ste_vec` that use different contexts cannot be compared.
Containment queries that manage to mix index terms from multiple columns will never return a positive result.
This is by design.

The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node.

The complete set of JSON types is supported by the indexer.
Null values are ignored by the indexer.

- Object `{ ... }`
@@ -451,12 +452,9 @@ For a document like this:
"email": "[email protected]",
"name": {
"first_name": "Alice",
"last_name": "McCrypto"
},
"roles": ["admin", "owner"]
}
}
```
@@ -466,17 +464,33 @@ Hashes would be produced from the following list of entries:
```js
[
[Obj, Key("account"), Obj, Key("email"), String("[email protected]")],
[
Obj,
Key("account"),
Obj,
Key("name"),
Obj,
Key("first_name"),
String("Alice"),
],
[
Obj,
Key("account"),
Obj,
Key("name"),
Obj,
Key("last_name"),
String("McCrypto"),
],
[Obj, Key("account"), Obj, Key("roles"), Array, String("admin")],
[Obj, Key("account"), Obj, Key("roles"), Array, String("owner")],
];
```

Using the first entry to illustrate how an entry is converted to hashes:

```js
[Obj, Key("account"), Obj, Key("email"), String("[email protected]")];
```

The hashes would be generated for all prefixes of the full path to the leaf node.
@@ -489,15 +503,15 @@ The hashes would be generated for all prefixes of the full path to the leaf node
[Obj, Key("account"), Obj, Key("email")],
[Obj, Key("account"), Obj, Key("email"), String("[email protected]")],
// (remaining leaf nodes omitted)
];
```
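The flattening and prefix steps can be sketched in Ruby as follows. This is an illustrative model, not the indexer's implementation. The tokens `:obj`, `:array`, `[:key, k]`, `[:string, v]`, `[:number, v]` and `[:boolean, v]` stand in for `Obj`, `Array`, `Key(...)`, `String(...)` and so on, and `flatten_entries` and `path_prefixes` are hypothetical names. Array elements share their parent's path, matching the `roles` entries above, and nulls are skipped.

```rb
# Hypothetical sketch of the flattening step. Tokens are illustrative
# stand-ins for Obj, Array, Key(...), String(...), not the indexer's types.
def flatten_entries(node, path = [], out = [])
  case node
  when Hash
    # Each object contributes an Obj marker followed by the key.
    node.each { |key, value| flatten_entries(value, path + [:obj, [:key, key]], out) }
  when Array
    # Array elements share the same path; element positions are not recorded.
    node.each { |value| flatten_entries(value, path + [:array], out) }
  when nil
    # Null values are ignored by the indexer.
  when String
    out << (path + [[:string, node]])
  when Numeric
    out << (path + [[:number, node]])
  when true, false
    out << (path + [[:boolean, node]])
  end
  out
end

# Every prefix of every entry, with shared prefixes kept only once.
def path_prefixes(entries)
  entries.flat_map { |entry| (1..entry.length).map { |n| entry.first(n) } }.uniq
end

doc = {
  "account" => {
    "name" => { "first_name" => "Alice", "last_name" => "McCrypto" },
    "roles" => ["admin", "owner"]
  }
}

entries = flatten_entries(doc)
entries.each { |entry| p entry }
puts path_prefixes(entries).length # prints 13
```

For the sample document with the email omitted, this yields four entries and 13 unique prefixes. A shared prefix such as `[:obj, [:key, "account"], :obj]` is counted once.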

Query terms are processed in the same manner as the input document.

A query prior to encrypting & indexing looks like a structurally similar subset of the encrypted document, for example:

```json
{ "account": { "email": "[email protected]", "roles": ["admin"] } }
```

The expression `cs_ste_vec_v1(encrypted_account) @> cs_ste_vec_v1($query)` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "[email protected]".
@@ -510,11 +524,12 @@ When reduced to a prefix list, it would look like this:
[Obj, Key("account")],
[Obj, Key("account"), Obj],
[Obj, Key("account"), Obj, Key("email")],
[Obj, Key("account"), Obj, Key("email"), String("[email protected]")],
[Obj, Key("account"), Obj, Key("roles")],
[Obj, Key("account"), Obj, Key("roles"), Array],
[Obj, Key("account"), Obj, Key("roles"), Array, String("admin")],
];
```

This is then turned into an ste_vec of hashes, which can be queried directly against the index.
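The Ruby sketch below makes the containment semantics concrete. It hashes each prefix together with the index context and treats `@>` as a subset test over the resulting terms. SHA-256 over a JSON encoding is an assumption here. It stands in for the real `ste_vec` terms, which are produced by CipherStash's encryption rather than a plain hash. The names `term_for`, `ste_vec` and `contains?` are hypothetical. Because the context is mixed into every term, the same query under a different context never matches, mirroring the behavior described for mixed contexts above.

```rb
require "digest"
require "json"

# Hypothetical stand-in for ste_vec term generation: each path prefix is
# hashed together with the index context using SHA-256. Real ste_vec terms
# come from CipherStash's encryption and are not plain SHA-256 hashes.
def term_for(context, prefix)
  Digest::SHA256.hexdigest(JSON.generate([context, prefix]))
end

# One term per unique prefix.
def ste_vec(context, prefixes)
  prefixes.map { |prefix| term_for(context, prefix) }.uniq
end

# Containment (@>): every query term must appear among the document's terms.
def contains?(doc_terms, query_terms)
  (query_terms - doc_terms).empty?
end

roles = [:obj, [:key, "account"], :obj, [:key, "roles"], :array]
doc_prefixes = (1..roles.length).map { |n| roles.first(n) } +
  [roles + [[:string, "admin"]], roles + [[:string, "owner"]]]
query_prefixes = (1..roles.length).map { |n| roles.first(n) } + [roles + [[:string, "admin"]]]

doc_terms = ste_vec("users/attrs", doc_prefixes)
puts contains?(doc_terms, ste_vec("users/attrs", query_prefixes))  # prints true
puts contains?(doc_terms, ste_vec("orders/attrs", query_prefixes)) # prints false (different context)
```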
@@ -573,19 +588,20 @@ The format is defined as a [JSON Schema](src/cs_encrypted_v1.schema.json).
It should never be necessary to directly interact with the stored `jsonb`.
CipherStash Proxy handles the encoding, and EQL provides the functions.

| Field | Name              | Description |
| ----- | ----------------- | ----------- |
| s     | Schema version    | JSON Schema version of this JSON document. |
| v     | Version           | The configuration version that generated this stored value. |
| k     | Kind              | The kind of the data (plaintext/pt, ciphertext/ct, encrypting/et). |
| i.t   | Table identifier  | Name of the table containing the encrypted column. |
| i.c   | Column identifier | Name of the encrypted column. |
| p     | Plaintext         | Plaintext value sent by the database client. Required if kind is plaintext/pt or encrypting/et. |
| q     | For query         | Specifies that the plaintext should be encrypted for a specific query operation. If `null`, source encryption and encryption for all indexes will be performed. Valid values are `"match"`, `"ore"`, `"unique"`, `"ste_vec"`, `"ejson_path"`, and `"websearch_to_match"`. |
| c     | Ciphertext        | Ciphertext value. Encrypted by the proxy. Required if kind is ciphertext/ct or encrypting/et. |
| m     | Match index       | Ciphertext index value. Encrypted by the proxy. |
| o     | ORE index         | Ciphertext index value. Encrypted by the proxy. |
| u     | Unique index      | Ciphertext index value. Encrypted by the proxy. |
| sv    | STE vector index  | Ciphertext index value. Encrypted by the proxy. |
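
As a rough illustration of how these fields fit together, the Ruby sketch below builds the kind of plaintext query payload that appears in the server-side SQL of the match example. `eql_query_payload` is a hypothetical helper, not the API of the helper packages. It assumes `q` is sent as `null` when no specific query operation is requested.

```rb
require "json"

# Valid values for "q", as listed in the table above.
QUERY_OPERATIONS = %w[match ore unique ste_vec ejson_path websearch_to_match].freeze

# Hypothetical helper (not the helper packages' API): builds a plaintext ("pt")
# payload for configuration version 1 using the fields described above.
# Assumes "q" is sent as null when no specific query operation is wanted.
def eql_query_payload(table, column, plaintext, operation = nil)
  unless operation.nil? || QUERY_OPERATIONS.include?(operation)
    raise ArgumentError, "unknown query operation: #{operation}"
  end

  payload = {
    "v" => 1,
    "k" => "pt",
    "p" => plaintext,
    "i" => { "t" => table, "c" => column },
    "q" => operation
  }
  JSON.generate(payload)
end

puts eql_query_payload("users", "encrypted_field", "plaintext value", "match")
# prints {"v":1,"k":"pt","p":"plaintext value","i":{"t":"users","c":"encrypted_field"},"q":"match"}
```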

## Helper packages
