Skip to content

Conversation

@CDThomas
Copy link
Contributor

@CDThomas CDThomas commented Oct 17, 2024

This change updates the EQL query docs to use the new field "q" (for query). This field tells proxy to perform encryption for a specific query operation (instead of performing source encryption and encryption for all indexes).

This change also updates the JSON schema:

  • add new "for query" field
  • add missing STE vec index field
  • remove ".1" suffix from m, u, and o index fields
  • add variants for the encrypted payloads for queries. These variants are used internally by proxy for query-only encryption (for filters in WHERE clauses, for example).

@CDThomas CDThomas force-pushed the add-docs-on-query-only-encryption branch 5 times, most recently from 5afdfbd to b67e940 Compare October 17, 2024 01:23
@CDThomas CDThomas changed the title Add docs on query only encryption Update docs and JSON schema with "query-only" encryption changes Oct 17, 2024
"title": "kind",
"type": "string",
"enum": ["pt", "ct", "en"]
"enum": ["pt", "ct", "en", "qm", "qo", "qu", "qejp", "qsv"]
Copy link
Contributor Author

@CDThomas CDThomas Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When "for query" is set on the plaintext payload, Proxy will encrypt the payload as one of the new variants here:

  • qm = query match
  • qo = query ORE
  • qu = query unique
  • qejp = query EJSON path
  • qsv= query STE vec

Proxy will only generate the index needed for the specific query and set the relevant kind/index field in the payload.

For example, if the client sends through a plaintext payload with "for query" set to "match":

{
  "v": 1,
  "k": "pt",
  "p": "abc",
  "i": {
    "t": "users",
    "c": "email_encrypted"
  },
  "q": "match"
}

Proxy would encrypt that as the match query variant:

{
  "v": 1,
  "k": "qm",
  "m": [1, 2, 3]
}

Note that the "m" key in the match query payload lines up with the same key used for the match index that's stored in the column. This allows the EQL SQL funcs to look up the index field using the same key (whether the value comes from the column or from a param):

-- $1 here should be encrypted for a match query
SELECT * FROM users WHERE cs_match_v1(email_encrypted) @> cs_match_v1($1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the new kinds required?
The postgres function shouldn't need the kind as long as the actual field is encoded correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They aren't strictly required (we could re-use the existing "ct"/ciphertext variant), but I'd prefer to model this as new variants for a few reasons.

On the Proxy/Rust side:

  • The source ciphertext ("c" field) in the existing "ct"/ciphertext variant would have to become optional (since queries wouldn't use it)
  • All of the indexes are optional in the existing "ct"/ciphertext variant, which is bit awkward to work with when encrypting for a query because behaviour-wise only one (required) index field should be set in the payload depending on the operation
  • Basically, specific variants for each makes it harder for us to construct something incorrectly and means that we can lean more on the compiler to catch problems

On the PG side:

  • If PG attempts to store a q* variant, something has gone wrong. It's convenient to be able to reject anything that isn't the ct variant
  • It's nice for debugging to be able to be able to identify with specific variant was chosen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having variants for every option adds complexity.
We could have a single "query" variant to disambiguate from ct and pt.

Coupled with having the "q" value be the identifier, we can then validate.

"knd": "qt", 
"qry": "u",  
"u": "etc",

if knd == qt
qry is required and qry.value (u) should exist as a field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be an OK middle ground if we're concerned about the number of variants.

Conceptually there's a variant for each and I rather make that explicit than have to model it as a single type that requires extra validation logic to handle what the compiler could catch for us.

@CDThomas CDThomas marked this pull request as ready for review October 17, 2024 02:01
README.md Outdated
Comment on lines 103 to 119
Note that plaintext payloads for query operations should set the `"q"` (for query) property.
This property tells CipherStash Proxy to only perform encryption necessary for a specific operation.
Otherwise, Proxy will perform source encryption and encryption for all indexes configured for the given column.

For reference, the EQL payload is defined as a `jsonb` with a specific schema:

```json
{
"v": 1,
"k": "pt",
"p": "[email protected]",
"i": {
"t": "users",
"c": "email_encrypted"
},
"q": "match"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I like that.

```rb
# Create the EQL payload using helper functions
payload = eqlPayload("users", "encrpyted_field", "plaintext value")
payload = EQL.for_match("users", "encrypted_field", "plaintext value")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably more idiomatic ways to do this in Ruby but not something we need to worry about now. Just sharing thoughts while they're fresh.

You could do:

EQL.table("users").column("encrypted_field").match("plaintext value")

If you made EQL.table("users") a class method:

class User
  def self.eql
    EQL.table("users")
  end
end

And used method_missing to handle the column name, you could do something like:

User.eql.encrypted_field = "foo"
User.eql.encrypted_field > 10
# etc

Just ideas for later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very fair. At the moment I'm mostly aiming for examples that get the point across and would be easy to port to other languages. Agreed that it'll make sense to implement a more idiomatic/ergonomic interface for Ruby at some point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth just writting all examples in psuedo code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly! Or maybe we can go with one of the languages included in this repo (Go or JS).

Copy link
Contributor

@coderdan coderdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I like the approach. Left some musings on Ruby DSL improvements but not needed for the moment.

@tobyhede
Copy link
Contributor

I would make the q field be a valid index/encoding type and add the json path as a new accepted encoding.

"q": "m"

@CDThomas
Copy link
Contributor Author

I would make the q field be a valid index/encoding type and add the json path as a new accepted encoding.

"q": "m"

I think this might not work for use cases like websearch_to_tsquery support, @tobyhede. Our websearch_to_tsquery implementation will use the same output type as a regular match index, but will have some extra tokenisation logic that only needs to happen for queries.

@tobyhede
Copy link
Contributor

@CDThomas Do you have an example of what that means?

@CDThomas
Copy link
Contributor Author

CDThomas commented Oct 17, 2024

@CDThomas Do you have an example of what that means?

websearch_to_tsquery is a PG function that allows basic "web search" syntax on top of Postgres full text search. There are some docs here.

For example, the search 'cat or dog' could be used to search for documents that contain the words "cat" or "dog" in them.

In our implementation, the search 'cat or dog' will need to be handled differently to the text 'cat or dog' that's stored in a column (or as a logical operator vs as a word in the text itself). Both will use match indexes/bloom filters, but the search needs some extra parsing/tokenisation logic.

@tobyhede

@cipherstash cipherstash deleted a comment from CDThomas Oct 17, 2024
**Example:**

```rb
Users.findAll(&:encrypted_email)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CDThomas I was trying to show a basic SELECT statement here withour any WHERE clauses and highlighting that responses are simply decrypted. I like what you've done here but maybe for the cs_match_v1 example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense @calvinbrewer. I reverted the changes in this section (and already updated the match example).

This change updates the EQL query docs to use the new field `"q"` (for query).
This field tells proxy to perform encryption for a specific query operation (instead
of performing source encryption and encryption for all indexes).

This change also updates the JSON schema:
- add new "for query" field
- add missing STE vec index field
- remove ".1" suffix from m, u, and o fields
This change adds variants for the encrypted payloads for queries to the
EQL JSON schema.

These variants are used internally by proxy for query-only encryption
(for filters in `WHERE` clauses, for example).
Update the match filter in the select diagram to reflect actual usage.
These aren't actually needed since STE term plaintext queries will be
encrypted as the same payload as ORE queries and websearch_to_match
queries will be the same payload as match queries.

Also add missing query variant for EJSON path queries.
@CDThomas CDThomas force-pushed the add-docs-on-query-only-encryption branch from bd33ebf to 6122f41 Compare October 18, 2024 05:07
@CDThomas
Copy link
Contributor Author

Merging this one since the docs here reflect what we've already started to implement on the Proxy side, but we can adjust the docs if we end up simplifying more.

@CDThomas CDThomas merged commit 59a6982 into main Oct 18, 2024
@CDThomas CDThomas deleted the add-docs-on-query-only-encryption branch October 18, 2024 05:13
tobyhede pushed a commit that referenced this pull request Oct 27, 2025
…tion

Update docs and JSON schema with "query-only" encryption changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants