239 vault cluster authentication #255
Conversation
The installation instructions would need to be very explicit with regard to setting up the VAULT_ADDR environment variable in the operator's manifest. This is the kind of installation step people miss. However, if someone does miss it, the messaging-topology-operator's logs will clearly indicate that the operator is trying to access Vault on the loopback URL.
We discussed internally whether there could be a way of figuring out Vault's address without having to manually configure it via an environment variable. It certainly was not trivial, and for now we have concluded that your approach is good enough.
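For context, here is a minimal sketch of why a missing VAULT_ADDR surfaces as loopback access attempts, assuming the operator builds its client from the Vault Go library's DefaultConfig (an assumption for illustration, not a statement about the PR's exact code):

```go
package main

import (
	"fmt"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// DefaultConfig starts from the loopback address and then lets the
	// VAULT_ADDR environment variable (among others) override it.
	cfg := vault.DefaultConfig()

	client, err := vault.NewClient(cfg)
	if err != nil {
		panic(err)
	}

	// If VAULT_ADDR was not set on the operator container, this prints
	// "https://127.0.0.1:8200", which is why the missing setting shows up
	// in the logs as attempts to reach Vault on the loopback interface.
	fmt.Println("Vault address in use:", client.Address())
}
```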
internal/vault_reader.go
Outdated
// if role == "" {
// 	return nil, errors.New("no role value set in Vault secret backend")
// }
role := "messaging-topology-operator"
We discussed internally the pros and cons of using a dedicated role for this operator versus using the role defined in the cluster definition (rabbitmqv1beta1.VaultSpec.Role), and the conclusion is that what you did is the best solution, i.e. a dedicated role.
Just for reference, here is our analysis:
messaging topology operator having its own role:
Pros:
- If we ever needed to prevent the messaging-topology-operator from operating on any cluster we would only need to remove its role from Vault
- It allows us to grant different permissions to the cluster-operator and messaging-topology-operator. For instance, the cluster-operator needs to request private key and certificate in addition to looking up secrets whereas the messaging-topology-operator only needs to lookup secrets.
Cons:
- If we wanted fine-grained control over which clusters the messaging-topology-operator can operate on, we would have to modify the policy associated with the role so that it includes/excludes the secret paths the operator can access
Messaging topology operator using the cluster's role defined in `secretBackends.vault.role`
Pros:
- We can easily configure which clusters the messaging-topology-operator can operate on when we declare the role. For instance, in the example below, we are granting the messaging-topology-operator permission to operate on the cluster `rabbitmq-1` simply by adding the messaging-topology-operator to the `bound_service_account_names` list.
vault_exec "vault write auth/kubernetes/role/rabbitmq-1-role \
bound_service_account_names=rabbitmq-1,messaging-topology-operator \
bound_service_account_namespaces=$RABBITMQ_NAMESPACE \
policies=rabbitmq-policy ttl=24h"
Cons:
- If we ever wanted to prevent the messaging-topology-operator from operating on any cluster, we would have to go over all the roles and remove its SA from `bound_service_account_names`. Whereas if we used a dedicated role for the operator, it is just a matter of removing the role, as I mentioned earlier.
internal/vault_reader.go
Outdated
// use the Vault token for making all future calls to Vault
vaultClient.SetToken(vaultToken)

return VaultClient{Reader: &VaultSecretReader{client: vaultClient}}, nil
Have you considered caching the VaultClient or the vaultToken? I would imagine authentication is going to be more expensive compared to looking up a secret's path. I guess we could cache the vaultToken, or the actual initialized VaultClient, and reuse it over and over again until the token expires.
I am concerned about the messaging-topology-operator overloading Vault if it has to reconcile too many resources.
This does not need to be done now, but it is worth considering for future releases.
I would second what @MarcialRosales wrote: since applications will typically create hundreds to thousands of queues and bindings, it is well worth caching the Vault session.
As a strawman, I've made a change to vault_reader.go so that, after the initial login to Vault, the returned token is periodically renewed in a background thread. The maximum TTL of the returned token is also respected so that, when it is reached, a new Vault login is triggered to retrieve a fresh token for subsequent renewals. Would you be happy with this kind of approach?
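For illustration, here is a minimal sketch of that kind of renewal loop using the Vault Go client's LifetimeWatcher. The helper name loginToVault, the role/jwt parameters, and the fixed retry delay are illustrative assumptions, not the PR's actual code:

```go
package vaultsketch

import (
	"log"
	"time"

	vault "github.com/hashicorp/vault/api"
)

// loginToVault performs a Kubernetes auth login and returns the resulting
// secret, whose Auth.ClientToken becomes the token for subsequent calls.
func loginToVault(client *vault.Client, role, jwt string) (*vault.Secret, error) {
	return client.Logical().Write("auth/kubernetes/login", map[string]interface{}{
		"role": role,
		"jwt":  jwt,
	})
}

// renewToken keeps the client's token fresh: it renews the token until the
// watcher reports that it can no longer be renewed (e.g. max TTL reached),
// then performs a fresh login and starts over.
func renewToken(client *vault.Client, role, jwt string) {
	for {
		secret, err := loginToVault(client, role, jwt)
		if err != nil || secret == nil || secret.Auth == nil {
			log.Printf("unable to authenticate to Vault: %v", err)
			time.Sleep(2 * time.Second) // avoid hammering Vault on repeated failures
			continue
		}
		client.SetToken(secret.Auth.ClientToken)

		watcher, err := client.NewLifetimeWatcher(&vault.LifetimeWatcherInput{Secret: secret})
		if err != nil {
			log.Printf("unable to create token lifetime watcher: %v", err)
			time.Sleep(2 * time.Second)
			continue
		}
		go watcher.Start()

	renewLoop:
		for {
			select {
			case err := <-watcher.DoneCh():
				// Renewal has stopped: the token hit its max TTL or a renewal
				// failed, so fall through to a fresh login.
				if err != nil {
					log.Printf("token renewal ended: %v", err)
				}
				watcher.Stop()
				break renewLoop
			case renewal := <-watcher.RenewCh():
				log.Printf("token renewed at %s", renewal.RenewedAt)
			}
		}
	}
}
```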
@georgeharley Sorry, I did not reply directly to your question. Yes, I am happy with your approach and, in fact, it adheres to the Vault Go client library's recommendation. I am happy to leave the TTL up to what the server says rather than providing one.
internal/vault_reader.go
Outdated
	return
}

_, err = login(vaultClient, vaultSpec)
This is a minor issue: when the client is first initialized, it looks like we are logging in twice. Once from the InitializeClient() function, so that we can fail fast if the credentials are wrong before scheduling the credential-renewal goroutine, and then again within the renewal goroutine.
If we wanted to fix it, we would have to pass the first vaultLoginResp on to the renewToken function and have an index in the for loop that tells us whether this is the first iteration, and hence whether to use the vaultLoginResp from the parameter list or instead ignore it and log in again. The code would certainly not be as clean as the current one.
Well spotted, and apologies for not mentioning it earlier as it was something that caused me some concern. You are correct that on start-up there are two Vault logins in very quick succession, so we end up with a log that contains something like the following:
{"level":"info","ts":1637150459.7343986,"msg":"Authenticating to Vault"}
{"level":"info","ts":1637150459.746867,"msg":"Authenticating to Vault"}
To avoid this I also looked at keeping the login() call out of InitializeClient() altogether and only carrying it out in the renewal thread, but that turned out worse IMHO, since it created a tiny window of time where the Vault client was trying to carry out its work without a valid Vault token, and the log output contained a smattering of "unable to retrieve credentials" messages, which looked worse. In the interests of getting a strawman in front of you sooner rather than later I opted for the former approach, since it is not as messy in the logs and only happens on the first reconciliation after startup. I will see if I can take another pass at this.
@MarcialRosales I've just pushed a small change to the code which stops the double logins occurring on startup. The InitializeClient() function no longer attempts a Vault login but instead waits for the renewToken() function to notify it of the outcome of its login attempt. The notification comes via a buffered channel passed to renewToken().
renewToken() only writes to the channel on the first login attempt. If that first attempt fails then it seemed like a good opportunity to fail fast and bail out of the whole token renewal loop.
I have tried this on a couple of K8s clusters and now only see the one "Authenticating to Vault" log message on startup. In the unhappy scenario where the initial Vault login attempt fails, the "Lifecycle management of Vault token will not be carried out" message gets logged and all subsequent reconcile attempts fail with the Vault error.
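A minimal sketch of that hand-off pattern, with hypothetical names (initializeClient, firstLoginDone, login) standing in for the PR's actual identifiers:

```go
package vaultsketch

import (
	"errors"
	"log"

	vault "github.com/hashicorp/vault/api"
)

// initializeClient builds the Vault client and waits for the renewal
// goroutine to report the outcome of the very first login attempt.
func initializeClient() (*vault.Client, error) {
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		return nil, err
	}

	// Buffered so renewToken can report the first outcome without blocking,
	// even if nobody is ready to receive yet.
	firstLoginDone := make(chan error, 1)
	go renewToken(client, firstLoginDone)

	if err := <-firstLoginDone; err != nil {
		return nil, errors.New("initial Vault login failed: " + err.Error())
	}
	return client, nil
}

// renewToken performs the first login, reports its outcome exactly once on
// the channel, and only continues with token lifecycle management if that
// first attempt succeeded.
func renewToken(client *vault.Client, firstLoginDone chan<- error) {
	secret, err := login(client)
	firstLoginDone <- err
	if err != nil {
		log.Print("Lifecycle management of Vault token will not be carried out")
		return
	}
	client.SetToken(secret.Auth.ClientToken)
	// ... renewal loop as sketched earlier ...
}

// login is a placeholder for the Kubernetes auth login shown previously.
func login(client *vault.Client) (*vault.Secret, error) {
	return client.Logical().Write("auth/kubernetes/login", map[string]interface{}{
		"role": "messaging-topology-operator",
		"jwt":  "<service account token>",
	})
}
```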
MarcialRosales
left a comment
It would have been much nicer if vault.Client had been an interface rather than a struct. That way we could have mocked it rather than mocking/replacing implementation methods such as ReadVaultClientSecretFunc or LoginToVaultFunc.
But I am afraid it is what we have.
}

fakeSecretStoreClient = &internalfakes.FakeSecretStoreClient{}
fakeSecretStoreClient.ReadCredentialsReturns(fakeCredentialsProvider, nil)
@georgeharley When I run make unit-tests this test fails because fakeCredentialsProvider is nil at this line, which means the fakeSecretStoreClient returns a nil CredentialsProvider and the test fails. This BeforeEach block is invoked before the JustBeforeEach block of the Describe.
The behaviour of JustBeforeEach is explained here. It has also trapped me a few times.
One easy way to fix it is by moving the initialization of fakeCredentialsProvider to the BeforeEach of this test case.
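To illustrate the ordering trap with a generic sketch (not the PR's actual test, and assuming the usual Ginkgo suite bootstrap): for each spec, all BeforeEach blocks run before any JustBeforeEach block, so a fake configured in a BeforeEach with a value that is only assigned in a JustBeforeEach will see the zero value.

```go
package ordering_test

import (
	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

var _ = Describe("BeforeEach vs JustBeforeEach ordering", func() {
	var provider string

	// Runs after every BeforeEach, just before the It body - too late for
	// any BeforeEach that wanted to consume the value.
	JustBeforeEach(func() {
		provider = "fake-credentials-provider"
	})

	Context("a nested context", func() {
		var capturedAtBeforeEach string

		// Runs before the outer JustBeforeEach, so provider is still "".
		BeforeEach(func() {
			capturedAtBeforeEach = provider
		})

		It("shows that the BeforeEach saw the zero value", func() {
			Expect(capturedAtBeforeEach).To(Equal(""))
			Expect(provider).To(Equal("fake-credentials-provider"))
		})
	})
})
```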
When make unit-tests is run in my environment there don't appear to be any failing tests. Could it be something weird in my setup, or is the randomising of the test specs somehow always falling in my favour? FWIW I'm using Go version 1.17.2, Ginkgo version 1.16.5, and GNU Make version 3.8.1, all running on macOS Monterey 12.0.1.
I'll make the suggested fix to ensure the outcome is predictable.
Test updated.
}

// Reduce load on Vault server in a problem situation where repeated login attempts may be made
time.Sleep(2 * time.Second)
In the future, this could be replaced by an exponential backoff delay. I'd release it just as it is and, if necessary, change it to an exponential backoff delay later.
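For a possible future iteration, here is a minimal sketch of what swapping the fixed sleep for a capped exponential backoff could look like (illustrative only, not part of this PR):

```go
package vaultsketch

import "time"

// loginBackoff returns the delay to wait before the next Vault login
// attempt: it starts at 2 seconds and doubles per consecutive failure,
// capped at one minute so a long Vault outage does not push retries
// arbitrarily far apart.
func loginBackoff(consecutiveFailures int) time.Duration {
	delay := 2 * time.Second
	for i := 0; i < consecutiveFailures; i++ {
		delay *= 2
		if delay >= time.Minute {
			return time.Minute
		}
	}
	return delay
}
```

Each failed login attempt would then call time.Sleep(loginBackoff(failures)), with the failure counter reset to zero on the first successful login.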
MarcialRosales
left a comment
@georgeharley thanks for implementing this feature in the messaging-topology-operator. We appreciate it a lot.
You have implemented all the changes very quickly.
I would say we are ready to merge it, and we can make further improvements when necessary.
Thank you for all of your careful feedback on this @MarcialRosales, it is greatly appreciated and has really improved things. I have just pushed a further small change which relates to the Vault role that the operator uses when logging in to the remote Vault server. The change is to look for the role name in an environment variable called `OPERATOR_VAULT_ROLE`.
If the latest proposal is deemed acceptable then I would be very happy to contribute to the operator installation documentation regarding the Vault support.
internal/vault_reader.go
Outdated
	return result
}

func login(vaultClient *vault.Client, vaultSpec *rabbitmqv1beta1.VaultSpec) (*vault.Secret, error) {
@georgeharley I think I might have found a problem.
We do not want to log in to the Vault server per reconciliation request, hence we log in once and reuse Vault's authentication token. So far so good, but the problem is that to log in to Vault we first initialise the Vault client with two settings that come from the VaultSpec of the first cluster the topology operator has to interact with.
I am not sure whether, in a production environment, all RabbitMQ clusters are set up with the same vault.hashicorp.com/namespace and vault.hashicorp.com/auth-path annotations. If they all shared the same annotations, we could use the same Vault client for all clusters. Otherwise, we cannot use the same Vault client for all RabbitMQ clusters.
Sorry for spotting it so late.
I detected this problem while refactoring your vault_reader. I was going to suggest a change but I refrained from asking you to make refactoring changes after all the work you have put into it. Anyway, the refactorings I was doing were these two:
- Avoid mocking implementation methods. Instead, mock up collaborator(s) such as the Vault client
- Reduce the number of global variables/methods and singletons. While removing the singleton I found the issue I described above.
@MarcialRosales @georgeharley
Interesting nuance... all part of the journey. If I may, let me try to reiterate the problem and make a suggestion. Apologies in advance for the over-explanation, just confirming my understanding.
- We have two steps in play for Vault integration: Vault auth vs Vault credential access.
- Vault auth, utilizing the k8s backend, uses the following components:
  - Vault Address: established via a topology operator environment variable.
  - SA Token: currently pulled from the topology-operator container.
  - Auth Role: currently defaults to `messaging-topology-operator`.
  - Auth Path: currently pulled from the RabbitMQ cluster CR.
  - Vault Namespace: currently pulled from the RabbitMQ cluster CR.
- Once auth'ed, the client will request and be granted access to credentials (path sourced from the CR) based on policies set up on the Auth Role.
- The Vault Namespace and Auth Path on the CR relate to the Auth Role & SA used for the RabbitMQ cluster rather than the Topology Operator.
MVP Suggestion:
It would seem as if we're close to a configuration where the Topology Operator would be able to cache the Vault token. The change needed would be to transition the Auth Path & Vault Namespace components from being CR-sourced to being specific to the Auth Role, i.e. env vars such as OPERATOR_AUTH_PATH and VAULT_NAMESPACE.
MVP Assumptions:
- The same Vault endpoint is used by all clusters.
- The same Vault namespace, or no namespace, is used by all clusters.
- The Topology Operator role will have access to all of the credential paths.
Future Considerations
Not sure if this would be appropriate, however...
Have the Topology Operator leverage all of the components of the RabbitMQ cluster to log in to Vault. This would require the Topology Operator to pull the token from the SA secret and use it along with the components in the CR. This would be more of an "assume role" type of pattern and would remove the need to manage the Topology Operator role. It would get a bit more involved, as we may then find the need to cache a Vault token per RabbitMQ cluster.
@twhite0, I really appreciate your clear explanation!
Your MVP suggestion and assumptions make a lot of sense, especially for the MVP. The change we have to make is minimal. As you summarized, we only need to source the auth path and Vault namespace from environment variables, along with the role's environment variable. It makes a lot of sense and I think it is more correct.
Using dedicated role + auth_path + namespace settings for the topology operator has advantages and disadvantages.
The advantage is that you have a separate Vault access control policy for the cluster operator and for the messaging topology operator. The disadvantage is the extra administrative burden in Vault.
Your future consideration is really what I had in mind from the beginning. It seemed very logical: a Vault administrator sets the appropriate role, auth path, and namespace for a cluster, and this configuration is used by both the cluster and topology operators.
The disadvantage is that the access control is all-or-nothing per cluster, i.e. either both operators can operate on a cluster or neither can. And, as you pointed out, the messaging topology operator would have to cache a Vault token per cluster. The topology operator would have to use a bounded cache with max-size and TTL capabilities to limit the number of tokens cached.
If both strategies made sense, we could configure the topology operator with an additional setting, vault_auth_strategy, with two values: assume_role, where the topology operator assumes all the auth settings from the cluster, or dedicated_role, where the topology operator uses its own dedicated auth settings.
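A minimal sketch of sourcing the login settings from operator environment variables with sensible defaults. The variable names match those settled on later in this PR (OPERATOR_VAULT_ROLE, OPERATOR_VAULT_NAMESPACE, OPERATOR_VAULT_AUTH_PATH), but the helper itself and its names are illustrative:

```go
package vaultsketch

import "os"

// vaultAuthSettings holds the values the operator uses when logging in to
// Vault, independent of any RabbitmqCluster spec.
type vaultAuthSettings struct {
	Role      string // Vault role bound to the operator's service account
	Namespace string // Vault (Enterprise) namespace; empty means the default
	AuthPath  string // mount path of the Kubernetes auth backend
}

// getEnvOrDefault returns a fallback when the variable is unset or empty.
func getEnvOrDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// operatorVaultAuthSettings reads the settings from the operator container's
// environment, falling back to the defaults discussed in this thread.
func operatorVaultAuthSettings() vaultAuthSettings {
	return vaultAuthSettings{
		Role:      getEnvOrDefault("OPERATOR_VAULT_ROLE", "messaging-topology-operator"),
		Namespace: os.Getenv("OPERATOR_VAULT_NAMESPACE"), // default Vault namespace if unset
		AuthPath:  getEnvOrDefault("OPERATOR_VAULT_AUTH_PATH", "auth/kubernetes"),
	}
}
```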
@MarcialRosales @twhite0 For your consideration, I've just pushed a small change to this PR branch so that the Vault login step looks for the Vault namespace and auth path values in environment variables (`OPERATOR_VAULT_NAMESPACE` and `OPERATOR_VAULT_AUTH_PATH`). A version of the operator built with these changes in place has behaved as expected when run alongside two separate RabbitMQ clusters. The Vault configuration changes required when the second RabbitMQ cluster was added amounted to a small number of additional steps.
I would be happy to provide more detailed configuration information, or maybe examples, as part of the operator documentation if this is the agreed direction.
Happy New Year all! @yaronp68 & @MarcialRosales: Would like to gently bump this Draft PR to identify next steps for acceptance.
Happy New Year all!
Thanks a lot @georgeharley for making this last code change. We can add the second strategy, assume_role, when necessary.
The only thing left, given that you have kindly volunteered, is to add an example.
I have roughly drafted the README.md of the new example below. A suggestion for the example folder name could be vault-support.
It is only a suggestion, please feel free to change the structure or even the scope of the test. The intention of the suggested test is to clearly demonstrate how to operate on a vault-enabled cluster in a secure manner (with granted access and without granted access), and also that the operator can operate indistinctly on vault-enabled and k8s-secret-enabled clusters.
Here goes the suggested README.md:
HashiCorp Vault Support Example
The RabbitMQ Cluster Operator supports storing RabbitMQ default user (admin) credentials and RabbitMQ server certificates in HashiCorp Vault as well as in K8s Secrets. Accordingly, the RabbitMQ Messaging Topology Operator can operate on clusters which are vault-enabled or k8s-secrets-enabled.
As explained in this KubeCon talk, there are four different approaches in Kubernetes to consuming external secrets:
- Direct API
- Controller that mirrors secrets in K8s
- Sidecar + MutatingWebhookConfiguration
- Secrets Store CSI Driver
The RabbitMQ Cluster Operator integrates with Vault using the 3rd approach (Sidecar + MutatingWebhookConfiguration) whereas the RabbitMQ Messaging Topology Operator uses the 1st approach (Direct API).
Vault-related configuration required
-> Minimum settings required (operator configuration and/or Vault configuration) in order for the messaging topology operator to be able to operate on any vault-enabled RabbitMQ cluster
-> Optional settings we can set in the messaging topology operator (OPERATOR_VAULT_NAMESPACE, OPERATOR_VAULT_AUTH_PATH, OPERATOR_VAULT_ROLE), what they are for, and their default values.
Usage
-> Here we explain how to run the test example and what it does
-> We need two scripts, similar to how we did it for the cluster operator: one for setting up the prerequisites of the test example and another for the actual test case
-> Briefly explain that first we need to run the setup.sh script and what prerequisites it fulfils
-> Then explain the scope of test.sh.
-> The proposed scope is the following:
   -> create a queue in a vault-enabled cluster where the operator has been granted access to the cluster in Vault
   -> fail to create a queue in a vault-enabled cluster where the operator has not yet been granted access to the cluster
   -> create a queue in a standard k8s-secret-enabled cluster. This demonstrates that the two security models can coexist
-> (The script has to deploy three clusters)
-> (The script terminates by tearing everything down, similar to the test.sh in examples/vault-default-user in the cluster operator)
* Vault client initialized at startup and passed to each controller
* Each controller passes Vault client in calls to ParseRabbitmqClusterReference() and receives a CredentialsProvider from which to pull username and password when accessing RabbitMQ client
* Basic fix up of broken tests
* More unit test code required
* Bumped cluster-operator dependency v1.8.3 -> v1.9.0
* WIP
…ator
* Intent is all controllers will be using the same Vault client when reading credentials during reconciliation phase
* WIP
* During each controller's reconcile phase the call to ParseRabbitmqClusterReference will be responsible for creating a Vault connection and reading credentials which then get returned to controller in a CredentialsProvider
* ParseRabbitmqClusterReference function will try and fetch credentials from Vault if cluster spec contains a Vault secret backend, otherwise will fallback to using K8s Secret in the cluster namespace
* More unit testing required
* WIP
* allow optional “vault.hashicorp.com/auth-path” to override auth path used when client logs into Vault
* allow optional “vault.hashicorp.com/namespace” to guide whether or not client should specify a target namespace in the remote enterprise Vault server
* speculative use of these annotations so this is all likely to change
* WIP
* Add some additional checks on Vault secret returned from authenticated read of Vault path
* To be undone when a suitable role injection mechanism is implemented
* A couple of unit tests have been made pending while the above is in place
* when K8s secrets not being used to supply cluster default user creds then cluster reader needs to get the cluster's K8s service name from somewhere else; based on observation that K8s service name appears to be the same as the cluster CR, going to just use the cluster CR name for now
* make existing Vault reader unit test less flakey
* WIP
* CredentialsProvider now has a single operation called Data() that returns a named credential attribute (e.g. username, password, etc) in a []byte
* Makes the interface more extensible
* WIP
* github.com/hashicorp/vault/api v1.1.1 -> v1.3.0
* github.com/onsi/gomega v1.16.0 -> v1.17.0
* SecretStoreClient created exactly once as part of first reconciliation
* Vault token owned by SecretStoreClient is continually renewed in a background thread that manages its lifecycle
* Lifecycle management thread is cognizant of Vault token's max TTL (defaults to 32 days) and will force a fresh Vault login when reached
* Vault logins are only made from the renewToken thread
* renewToken notifies main thread when first login attempt has been made by means of a buffered channel that is passed into it as an argument
* token lifecycle management only commences if the first Vault login attempt succeeds
* small delay added between Vault login attempts if there are problems renewing a token (e.g. if Vault server cannot be reached because of a temporary problem on the network)
* Ensure fakeCredentialsProvider is properly created before accessed in Ginkgo It block
* When authenticating to Vault the operator will use the role specified by the OPERATOR_VAULT_ROLE environment variable set on the operator container spec (a peer of the OPERATOR_NAMESPACE and VAULT_ADDR environment variables)
* If env var not set then default Vault role value of "messaging-topology-operator" will be used
* Vault login namespace now found from OPERATOR_VAULT_NAMESPACE env var in the operator container. If not set then use default Vault namespace
* Vault login auth path now found from OPERATOR_VAULT_AUTH_PATH env var in the operator container. If not set then default to using the auth path "auth/kubernetes"
* Above means that Vault login step no longer uses any information from the current RabbitMQ cluster spec
@twhite0 I am currently adding the OSS docs for the Vault support in the messaging topology operator at https://github.com/rabbitmq/rabbitmq-website/tree/vault-docs-topology-operator
* setup.sh installs Vault to current cluster and configures it for the work carried out in the test.sh script
* test.sh goes through a series of steps to demonstrate how the operator can use Vault for accessing admin credentials for a RabbitMQ cluster
* README.md describes the basic flow of the example as well as stating that the RabbitMQ Cluster operator and Messaging Topology operators are pre-requisites for this example
* Replace sleeps of 30 seconds after the Message Topology operator is re-deployed with a function that waits until the operator deployment log contains an indication that its initialisation is complete and it is ready to handle reconciliations
@MarcialRosales I have added a strawman example under docs/examples/vault-support as suggested. I think it covers the main objectives you were looking for, although only two RabbitMQ clusters are involved. I am very happy to follow any recommendations or advice for improving this, including changing the whole test scenario if you think it fails to put across the necessary points.
necessary for the operator container to have one or more environment variables
set.

- `VAULT_ADDR` should be set to the URL of the Vault server API
I believe this is the only mandatory setting that must be set, isn't it? The other settings are only required if the default value is not applicable.
Yes, that is correct. I will amend the wording in that README to make that clear. Thanks.
Run the [setup.sh](./setup.sh) script to install Vault to the current
Kubernetes cluster. This will also create the necessary credentials,
policy, and configuration in preparation for running the example.
Suggestion:
- It creates the credentials for the cluster `cluster-vault-a`
- It creates a policy that grants read access to the credentials of `cluster-vault-a`
- It creates the role `messaging-topology-operator`, which grants the service accounts of the RabbitMQ cluster `cluster-vault-a` and of the messaging-topology-operator permission to read the cluster credentials.
docs/examples/vault-support/setup.sh
Outdated
# bound_service_account_names values follow the pattern "<RabbitmqCluster name>-server".
# bound_service_account_namespaces values must include rabbitmq-system (where the messaging topology operator is deployed) and the namespaces where the RabbitmqClusters are deployed.
#vault_exec "vault write auth/kubernetes/role/messaging-topology-operator bound_service_account_names=cluster-vault-a-server,cluster-vault-b-server,vault-tls-server,messaging-topology-operator bound_service_account_namespaces=$RABBITMQ_NAMESPACE,rabbitmq-system policies=cluster-vault-a-policy,cluster-vault-b-policy ttl=24h"
vault_exec "vault write auth/kubernetes/role/messaging-topology-operator bound_service_account_names=cluster-vault-a-server,vault-tls-server,messaging-topology-operator bound_service_account_namespaces=$RABBITMQ_NAMESPACE,rabbitmq-system policies=cluster-vault-a-policy ttl=24h"
I would personally prefer to have different roles for the RabbitMQ clusters to access their credentials and a separate role for the messaging topology operator. This is especially relevant when we enable TLS in the cluster. In that particular case, the cluster needs write access to the Vault path, whereas the messaging topology operator will only ever need read access.
It is more work because we need an extra policy and role defined in Vault and some adjustments to the cluster definitions, but I think it is worth the effort.
Also, I think the vault-tls-server service account is not necessary for this example when we define the role.
    loopback_users = none
  secretBackend:
    vault:
      role: messaging-topology-operator
see comment further down in relation to separating role for the cluster operator and the messaging topology operator
# Create a policy that allows reading of the default user credentials for the RabbitMQ cluster cluster-vault-a
# The path must be referenced from the RabbitmqCluster cluster-vault-a CRD spec.secretBackend.vault.defaultUserPath
echo "Creating Vault policy named cluster-vault-a-policy for reading of cluster-vault-a credentials..."
vault_exec "vault policy write cluster-vault-a-policy - <<EOF
I wonder whether it would be useful to add a section that discusses authorization strategies.
In this example, as it stands, we are suggesting a strategy whereby each cluster gets its own policy and role. It offers a great deal of security but I am not sure it is scalable.
Another approach is to use a single policy and role for a group of clusters, using wildcards in the policy's paths. For instance, have a policy which grants access to the path secret/data/rabbitmq/finance-group/*. A cluster whose name is "orders" would retrieve its credentials from the path secret/data/rabbitmq/finance-group/creds. And all the clusters in that group must use the same role, e.g. finance-group-role. Any time we create a cluster, we have to include the cluster's service account in the role's `bound_service_account_names` attribute.
docs/examples/vault-support/test.sh
Outdated
# Update the Messaging Topology operator deployment, providing it with the Vault related environment variables
echo "Updating Messaging Topology operator to work with Vault..."
kubectl set env deployment/messaging-topology-operator -n rabbitmq-system --containers='manager' OPERATOR_VAULT_ROLE=messaging-topology-operator VAULT_ADDR=http://vault.default.svc.cluster.local:8200
Minor issue: VAULT_ADDR assumes we ran setup.sh from the default namespace.
- `VAULT_ADDR` should be set to the URL of the Vault server API
- `OPERATOR_VAULT_ROLE` should be set to the name of the Vault role that is used when accessing credentials. This defaults to "messaging-topology-operator"
- `OPERATOR_VAULT_NAMESPACE` may be set to the Vault namespace to use when the Messaging Topology operator is authenticating. If not set then the default Vault namespace is assumed
Minor suggestion: add a link to the Vault namespace documentation so that it is not confused with Kubernetes namespaces.
set.

- `VAULT_ADDR` should be set to the URL of the Vault server API
- `OPERATOR_VAULT_ROLE` should be set to the name of the Vault role that is used when accessing credentials. This defaults to "messaging-topology-operator"
Suggestion: The setup.sh script creates this role with the name messaging-topology-operator
MarcialRosales
left a comment
Thanks a lot @georgeharley for your contribution! Greatly appreciated.
I have only small suggestions to make on your latest changes.
Let me know if you are OK with them and we can proceed with the merge.
* Improve description of setup and test in README
* Have the RabbitMQ cluster use a separate Vault role and policy to that used by the Messaging Topology operator
* Cater for the setup and scripts being run in Kubernetes namespaces other than default
Thank you for the detailed feedback @MarcialRosales, which definitely improves things. I have just pushed a further commit that I hope addresses your suggestions 🤞
ChunyiLyu
left a comment
LGTM. Thank you for contributing!
@georgeharley We noticed that most of the integration tests are failing because the default-user's k8s Secret has no username and/or password, yet we are validating those fields here (https://github.com/georgeharley/messaging-topology-operator/blob/239-vault-cluster-authentication/internal/cluster_reference.go#L115). Before this PR we only validated these fields in the RabbitMqClient implementation. However, the controller test cases only use a fake implementation of the RabbitmqClient, hence they are not affected by the fact that the default-user's k8s Secret has no username/password in it. We prefer to keep the validation in one place, in the RabbitmqClient, rather than in two places. Of course, an alternative would have been to modify … Is that ok? It is just a matter of removing the username and password validation if-statements from here. We prefer to merge after the whole pipeline is happy.
* Instead of throwing errors if the cluster K8s secret does not contain username and/or password values just log the findings
@MarcialRosales that sounds fair enough. I have just pushed an update to log if the credentials are not found in the Secret rather than raising errors. I hope that approach is OK?
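A minimal sketch of that change in spirit (illustrative names, not the exact cluster_reference.go code): instead of returning an error when a field is absent from the default-user Secret, the reader just logs what it found and leaves full validation to the RabbitmqClient layer.

```go
package clusterref

import (
	"log"

	corev1 "k8s.io/api/core/v1"
)

// readDefaultUserCredentials is an illustrative helper: it pulls the
// username and password out of the default-user Secret, logging (rather
// than erroring) when either field is missing.
func readDefaultUserCredentials(secret *corev1.Secret) (username, password string) {
	u, ok := secret.Data["username"]
	if !ok {
		log.Printf("secret %s does not contain a username value", secret.Name)
	}
	p, ok := secret.Data["password"]
	if !ok {
		log.Printf("secret %s does not contain a password value", secret.Name)
	}
	return string(u), string(p)
}
```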
This relates to #239
Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed
Note to contributors: remember to re-generate client set if there are any API changes
Summary Of Changes
We've built a version of the topology operator from these draft changes and carried out some spike testing to verify that it is providing the desired behaviour. A quick inspection of the code will reveal that some tactical decisions were required in order to pull things together.
`secretBackend.vault.defaultUserPath` … `secretBackend.vault` (`vault.hashicorp.com/auth-path` and `vault.hashicorp.com/namespace` respectively)