@tych0
Contributor

@tych0 tych0 commented Nov 22, 2024

This adds an adjustment for seccomp policies. The intent is that people can wholesale replace policies, or parse them, make some changes, and then send them back. Sending them to NRI via containerd requires some containerd patches as well; those are here: https://github.com/tych0/containerd/commits/nri-seccomp/

Specifically, we are interested in making the listenerPath of the policy dynamic based on a k8s pod spec, so we can't use the Localhost custom policy (well, we can use most of it, except for listenerPath, which we have an NRI plugin to change based on this code).
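
To make the "parse them, make some changes, and then send them back" flow concrete, here is a minimal sketch in Go. It is not the code from this PR: `setListenerPath` is a hypothetical helper, and it assumes the policy travels as OCI runtime-spec JSON, where the seccomp object carries a `listenerPath` field. The socket path in `main` is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setListenerPath parses a seccomp policy in OCI runtime-spec JSON form,
// overrides its listenerPath, and serializes it again. Decoding into a map
// of raw messages keeps every field this sketch does not know about intact.
// (Hypothetical helper, not part of NRI or containerd.)
func setListenerPath(policy []byte, path string) ([]byte, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(policy, &doc); err != nil {
		return nil, fmt.Errorf("parse seccomp policy: %w", err)
	}
	if doc == nil {
		// A literal JSON null decodes to a nil map; start from an empty policy.
		doc = map[string]json.RawMessage{}
	}
	quoted, err := json.Marshal(path)
	if err != nil {
		return nil, err
	}
	doc["listenerPath"] = quoted
	return json.Marshal(doc)
}

func main() {
	// Illustrative policy and socket path; neither comes from this PR.
	in := []byte(`{"defaultAction":"SCMP_ACT_ERRNO","syscalls":[{"names":["mount"],"action":"SCMP_ACT_NOTIFY"}]}`)
	out, err := setListenerPath(in, "/run/example/pod-1234.sock")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A real plugin would derive the path from the pod it is adjusting rather than hard-coding it.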

This patch is a lot of boilerplate, which is unfortunate. There is a much smaller but similar patch,
tych0@a70547a, but it involves directly serializing a runtime-spec string.

Finally, note the comment in generate.go: the runtime-tools generate code does not have complete coverage for seccomp stuff, so I opted to not use any of it, vs. adding more stuff to runtime-tools. The fact that there are human and computer names is also confusing; it seems like we should stick to the computer names for this particular interface.

@tych0 tych0 force-pushed the seccomp-adjustment branch from ebd37f0 to ecf3a5b Compare November 22, 2024 10:43
Member

@dcantah dcantah left a comment

Nit comments on some things, will do another round later today for the actual implementation

@tych0 tych0 force-pushed the seccomp-adjustment branch from ecf3a5b to 5d2ada7 Compare November 25, 2024 15:38
@tych0
Contributor Author

tych0 commented Nov 25, 2024


  + '[' -z https://api.github.com/repos/containerd/nri/pulls/123/commits ']'
  ++ curl https://api.github.com/repos/containerd/nri/pulls/123/commits
  ++ jq -r '.[0].parents[0].sha + "..HEAD"'
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  100   278  100   278    0     0   2986      0 --:--:-- --:--:-- --:--:--  3021
  jq: error (at <stdin>:1): Cannot index object with number

seems like maybe curl needs a --fail there...

@tych0 tych0 force-pushed the seccomp-adjustment branch 2 times, most recently from fdd94b0 to 99d2437 Compare December 2, 2024 15:25
Member

@klihub klihub left a comment

LGTM.
I think this is the first addition of something primarily/solely security-related to NRI (well, apart from #124, but I think that's in a way the flip side of the same coin, as I guess you guys need both of them). Even before these, @mikebrow was of the opinion that some of the features in NRI should be possible to lock down administratively, so I'd like him to chime in, too. I'd expect that discussion to be revived.

@mikebrow
Member

mikebrow commented Dec 2, 2024

> LGTM. I think this is the first addition of something primarily/solely security-related to NRI (well, apart from #124, but I think that's in a way the flip-side of the same coin as I guess you guys need both of them). @mikebrow was already before these of the opinion that some of the features in NRI should be possible to lock down administratively, so I'd like him to chime in, too. I'd expect that discussion to be revived.

Nod. Fields currently managed by k8s over the CRI API, esp. those configurable in kubelet and/or on the pod spec itself, need some deep discussion around the CRI contract. Had a good talk with Sam on these at KubeCon. In a nutshell, I think we should consider admin config switches for controlling our "default" container runtime security profiles. And I believe we need larger discussions with the other side of the CRI (the clients) when we are administratively adjusting the Kubernetes security profile (the client's security profile). In the case where we are increasing the security profile I could see the client being ok with that, but if there is a case where we are reducing the security profile, increasing the attack surface, then I think we'd need some way to configure that as a limited / possibly client managed option.

@tych0
Contributor Author

tych0 commented Dec 2, 2024

> In the case where we are increasing the security profile I could see the client being ok with that, but if there is a case where we are reducing the security profile, increasing the attack surface, then I think we'd need some way to configure that as a limited / possibly client managed option.

I appreciate the paranoia. Can you elaborate on the threat model here?

IIUC, NRI plugins speak to the containerd socket, which effectively makes people root (they can ask for a privileged container, mknod the blockdev the host rootfs is on, etc. etc.). The ownership for the containerd socket today is root:root (though it may be relaxed in e.g. containerd/containerd#10454), so you have to be root to do anything here.

If that's the case, the NRI plugin already has root on the box, what is protected against by introducing access control at this level?

@mikebrow
Member

mikebrow commented Dec 2, 2024

The plugin is an extension of the containerd/crio daemon. Let's leave out the rootless conversation, as I'm not talking about a concern for what the container runtime or plugin has access to. Same with the socket, which we can and may at some point secure to known/verified client/server components having predefined certs.

So the threat I'm concerned with is relaxing the requested security context/profiles of a pod/container. I see you have not hit adjustments for the podsandbox, yet.

The pods/containers, of course, are not an extension of the container runtime... they are merely invoked and managed by the runtime via the shim/conmon and possibly runc/crun engines down one more level. When the client, kubelet, requests a container that is not privileged it is expecting that to be the case. Same with the other Kubernetes security context/profile options. These pods/containers may be "unknown" to the host/root admin, installed by users and updated by vendors of the containers. Sure, in some on-metal enterprise/standard user environments they may want to take risks and allow all edits by their plugins. Still, in other cloud-like environments running multi-tenant or just single-tenant clusters, the containers may not be trusted fully.

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

The pod sandbox security context via CRI with these security profiles:
https://github.com/containerd/containerd/blob/main/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go#L1040-L1081
The container security context:
https://github.com/containerd/containerd/blob/main/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api.pb.go#L4020-L4098

In the case of this seccomp adjustment there are certain profiles that we should be able to adjust out of hand, like "runtime/default". Here the plugin, being an extension of the container runtime, may IMO set the default for all containers or even on a per-container basis. However, if the profile is "unconfined", "", or "localhost/", there is the expectation by the kubelet that we will abide.

@mikebrow
Member

mikebrow commented Dec 2, 2024

So in summary, I would like to talk to the team on this and possibly allow adjustments to the seccomp profile if we are configured for allowing SecurityContextAdjustments = {"restricted", "allowed", "limited" // to adjust only runtime/defaults}

@tych0
Contributor Author

tych0 commented Dec 2, 2024

> Sure in some on metal enterprise/standard user environments they may want to take risks and allow all edits by their plugins.

But I am confused here: plugins require root on the host already, so plugins can necessarily mess with configuration (e.g. just overwriting a localhost/default.json profile with a new one they like better).

> Still in other cloud like environments running multi-tenant or just single tenant clusters the containers may not be trusted fully.

I agree that we do not want to trust the containers, and any such modification done by an NRI plugin should be done with care. But this is about trusting the NRI plugins themselves, who are already root, IIUC.

> When the client, kubelet, requests a container that is not privileged it is expecting that to be the case. Same with the other kubernetes security context/profile options. These pods/containers may be "unknown" to the host/root admin, installed by users updated by vendors of the containers.

Agreed: the host admin needs to be very careful what NRI plugins it allows to be installed in light of the fact that they're running untrusted code on their systems, and these NRI plugins effectively have root access and can grant it to other applications. But NRI plugins should surely have the capability to change these things if they want.

@mikebrow
Member

mikebrow commented Dec 2, 2024

fyi.. https://github.com/containerd/containerd/tree/main/contrib/seccomp not sure if you saw this code.. note the per platform each having their own defaults.

@mikebrow
Member

mikebrow commented Dec 2, 2024

> But I am confused here: plugins require root on the host already, so plugins can necessarily mess with configuration (e.g. just overwriting a localhost/default.json profile with a new one they like better).

Yes some aspects of the security context of the pods/containers are modifiable via web hook controllers, install scripts, .... And gradually over time the editable files will be encrypted, such as secrets, maybe also security profiles, ...

Just because you "could" hack the system files with root access doesn't mean we should put a service api together that makes it even easier, and seemingly approved.

> I agree that we do not want to trust the containers, and any such modification done by an NRI plugin should be done with care. But this is about trusting the NRI plugins themselves, who are already root, IIUC.

Negative on "this is about trusting the NRI plugins". I'm not talking about trusting the plugins or not; I presume only trusted plugins will be installed. I'm talking about whether the plugins/container runtime should be "modifying" the confined security context requested by the pod spec, and, if it is modified, whether it should be modified first by a pod spec mutating controller or kubelet itself. Things will get confusing if plugins are modifying a predefined, expected/required, security context, particularly if the modification is allowing additional syscalls.

> Agreed: the host admin needs to be very careful what NRI plugins it allows to be installed in light of the fact that they're running untrusted code on their systems, and these NRI plugins effectively have root access and can grant it to other applications. But NRI plugins should surely have the capability to change these things if they want.

Admins should only install trusted plugins. I agree 100% that trusted plugins should be able to change these security context profiles, so long as the changes are allowed as a part of the security design for pods/containers.

FYI the e2e testing code and kubelet client should be able to verify adherence to requested profiles for the created pods/containers.

@klihub
Member

klihub commented Dec 20, 2024

@tych0 @mikebrow We had a discussion related to this with @samuelkarp and @kad, and there are a few ideas for how to move this forward. I'll try to summarize my understanding; others can chime in as needed, especially in case I misinterpreted or misrepresented some of the ideas.

  1. There is a general consensus that we need to add to NRI the ability to administratively lock down some of the container parameters a plugin can mutate.
  2. There seems to be a consensus that, ideally this would not be a single global configuration that applies to all plugins. Instead we should allow different plugins, or rather plugins with different levels of trust, to have a different configuration in this regard, trusted ones allowed to mutate more than less trusted ones.
  3. This is not only about future extensions or pending PRs to NRI. Some of the existing capabilities, hook injection in particular, should be possible to lock down.
  4. It is not immediately obvious what would be the best mechanism to authenticate a plugin to establish its level of trust, although there are some initial ideas.

Based on this, my suggestion would be to

  1. First, implement global control for locking down parts of the container mutation capabilities.
  2. Of the current feature set at least, hook injection would become lockable, but it would continue to default to be unlocked for compatibility.
  3. Once we add per plugin (or trust level) configurability, the globals will apply to plugins which do not authenticate (or otherwise establish their identity, in whichever way we come up with for that) to establish a non-default trust level.
  4. We make the new capabilities introduced by this PR and the corresponding namespace mutating one (api: add namespace adjustment #124) lockable by configuration.

@mikebrow Does this sound like an acceptable compromise to you ?
@tych0 Would this work for you ?
@samuelkarp @kad Is this an adequate summary of what we have discussed ?
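
As a rough illustration of the suggestion above, the sketch below models a global lockdown with optional per-plugin overrides that only apply once a plugin has established its identity. Every name here is hypothetical (`Restrictions`, `Config`, `For`, `RejectNamespaces`, the plugin name); only `RejectOCIHooks` and `RejectSeccompPolicy` echo names that appear later in this PR's review, and how authentication would work is deliberately left open, as it is in the discussion.

```go
package main

import "fmt"

// Restrictions lists mutations that plugins are NOT allowed to make.
// The zero value locks nothing down, which keeps today's behavior
// (e.g. hook injection stays unlocked by default).
type Restrictions struct {
	RejectOCIHooks      bool
	RejectSeccompPolicy bool
	RejectNamespaces    bool
}

// Config holds the global restrictions plus optional overrides for
// plugins that have established a non-default trust level.
type Config struct {
	Global    Restrictions
	PerPlugin map[string]Restrictions
}

// For returns the restrictions that apply to a plugin. Overrides are only
// honored for authenticated plugins; everyone else gets the globals.
func (c Config) For(plugin string, authenticated bool) Restrictions {
	if authenticated {
		if r, ok := c.PerPlugin[plugin]; ok {
			return r
		}
	}
	return c.Global
}

func main() {
	cfg := Config{
		Global: Restrictions{RejectSeccompPolicy: true, RejectNamespaces: true},
		PerPlugin: map[string]Restrictions{
			// A more trusted plugin may adjust seccomp but not namespaces.
			"trusted-seccomp-plugin": {RejectNamespaces: true},
		},
	}
	fmt.Println(cfg.For("trusted-seccomp-plugin", true).RejectSeccompPolicy)
	fmt.Println(cfg.For("trusted-seccomp-plugin", false).RejectSeccompPolicy)
}
```

Falling back to the globals for unauthenticated plugins mirrors item 3 of the suggestion: the global settings become the default trust level.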

@tych0
Contributor Author

tych0 commented Dec 20, 2024

That works for me, though I don't currently have the bandwidth myself to implement any such security model.

@klihub
Member

klihub commented Dec 25, 2024

> That works for me, though I don't currently have the bandwidth myself to implement any such security model.

No worries, we can try to cook up something for that, if all involved parties agree that it is the way forward.

@mikebrow
Member

mikebrow commented Jan 2, 2025

> @tych0 @mikebrow We had a discussion related to this with @samuelkarp and @kad and there are a few ideas how to move this forward. I try to summarize my understanding, others can chime in as needed, especially in case I misinterpreted or misrepresent some of the ideas.
>
>   1. There is a general consensus that we need to add to NRI the ability to administratively lock down some of the container parameters a plugin can mutate.
>   2. There seems to be a consensus that, ideally this would not be a single global configuration that applies to all plugins. Instead we should allow different plugins, or rather plugins with different levels of trust, to have a different configuration in this regard, trusted ones allowed to mutate more than less trusted ones.
>   3. This is not only about future extensions or pending PRs to NRI. Some of the existing capabilities, hook injection in particular, should be possible to lock down.
>   4. It is not immediately obvious what would be the best mechanism to authenticate a plugin to establish its level of trust, although there are some initial ideas.
>
> Based on this, my suggestion would be to
>
>   1. First, implement global control for locking down parts of the container mutation capabilities.
>   2. Of the current feature set at least, hook injection would become lockable, but it would continue to default to be unlocked for compatibility.
>   3. Once we add per plugin (or trust level) configurability, the globals will apply to plugins which do not authenticate (or otherwise establish their identity whichever way we'll come up for that) to establish a non-default trust level.
>   4. We make the new capabilities introduced by this PR and the corresponding namespace mutating one (api: add namespace adjustment #124) lockable by configuration.
>
> @mikebrow Does this sound like an acceptable compromise to you ? @tych0 Would this work for you ? @samuelkarp @kad Is this an adequate summary of what we have discussed ?

Yes, with the understanding that configuration will import the concept of client override. E.g. a rule set: securityProfileEditRule = {defaultContainerRuntimeSecurityProfileEditsAllowed, noSecurityProfileEdits, allSecurityProfileConfinementsAllowed, allSecurityProfileEditsAllowed }

@samuelkarp
Member

@klihub Thanks, your summary is pretty accurate from our discussion.

Some comments from the earlier posts in this thread (and thanks for bearing with me in my delayed response here):

> Can you elaborate on the threat model here?

@tych0 @mikebrow I think there's some confusion as to what our threat model is, and how we should think about the different participants. I want to define a couple terms since I think these are unclear:

  • Kubernetes user - the author of a pod spec, responsible for causing a pod to run via the Kubernetes API (or kubectl, etc)
  • Kubernetes cluster administrator - the party responsible for configuring the cluster (including policy), nodes, container runtime, etc
  • container runtime - the privileged process responsible for actual container configuration/creation/lifecycle management at a low level on the node
  • NRI plugin - a privileged process that can mutate container configuration, invoked by the container runtime

Our existing documentation about security considerations for NRI plugins is here. Note that we describe:

  1. an NRI plugin is effectively "part of" the container runtime
  2. because of (1), NRI plugins are highly privileged, and the same considerations for protecting the container runtime's own configuration apply to NRI plugins

I take this to imply that NRI plugins should only be configured by a Kubernetes cluster administrator at this point and not by a Kubernetes user. From that perspective, the cluster administrator should be in charge of policy and responsible for whether a plugin is allowed to make a particular modification.

> let's leave out the rootless conversation

@mikebrow I don't think rootless was mentioned; this seems like possibly a source of confusion for you?

> When the client, kubelet, requests a container that is not privileged it is expecting that to be the case. Same with the other kubernetes security context/profile options.

@mikebrow To reframe this using the terms I defined, it sounds like you're asking about the expectation a Kubernetes user (pod spec author) has for the permissions a given pod is granted. I would assert that the cluster administrator should be able to decide whether that requested permission boundary is appropriate or should be modified.

> However if "unconfined" "" or "localhost/" there is the expectation by the kubelet that we will abide.
> [...]
> Things will get confusing if plugins are modifying a predefined, expected/required, security context, particularly if the modification is allowing additional sys calls.

I'm not entirely sure I understand exactly what your concern is. There are a bunch of adjustments that NRI plugins can already make from the requested pod spec to the realized OCI bundle config (plus the OCI bundle isn't 1:1 with the pod spec anyway, as the pod spec does not model all possible configuration). Today, if the pod spec requests mounts, NRI plugins can remove mounts, modify mounts, or add mounts. I don't really know that this is different from security context. Either way, the cluster administrator must understand what actions the NRI plugin is taking and how that affects the requested pod spec.

> Yes, with the understanding that configuration will import the concept of client override. Eg. a rule set: securityProfileEditRule = {defaultContainerRuntimeSecurityProfileEditsAllowed, noSecurityProfileEdits, allSecurityProfileConfinementsAllowed, allSecurityProfileEditsAllowed }

@mikebrow Are you suggesting that the Kubernetes user (pod spec author) can override, or that the cluster administrator should override?

@vinayakankugoyal

Cluster administrators are often responsible for ensuring the security posture of workloads. Using NRI plugins to do that makes a lot of sense. IMHO NRI plugins are as privileged as the container runtime because, as @samuelkarp said, they can modify the container beyond what the Kubernetes user originally set. This is actually why they are so useful from a security and admin perspective: you can enforce the same security baselines no matter what a user sets.

@mikebrow
Member

mikebrow commented Jan 6, 2025

> Can you elaborate on the threat model here?
>
> @tych0 @mikebrow I think there's some confusion as to what our threat model is, and how we should think about the different participants. I want to define a couple terms since I think these are unclear:
>
>   • Kubernetes user - the author of a pod spec, responsible for causing a pod to run via the Kubernetes API (or kubectl, etc)
>   • Kubernetes cluster administrator - the party responsible for configuring the cluster (including policy), nodes, container runtime, etc

Mike: nit... while kubernetes could be argued as the primary CRI client.. there are a raft of other CRI clients.

>   • container runtime - the privileged process responsible for actual container configuration/creation/lifecycle management at a low level on the node
>   • NRI plugin - a privileged process that can mutate container configuration, invoked by the container runtime

Mike: ...configuration. These plugins are invoked by and run as an extension of the container runtime where they are also responsible for adhering to the CRI pod/container security context and other contract obligations.

> I take this to imply that NRI plugins should only be configured by a Kubernetes cluster administrator at this point and not by a Kubernetes user. From that perspective, the cluster administrator should be in charge of policy and responsible for whether a plugin is allowed to make a particular modification.

Mike: sort of .. If by administrator you mean mutating controller / web hook edits to the pod spec employed by administrators/cloud providers, other kubelet level controllers, and "user" provided pod spec configuration (note here that a pod spec may already have a security profile created by a user).

> let's leave out the rootless conversation
>
> @mikebrow I don't think rootless was mentioned;

Correct, it wasn't mentioned.

> this seems like possibly a source of confusion for you?

Mike: I merely point out that we are not always running as root. Of course @AkihiroSuda is our expert here; I'm not sure if he has yet considered the NRI plugin setup in the case where all the kubelet node components are running rootless. No need to boil the ocean though.

> When the client, kubelet, requests a container that is not privileged it is expecting that to be the case. Same with the other kubernetes security context/profile options.
>
> @mikebrow To reframe this using the terms I defined, it sounds like you're asking about the expectation a Kubernetes user (pod spec author) has for the permissions a given pod is granted. I would assert that the cluster administrator should be able to decide whether that requested permission boundary is appropriate or should be modified.

Mike: I'm saying the security context is a CRI contract. This context can be set by the pod user, administrator through controllers and kubelet/kind/rancher config, or by cloud providers. It really depends on the deployment and configuration of Kubernetes or the kubernetes like client tooling using the CRI.

Mike: Since the beginning of CRI the expectation/contract has been that the CRI implementor will adhere to the security context. When portions of said context defer to the container runtime, we have the opportunity to manage our container runtime defaults, which is why containerd and crio have different default profiles, with crio being generally more restrictive and the containerd default generally following the traditional Docker defaults.

> However if "unconfined" "" or "localhost/" there is the expectation by the kubelet that we will abide.
> [...]
> Things will get confusing if plugins are modifying a predefined, expected/required, security context, particularly if the modification is allowing additional sys calls.
>
> I'm not entirely sure I understand exactly what your concern is. There are a bunch of adjustments that NRI plugins can already make from the requested pod spec to the realized OCI bundle config (plus the OCI bundle isn't 1:1 with the pod spec anyway, as the pod spec does not model all possible configuration). Today, if the pod spec requests mounts, NRI plugins can remove mounts, modify mounts, or add mounts. I don't really know that this is different from security context. Either way, the cluster administrator must understand what actions the NRI plugin is taking and how that affects the requested pod spec.

Mike: we can't just dump it all on the admin to make a decision to override pod security or not support a device... We have no way to expose edit pattern information back through to the admin at this point or to explain away the complexities wrt what pod security context fields we will be overriding in a plugin.

Mike: On the mounts point/question, mounts are implemented in view of, affected by, the pod security context object settings provided via the CRI, high level description: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
and lower level data structure fields here:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#podsecuritycontext-v1-core

> Yes, with the understanding that configuration will import the concept of client override. Eg. a rule set: securityProfileEditRule = {defaultContainerRuntimeSecurityProfileEditsAllowed, noSecurityProfileEdits, allSecurityProfileConfinementsAllowed, allSecurityProfileEditsAllowed }
>
> @mikebrow Are you suggesting that the Kubernetes user (pod spec author) can override, or that the cluster administrator should override?

Mike: The pod spec author can set/override the pod security context objects/pod specs, and yes, the cluster administrator/cloud provider admins can also override pod security context via mutating controller/web hooks. Yes, while the default security context in Kubernetes is to let the container runtime decide due to historical reasons, it is also highly recommended that explicit security contexts be specified for the pods. What I'm saying here is that there exists a "default" container runtime pod security context that is used explicitly when the CRI client does not provide one, and that is when we should be allowing NRI plugins to do their thing wrt mutating our container runtime default security context: "defaultContainerRuntimeSecurityProfileEditsAllowed". If we allow NRI to mutate an explicitly provided security context we are certainly going to cause problems, and we should try to restrict those problems to the case where we tried to "tighten" the security profile and the application thus failed with an insufficient permission error: "allSecurityProfileConfinementsAllowed".
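
One possible reading of the four-value rule set above, sketched in Go. This is an interpretation, not an agreed design: the constant names are taken from the comment, but `editPermitted` and `isConfinement` are hypothetical, and a profile is simplified to a flat list of allowed syscall names so that "tightening" can be checked as a subset test. Real seccomp policies have default actions, per-syscall actions and argument filters, which would need a more careful comparison.

```go
package main

import "fmt"

// EditRule mirrors the rule set named in the comment above.
type EditRule int

const (
	NoSecurityProfileEdits EditRule = iota
	DefaultContainerRuntimeSecurityProfileEditsAllowed
	AllSecurityProfileConfinementsAllowed
	AllSecurityProfileEditsAllowed
)

// isConfinement reports whether every syscall allowed after the edit was
// already allowed before it, i.e. the edit can only tighten the profile.
func isConfinement(before, after []string) bool {
	allowed := make(map[string]struct{}, len(before))
	for _, s := range before {
		allowed[s] = struct{}{}
	}
	for _, s := range after {
		if _, ok := allowed[s]; !ok {
			return false
		}
	}
	return true
}

// editPermitted decides whether a plugin's edit is acceptable under rule.
// explicitProfile is true when the CRI client supplied its own profile
// instead of deferring to the container runtime default.
func editPermitted(rule EditRule, explicitProfile bool, before, after []string) bool {
	switch rule {
	case AllSecurityProfileEditsAllowed:
		return true
	case AllSecurityProfileConfinementsAllowed:
		return isConfinement(before, after)
	case DefaultContainerRuntimeSecurityProfileEditsAllowed:
		return !explicitProfile
	default: // NoSecurityProfileEdits
		return false
	}
}

func main() {
	base := []string{"read", "write", "mount"}
	// Dropping "mount" only tightens the profile, so it passes.
	fmt.Println(editPermitted(AllSecurityProfileConfinementsAllowed, true, base, []string{"read", "write"}))
	// Adding "ptrace" widens it, so it is rejected under the same rule.
	fmt.Println(editPermitted(AllSecurityProfileConfinementsAllowed, true, base, []string{"read", "ptrace"}))
}
```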

@klihub
Member

klihub commented Feb 7, 2025

Ditto here as in #124 (comment). PTAL.

@alban

alban commented May 21, 2025

What happens if the NRI plugin modifies the seccomp profile in a way which is not supported by runc?

Should the NRI plugin call "runc features" or check the ociVersion field in config.json?

xref inspektor-gadget/inspektor-gadget#4458 (comment)
cc @rata @mauriciovasquezbernal

@rata

rata commented May 21, 2025

I think the output of "runc features" should be provided to the plugin, as that is the only reliable way to know how to act. It's not only relevant for seccomp, but other features (like idmap mounts, etc.).
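
To illustrate what consuming that output could look like on the plugin side, here is a hedged sketch that checks whether a given seccomp action is listed as supported. The field names (`linux.seccomp.enabled`, `linux.seccomp.actions`, `ociVersionMin`, `ociVersionMax`) follow my reading of the OCI runtime-spec features document and should be verified against real `runc features` output; the JSON below is a hand-written sample, not captured output, and `supportsSeccompAction` is a hypothetical helper. How the features document would actually reach a plugin is exactly the open question in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// features models only the part of the features document this sketch reads.
type features struct {
	OCIVersionMin string `json:"ociVersionMin"`
	OCIVersionMax string `json:"ociVersionMax"`
	Linux         struct {
		Seccomp struct {
			Enabled *bool    `json:"enabled"`
			Actions []string `json:"actions"`
		} `json:"seccomp"`
	} `json:"linux"`
}

// supportsSeccompAction reports whether the runtime lists the given seccomp
// action as supported. An explicit "enabled": false means no action is usable.
func supportsSeccompAction(featuresJSON []byte, action string) (bool, error) {
	var f features
	if err := json.Unmarshal(featuresJSON, &f); err != nil {
		return false, fmt.Errorf("parse features: %w", err)
	}
	if f.Linux.Seccomp.Enabled != nil && !*f.Linux.Seccomp.Enabled {
		return false, nil
	}
	for _, a := range f.Linux.Seccomp.Actions {
		if a == action {
			return true, nil
		}
	}
	return false, nil
}

// sampleFeatures is a hand-written illustration, not real runtime output.
const sampleFeatures = `{
  "ociVersionMin": "1.0.0",
  "ociVersionMax": "1.1.0",
  "linux": {
    "seccomp": {
      "enabled": true,
      "actions": ["SCMP_ACT_ALLOW", "SCMP_ACT_ERRNO", "SCMP_ACT_NOTIFY"]
    }
  }
}`

func main() {
	ok, err := supportsSeccompAction([]byte(sampleFeatures), "SCMP_ACT_NOTIFY")
	if err != nil {
		panic(err)
	}
	fmt.Println("SCMP_ACT_NOTIFY supported:", ok)
}
```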

@klihub
Member

klihub commented Jun 12, 2025

@tych0 We should rebase this on latest main/HEAD and add configurable lockdown of seccomp adjustment via the default validator. I have taken a look at what that would require and the result is available on this branch. If you'd like me to (and are okay with the content of that branch), I can update this PR by pushing that to your source branch.

@tych0
Contributor Author

tych0 commented Jun 12, 2025

Yeah, your rebase + validator looks great, thanks for working on it!

@klihub klihub force-pushed the seccomp-adjustment branch 2 times, most recently from 2382bf8 to f494878 Compare June 12, 2025 14:23
@klihub klihub requested a review from djdongjin June 12, 2025 14:27
Member

@djdongjin djdongjin left a comment

LGTM

Member

@mikebrow mikebrow left a comment

see comment

Enable bool `yaml:"enable" toml:"enable"`
// RejectOCIHooks fails validation if any plugin injects OCI hooks.
RejectOCIHooks bool `yaml:"rejectOCIHooks" toml:"reject_oci_hooks"`
// RejectSeccompPolicy fails validation if any plugin modifies seccomp policy.
Member

@mikebrow mikebrow Jun 13, 2025

The reject we are looking for here for seccomp is going to be more subtle than on or off.

See: https://github.com/containerd/containerd/blob/main/internal/cri/sputil/seccomp_linux.go

Basically the only way a container runtime can set a seccomp policy is if the container is not privileged, and then, depending on the container runtime's configuration, there may also be a switch enabling/disabling whether seccomp policies are applied.

Further, with respect to client use of the CRI, the client (the kubelet for example) may either request that the container runtime use its own default seccomp policy, or it may specify an override via a file path to a seccomp profile, or it may demand seccomp be unconfined; see runtime.SecurityProfile_Unconfined.

So... if the kubelet demands seccomp be unconfined, or demands the container be privileged, or demands a custom profile be used, then we need the "default" here to be to reject modification of the container's seccomp policy.

The default allow case should be solely for when the kubelet has requested a runtime.SecurityProfile_RuntimeDefault. In that case the plugin should be able to extend the container runtime default, unless the configuration switch RejectContainerRuntimeDefaultSeccompProfileEdits (default false) is on.
To cover the editing of kubelet-requested custom seccomp profiles, perhaps AllowEditsToCustomSeccompProfiles (default false). But I think editing unconfined/privileged is off limits, or at least would require a boolean override to confine a container explicitly defined as unconfined.

... and we need to discuss the container runtime seccomp enabled/disabled flag.. can the plugin override that disable flag?

cc @klihub @samuelkarp @chrishenzie

Member

@klihub klihub Jun 14, 2025

@mikebrow @chrishenzie @samuelkarp We have more fine-grained validation-based restrictions coming down the pipeline, once we sort out and agree on all the necessary related details for authorization, so I was hoping that we could pull this in but start simple: add for now a global enable/disable control for this (and the accompanying Linux namespace adjustment control), and then take a look at making it more fine-grained at the same time as this becomes possible for all other restrictable controls.

Member

@klihub klihub Jun 24, 2025

> The reject we are looking for here for seccomp is going to be more subtle than on or off.
>
> See: https://github.com/containerd/containerd/blob/main/internal/cri/sputil/seccomp_linux.go
>
> Basically the only way a container runtime can set a seccomp policy is if the container is not privileged and then also based on a container runtime's configuration there may be a switch enabling / disabling whether seccomp policies are applied.
>
> Further, with respect to client use of the CRI, the client (the kubelet for example) may either request the container runtime use its own default container runtime seccomp policy or it may specify an override via a file path to a seccomp profile, or it may demand seccomp be unconfined.. see runtime.SecurityProfile_Unconfined.
>
> So... if the kubelet demands seccomp be unconfined.. or if it demands the container be privileged or if the kubelet demands a custom profile be used.. then we need the "default" here to be reject modification of the container's seccomp policy.
>
> The default allow case should be solely for when the kubelet has requested a runtime.SecurityProfile_RuntimeDefault. In that case the plugin should be able to extend the container runtime default, unless the configuration switch is on to RejectContainerRuntimeDefaultSeccompProfileEdits.. default false. To cover the editing of kubelet requested custom seccomp profiles perhaps AllowEditsToCustomSeccompProfiles.. default false. But I think editing unconfined/privileged is off limits, or at least would require boolean override to confine a container explicitly defined as unconfined.

@mikebrow Okay, so let me sum this up by paraphrasing to see if I get your view on this correctly:

  1. By default, a plugin should only be able to adjust seccomp policies if the CRI-requested seccomp profile (type) was RuntimeDefault.
  2. We might want to add a control config flag to disallow this.
  3. To allow adjusting any (other confined) profile, we should add a corresponding control config flag (with a default to false).

... and we need to discuss the container runtime seccomp enabled/disabled flag.. can the plugin override that disable flag?

I think a plugin should not be able to do that... not at that granularity.

Member

@mikebrow Okay, so let me sum this up by paraphrasing to see if I get your view on this correctly:

  1. By default, a plugin should only be able to adjust seccomp policies if the CRI-requested seccomp profile (type) was RuntimeDefault.

Yes. (Legacy note: we also had docker default, deprecated, at one point. I believe that is no longer supported, though this needs verification for both runtimes; I think that variation is only left in tests now.)

  2. We might want to add a control config flag to disallow this.

nod

  3. To allow adjusting any (other confined) profile, we should add a corresponding control config flag (with a default to false).

Agree

... and we need to discuss the container runtime seccomp enabled/disabled flag.. can the plugin override that disable flag?

nod, needs discussion

I think a plugin should not be able to do that... not at that granularity.

Agree! Overriding config.toml settings, whether dynamically or statically, is a different type of granularity, with repercussions on running objects.

Member

@tych0 Would that be good enough for you guys as a starter?

@klihub klihub force-pushed the seccomp-adjustment branch from f494878 to 8f38023 Compare July 14, 2025 13:22
@klihub klihub requested a review from mikebrow July 14, 2025 13:22
@klihub
Member

klihub commented Jul 14, 2025

@mikebrow I have added fine-grained validation for seccomp policy/profile adjustment as you requested.

Member

@mikebrow mikebrow left a comment

LGTM, just a couple of nits.

One thing that probably needs a better explanation is LOCALHOST vs. Custom: wherever Custom or LOCALHOST is introduced, we need to explain that these currently refer to a file located on the local host, which the container runtime loads and which specifies the seccomp profile to be applied to the container.

@mikebrow mikebrow added the v.next to be merged into the next release label Jul 14, 2025
klihub and others added 5 commits July 14, 2025 18:45
Pass information about CRI-requested container seccomp security
profile information as input to plugins.

Co-authored-by: Mike Brown <[email protected]>
Signed-off-by: Krisztian Litkey <[email protected]>
This adds an adjustment for seccomp policies. The intent is that people can
wholesale replace policies, or parse them, make some changes, and then send
them back. Sending them *to* NRI via containerd requires some containerd
patches as well, those are here: https://github.com/tych0/containerd/commits/nri-seccomp/

Specifically, we are interested in making the listenerPath of the policy
dynamic based on a k8s pod spec, so we can't use the Localhost custom
policy (well, we can use most of it, except for listenerPath, which we have
an NRI plugin to change based on this code).

This patch is a lot of boilerplate, which is unfortunate. There is a much
smaller but similar patch:
a70547a
but it involves directly serializing a runtime-spec string

Finally, note the comment in generate.go: the runtime-tools generate code
does not have complete coverage for seccomp stuff, so I opted to not use
any of it, vs. adding more stuff to runtime-tools. The fact that there are
human and computer names is also confusing, it seems like we should stick
to the computer names for this particular interface.

Signed-off-by: Tycho Andersen <[email protected]>
Implement configurable restrictions for linux seccomp policy
adjustment in the default validator.

Co-authored-by: Mike Brown <[email protected]>
Signed-off-by: Krisztian Litkey <[email protected]>
Add tests for seccomp policy adjustments and restricting them in
the default validator.

Signed-off-by: Krisztian Litkey <[email protected]>
Co-authored-by: Mike Brown <[email protected]>
Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the seccomp-adjustment branch from 8f38023 to 8f2af44 Compare July 14, 2025 15:52
Member

@mikebrow mikebrow left a comment

LGTM

@mikebrow mikebrow merged commit 7de1160 into containerd:main Jul 14, 2025
7 checks passed
@tych0
Contributor Author

tych0 commented Jul 14, 2025

late to the party, but this seems to work for us as well, thanks!

