Skip to content

Conversation

@z63d
Copy link
Contributor

@z63d z63d commented Apr 18, 2025

Description

I think it is warn in runc.
https://github.com/opencontainers/runc/blob/159c67f8e2233b6cb84a203dc47ed182381b666c/libcontainer/capabilities/capabilities.go#L153

config.json
{
  "ociVersion": "1.0.2-dev",
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": ["nosuid", "strictatime", "mode=755", "size=65536k"]
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [
        "nosuid",
        "noexec",
        "newinstance",
        "ptmxmode=0666",
        "mode=0620",
        "gid=5"
      ]
    },
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "shm",
      "options": ["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]
    },
    {
      "destination": "/dev/mqueue",
      "type": "mqueue",
      "source": "mqueue",
      "options": ["nosuid", "noexec", "nodev"]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": ["nosuid", "noexec", "nodev", "ro"]
    },
    {
      "destination": "/sys/fs/cgroup",
      "type": "cgroup",
      "source": "cgroup",
      "options": ["nosuid", "noexec", "nodev", "relatime", "ro"]
    }
  ],
  "process": {
    "terminal": false,
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": ["sh"],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": ["CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_AUDIT_WRITE"],
      "effective": ["CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_AUDIT_WRITE"],
      "inheritable": ["CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_AUDIT_WRITE"],
      "permitted": ["CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_AUDIT_WRITE"],
      "ambient": [
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_CHOWN"
      ]
    },
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 1024,
        "soft": 1024
      }
    ],
    "noNewPrivileges": true
  },
  "hostname": "youki",
  "annotations": {},
  "linux": {
    "resources": {
      "devices": []
    },
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "network"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      },
      {
        "type": "cgroup"
      }
    ],
    "maskedPaths": [
      "/proc/acpi",
      "/proc/asound",
      "/proc/kcore",
      "/proc/keys",
      "/proc/latency_stats",
      "/proc/timer_list",
      "/proc/timer_stats",
      "/proc/sched_debug",
      "/sys/firmware",
      "/proc/scsi"
    ],
    "readonlyPaths": [
      "/proc/bus",
      "/proc/fs",
      "/proc/irq",
      "/proc/sys",
      "/proc/sysrq-trigger"
    ]
  }
}

Before

$ sudo ./youki create -b mycontainer test
(...)
ERROR libcontainer::capabilities: failed to set ambient capabilities: failed to set capabilities: caps error: PR_CAP_AMBIENT_RAISE failure: Operation not permitted (os error 1)
(...)

$ echo $?
0

$ sudo ./youki list
DEBUG youki: started by user 0 with ArgsOs { inner: ["./youki", "list"] }
ID    PID    STATUS   BUNDLE                                  CREATED                    CREATOR
test  11933  Created  /home/kaita_nakamura/youki/mycontainer  2025-04-18T22:36:59+00:00  root

After

$ sudo ./youki create -b mycontainer test
(...)
 WARN libcontainer::capabilities: failed to set ambient capabilities: failed to set capabilities: caps error: PR_CAP_AMBIENT_RAISE failure: Operation not permitted (os error 1)
(...)

$ echo $?
0

$ sudo ./youki list
DEBUG youki: started by user 0 with ArgsOs { inner: ["./youki", "list"] }
ID    PID   STATUS   BUNDLE                                  CREATED                    CREATOR
test  7700  Created  /home/kaita_nakamura/youki/mycontainer  2025-04-18T22:10:13+00:00  root

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test updates
  • CI/CD related changes
  • Other (please describe):

Testing

  • Added new unit tests
  • Added new integration tests
  • Ran existing test suite
  • Tested manually (please provide steps)

Related Issues

#3132

Additional Context

Signed-off-by: Kaita Nakamura <[email protected]>
@YJDoc2
Copy link
Collaborator

YJDoc2 commented Apr 21, 2025

Hey @z63d , thanks for the PR. So two things -

  1. please remove the fixes part from desc, because otherwise it'll close the issue after merging this, and there still one part of the issue not addressed in this PR.
  2. can you add a comment before the warn log explaining why we do not error and instead only warn.

Also, I think we need to check this error in a bit more detail. The comment in runc mentions they are not erroring because backward compatibility, and ignoring einval for older kernels, but here we are getting operation not permitted. This can be a problem of youki where we are not getting correct permissions. I think we need to check the caps to the youki process (as shown here https://unix.stackexchange.com/questions/754881/after-executing-setcap-why-i-still-cant-use-tar-and-got-an-error-operation-no) to see if this error is because even with sudo we do not have correct caps.

Basically my intention behind fixing the issue was to check why we are getting the error in first place (is user's caps incorrect, is the config itself is incorrect and so on) and then depending on that change the log level. It is possible that config itself might be incorrect (and runc is ignoring it because of the comment you mentioned) in which case we should keep the level as error, but change the message to mention we are ignoring the error etc.

@z63d
Copy link
Contributor Author

z63d commented Apr 24, 2025

The reason runc is warn seems to be related to supporting kernels older than 4.3 (which don't support ambient caps).
ref:
https://github.com/opencontainers/runc/blob/v1.3.0-rc.2/libcontainer/SPEC.md#system-requirements-and-compatibility
opencontainers/runc#4358 (comment)
opencontainers/runc#4597

youki supports linux kernel ≥ 5.3, so error is appropriate instead of warn for this.
https://github.com/youki-dev/youki/tree/v0.5.3?tab=readme-ov-file#requires
operation not permitted means there is a problem with the bounding set, permitted, and inheritable settings.
For example, the following will of course be successful:

config.json
{
  "ociVersion": "1.0.2-dev",
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": ["nosuid", "strictatime", "mode=755", "size=65536k"]
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [
        "nosuid",
        "noexec",
        "newinstance",
        "ptmxmode=0666",
        "mode=0620",
        "gid=5"
      ]
    },
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "shm",
      "options": ["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]
    },
    {
      "destination": "/dev/mqueue",
      "type": "mqueue",
      "source": "mqueue",
      "options": ["nosuid", "noexec", "nodev"]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": ["nosuid", "noexec", "nodev", "ro"]
    },
    {
      "destination": "/sys/fs/cgroup",
      "type": "cgroup",
      "source": "cgroup",
      "options": ["nosuid", "noexec", "nodev", "relatime", "ro"]
    }
  ],
  "process": {
    "terminal": false,
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": ["sh"],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_CHOWN"
      ],
      "effective": ["CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_AUDIT_WRITE"],
      "inheritable": [
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_CHOWN"
      ],
      "permitted": [
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_CHOWN"
      ],
      "ambient": [
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_CHOWN"
      ]
    },
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 1024,
        "soft": 1024
      }
    ],
    "noNewPrivileges": true
  },
  "hostname": "youki",
  "annotations": {},
  "linux": {
    "resources": {
      "devices": []
    },
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "network"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      },
      {
        "type": "cgroup"
      }
    ],
    "maskedPaths": [
      "/proc/acpi",
      "/proc/asound",
      "/proc/kcore",
      "/proc/keys",
      "/proc/latency_stats",
      "/proc/timer_list",
      "/proc/timer_stats",
      "/proc/sched_debug",
      "/sys/firmware",
      "/proc/scsi"
    ],
    "readonlyPaths": [
      "/proc/bus",
      "/proc/fs",
      "/proc/irq",
      "/proc/sys",
      "/proc/sysrq-trigger"
    ]
  }
}

If failed to set ambient capabilities occurs, change it so that the container creation fails.

@YJDoc2
Is that okay?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Apr 24, 2025

Hey @z63d thanks for the explanation. Let me think a bit on the implications of this. Currently it would be a breaking change for youki because configs which work fine with this error log would start failing, so this will have to be a breaking change. Additionally, if configs generated by docker/buildx automatically set ambient caps, they might all start to fail too. I'll get back on this.

@z63d
Copy link
Contributor Author

z63d commented Apr 24, 2025

Ideally, it would be better to raise an error, but I think it's okay to temporarily change it to warn to avoid breaking changes and confusing users with logs.
I'll wait for your conclusion.

@z63d
Copy link
Contributor Author

z63d commented May 7, 2025

@YJDoc2
Has there been any progress?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented May 8, 2025

Hey, so we have two options here -

  1. accept the PR, and just turn down the log level to warn
  2. before setting ambient caps, we first check if kernel we are running on supports ambient caps. If not, we skip setting them at all. If it supports and we encounter an error, we error out completely failing the container creation. The check can be like https://github.com/mhiramat/libcap/blob/master/progs/capsh.c#L319-L323

Can you check if option 2 would be possible, otherwise we can go with option 1.

@z63d
Copy link
Contributor Author

z63d commented May 9, 2025

we first check if kernel we are running on supports ambient caps.

As mentioned here, the kernels supported by youki support ambient caps.
Would it make sense to check that?

@YJDoc2 YJDoc2 added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label May 12, 2025
@YJDoc2
Copy link
Collaborator

YJDoc2 commented May 12, 2025

ok, after thinking a bit, my conclusion is this - we should change the log level to warn and also add something like check if your kernel supports capabilities . The reason is , yes while we officially support pretty new kernels only, a lot of users of containers might not be running those, so actually if we can support them without having to let go of significant development (decreasing log level is ok) we should. Also some other crate which might depend on libcontainer might be supporting older kernels, and our change might leave them with no way to back out without replacing libcontainer completely.

Ideally checking if kernel support caps and then doing this is preferable, but given that our official support is for newer kernel only, and adding the check might require more work, I feel better option is just to lower log level here.

wdyt? If you also agree with my reasoning, I'll approve and merge, as the change is already there.

@z63d
Copy link
Contributor Author

z63d commented May 12, 2025

@YJDoc2
I agree, can you please merge it?

adding the check might require more work

If there is any work required I will contribute again :)

@YJDoc2 YJDoc2 enabled auto-merge May 13, 2025 04:27
@YJDoc2 YJDoc2 merged commit d550f3c into youki-dev:main May 13, 2025
42 of 45 checks passed
@github-actions github-actions bot mentioned this pull request May 13, 2025
@z63d z63d deleted the fix/libcontainer-ambient-capabilities-log branch May 13, 2025 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants