Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jul 2, 2021

This does two things

  1. Adds minimal hybrid hierarchy support to cgroup v1 fs driver. It seems it's needed when systemd is used, also makes it possible for cgroup v1 drivers to use the new cgroup.kill kernel API (in case hybrid hierarchy is available). This part is a carry of cgroups/systemd: add cgroup-v2 path to the list when using hybrid mode #2087.

  2. Implements runc exec --cgroup to be able to run a process in a sub-cgroup of a container (as per [RFC] runc exec --cgroup #3040).

Fixes: #3040.
Closes: #2087.

Usage (option for runc exec):

--cgroup path | controller[,controller...]:path

Execute a process in a sub-cgroup. If the specified cgroup does not exist, an
error is returned. Default is empty path, which means to use container's top
level cgroup.

For cgroup v1 only, a particular controller (or multiple comma-separated
controllers) can be specified, and the option can be used multiple times to set
different paths for different controllers.

Note for cgroup v2, in case the process can't join the top level cgroup,
runc exec fallback is to try joining the cgroup of container's init.
This fallback can be disabled by using --cgroup /.

Proposed changelog entry

New features:
 - runc exec --cgroup: an option to specify a (non-top) in-container cgroup
   to use for the process being executed (#3040, #3059)
 - cgroup v1 controllers now support hybrid hierarchy (i.e. when on a cgroup v1
   machine a cgroup2 filesystem is mounted to /sys/fs/cgroup/unified, runc run/exec
   now adds the container to the appropriate cgroup under it) (#2087, #3059)

@kolyshkin

This comment has been minimized.

@kolyshkin kolyshkin force-pushed the cgroup-clean branch 5 times, most recently from 2532b02 to f850fd5 Compare July 12, 2021 23:26
@kolyshkin
Copy link
Contributor Author

# runc exec test_busybox sh -euc ! grep -v ':/$' /proc/self/cgroup (status=1):
# 0::/../../system.slice/runner-provisioner.service

This must be a unified path for hybrid cgroup v1+v2, which we do not handle well. Looks like I see what needs to be fixed (not a problem in this PR).

@kolyshkin
Copy link
Contributor Author

This must be a unified path for hybrid cgroup v1+v2, which we do not handle well. Looks like I see what needs to be fixed (not a problem in this PR).

This is presumably fixed by #2087. Let me check...

@kolyshkin
Copy link
Contributor Author

This is presumably fixed by #2087. Let me check...

Yes, but it requires criu 3.16+, so draft until it's released.

@kolyshkin
Copy link
Contributor Author

Rebased; made the cgroup v1 test case work without cgroupns.

@kolyshkin kolyshkin requested a review from AkihiroSuda August 24, 2021 21:26
@kolyshkin kolyshkin added this to the 1.2.0 milestone Aug 31, 2021
@kolyshkin
Copy link
Contributor Author

Looks like criu 3.16 won't be in time for runc 1.1.0 release, so moving the milestone to 1.2.0.

@kolyshkin
Copy link
Contributor Author

Looks like criu 3.16 won't be in time for runc 1.1.0 release, so moving the milestone to 1.2.0.

Or maybe it will -- if it is, I'll move the milestone back to 1.1.0, as this is quite important (not the feature per se, but the hybrid cgroup entering code from #2087).

@kolyshkin
Copy link
Contributor Author

Rebased; this is supposed to fail until criu PPA will have CRIU 3.16 (I hope soon).

kolyshkin and others added 4 commits September 23, 2021 19:26
No need to have an intermediate variable here.

Signed-off-by: Kir Kolyshkin <[email protected]>
No need to add a file name to the error messages, as errors from
OpenFile and (*os.File).Write both contain the file name already.

Signed-off-by: Kir Kolyshkin <[email protected]>
The function used here, cgroups.EnterPid, silently skips non-existing
paths, and it does not look like a good idea to do so for an existing
container with already configured cgroups.

Switch to cgroups.WriteCgroupProc which does not do that, so in case
a cgroup does not exist, we'll get an error.

Signed-off-by: Kir Kolyshkin <[email protected]>
Currently the parent process of the container is moved to the right
cgroup v2 tree when systemd is using a hybrid model (last line with 0::):

$ runc --systemd-cgroup run myid
/ # cat /proc/self/cgroup
12:cpuset:/system.slice/runc-myid.scope
11:blkio:/system.slice/runc-myid.scope
10:devices:/system.slice/runc-myid.scope
9:hugetlb:/system.slice/runc-myid.scope
8:memory:/system.slice/runc-myid.scope
7:rdma:/
6:perf_event:/system.slice/runc-myid.scope
5:net_cls,net_prio:/system.slice/runc-myid.scope
4:freezer:/system.slice/runc-myid.scope
3:pids:/system.slice/runc-myid.scope
2:cpu,cpuacct:/system.slice/runc-myid.scope
1:name=systemd:/system.slice/runc-myid.scope
0::/system.slice/runc-myid.scope

However, if a second process is executed in the same container, it is
not moved to the right cgroup v2 tree:

$ runc exec myid /bin/sh -c 'cat /proc/self/cgroup'
12:cpuset:/system.slice/runc-myid.scope
11:blkio:/system.slice/runc-myid.scope
10:devices:/system.slice/runc-myid.scope
9:hugetlb:/system.slice/runc-myid.scope
8:memory:/system.slice/runc-myid.scope
7:rdma:/
6:perf_event:/system.slice/runc-myid.scope
5:net_cls,net_prio:/system.slice/runc-myid.scope
4:freezer:/system.slice/runc-myid.scope
3:pids:/system.slice/runc-myid.scope
2:cpu,cpuacct:/system.slice/runc-myid.scope
1:name=systemd:/system.slice/runc-myid.scope
0::/user.slice/user-1000.slice/session-8.scope

This commit makes that processes executed with exec are placed into the
right cgroup v2 tree. The implementation checks if systemd is using a
hybrid mode (by checking if cgroups v2 is mounted in
/sys/fs/cgroup/unified), if yes, the path of the cgroup v2 slice for
this container is saved into the cgroup path list.

The fs group driver has a similar issue, in this case none of the runc
run or runc exec commands put the process in the right cgroups v2. This
commit also fixes that.

Having the processes of the container in its own cgroup v2 is useful
for any BPF programs that rely on bpf_get_current_cgroup_id(), like
https://github.com/kinvolk/inspektor-gadget/ for instance.

[@kolyshkin: rebased]

Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin marked this pull request as ready for review September 24, 2021 02:40
@kolyshkin kolyshkin modified the milestones: 1.2.0, 1.1.0 Sep 24, 2021
@h-vetinari
Copy link

Should this have a changelog entry?

@kolyshkin
Copy link
Contributor Author

Should this have a changelog entry?

@h-vetinari added, PTAL

Check that runc run and runc exec put the process on the same cgroups v2
when using hybrid mode.

Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
In some setups, multiple cgroups are used inside a container,
and sometime there is a need to execute a process in a particular
sub-cgroup (in case of cgroup v1, for a particular controller).
This is what this commit implements.

Signed-off-by: Kir Kolyshkin <[email protected]>
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My only minor concern is that someone will think that --cgroup applies to the host cgroup paths, but since it's well documented hopefully that won't happen.

@kolyshkin
Copy link
Contributor Author

I guess we can merge this one.

@kolyshkin kolyshkin merged commit f594ede into opencontainers:master Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] runc exec --cgroup

5 participants