While editing the capabilities page of the how containers work zine, I found myself trying to explain why strace
doesn’t work in a Docker container.
Here’s the problem: if you run strace in a Docker container, this happens:

    $ docker run -it ubuntu:28 /bin/bash
    $ # ... install strace ...
    # strace ls
    strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted

strace works using the ptrace system call, so if ptrace isn’t allowed, it’s definitely not gonna work! This is pretty easy to fix. On my machine, this fixes it:

    docker run --cap-add=SYS_PTRACE -it ubuntu: /bin/bash
But I wasn’t interested in fixing it; I wanted to know why it happens. So why does strace not work, and why does --cap-add=SYS_PTRACE fix it?

hypothesis 1: container processes are missing the CAP_SYS_PTRACE capability

I always thought the reason was that Docker container processes by default didn’t have the CAP_SYS_PTRACE capability. This is consistent with it being fixed by --cap-add=SYS_PTRACE, right?

But this actually doesn’t make sense, for 2 reasons.
Reason 1: Experimentally, as a regular user, I can strace any process run by my user. But if I check whether my current process has the CAP_SYS_PTRACE capability, it doesn’t:

    $ getpcaps $$
    Capabilities for `012491': =
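getpcaps is reading the same information the kernel exposes in the CapEff: line of /proc/<pid>/status, a hex bitmask in which CAP_SYS_PTRACE is bit 19. Here’s a minimal Go sketch that checks that bit for the current process; the helper names effectiveCaps and hasCap are my own, not from any library.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capSysPtrace is the bit number of CAP_SYS_PTRACE in the kernel's
// capability bitmask (see linux/capability.h).
const capSysPtrace = 19

// effectiveCaps extracts the "CapEff:" hex mask from the text of a
// /proc/<pid>/status file.
func effectiveCaps(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hex, 16, 64)
		}
	}
	return 0, fmt.Errorf("no CapEff line found")
}

// hasCap reports whether capability bit capBit is set in mask.
func hasCap(mask uint64, capBit uint) bool {
	return mask&(1<<capBit) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("could not read /proc/self/status:", err)
		return
	}
	mask, err := effectiveCaps(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("CapEff=%016x CAP_SYS_PTRACE=%v\n", mask, hasCap(mask, capSysPtrace))
}
```

For a regular user the mask is all zeros, which matches the empty `=` that getpcaps prints.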
Reason 2: man capabilities says this about CAP_SYS_PTRACE:

    CAP_SYS_PTRACE
        Trace arbitrary processes using ptrace(2);
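The ptrace(2) man page describes where this capability fits, in its “Ptrace access mode checking” section. Below is a heavily simplified Go model of that rule, assuming a single user namespace and ignoring fsuid, thread groups, and LSMs like Yama; the Tracer/Target types and mayPtrace are hypothetical names for illustration. It shows why tracing your own (dumpable) process never needs the capability.

```go
package main

import "fmt"

// Tracer is a hypothetical, simplified view of the process calling ptrace.
type Tracer struct {
	UID, GID     uint32 // real UID/GID of the caller
	CapSysPtrace bool   // does the caller hold CAP_SYS_PTRACE?
}

// Target is a hypothetical, simplified view of the process being traced.
type Target struct {
	RUID, EUID, SUID uint32 // real, effective, saved-set UIDs
	RGID, EGID, SGID uint32 // real, effective, saved-set GIDs
	Dumpable         bool
}

// mayPtrace sketches the access check: if the caller's UID and GID match
// all of the target's UIDs and GIDs and the target is dumpable, access is
// granted; in every other case, CAP_SYS_PTRACE decides.
func mayPtrace(t Tracer, p Target) bool {
	sameUser := t.UID == p.RUID && t.UID == p.EUID && t.UID == p.SUID &&
		t.GID == p.RGID && t.GID == p.EGID && t.GID == p.SGID
	if sameUser && p.Dumpable {
		return true
	}
	return t.CapSysPtrace
}

func main() {
	me := Tracer{UID: 1000, GID: 1000}
	myShell := Target{RUID: 1000, EUID: 1000, SUID: 1000, RGID: 1000, EGID: 1000, SGID: 1000, Dumpable: true}
	rootDaemon := Target{Dumpable: true} // all IDs zero: owned by root

	fmt.Println("trace my own shell:", mayPtrace(me, myShell))
	fmt.Println("trace root's daemon:", mayPtrace(me, rootDaemon))
	me.CapSysPtrace = true
	fmt.Println("trace root's daemon with CAP_SYS_PTRACE:", mayPtrace(me, rootDaemon))
}
```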
So the point of CAP_SYS_PTRACE is to let you ptrace arbitrary processes owned by any user, the way that root usually can. You shouldn’t need it just to ptrace a regular process owned by your own user.

And I tested this a third way: I ran a Docker container with

    docker run --cap-add=SYS_PTRACE -it ubuntu: /bin/bash

dropped the CAP_SYS_PTRACE capability, and I could still strace processes even though I didn’t have that capability anymore. What? Why?

hypothesis 2: something about user namespaces???
My next (much less well-founded) hypothesis was something along the lines of “um, maybe the process is in a different user namespace and strace doesn’t work because of ... reasons?” This isn’t really coherent, but here’s what happened when I looked into it.
Is the container process in a different user namespace? Well, in the container:

    # ls /proc/$$/ns/user -l
    ... /proc/1/ns/user -> 'user:[4026531837]'

On the host:

    $ ls /proc/$$/ns/user -l
    ... /proc/012491/ns/user -> 'user:[4026531837]'

Because the user namespace ID (4026531837) is the same, the root user in the container is the exact same user as the root user on the host. So there’s definitely no reason it shouldn’t be able to strace processes that it created!
This hypothesis doesn’t make much sense but I hadn’t realized that the root user in a Docker container is the same as the root user on the host, so I thought that was interesting.
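The number in those links is an inode number, and 4026531837 (0xEFFFFFFD) is, as far as I know, the fixed inode the kernel reserves for the initial user namespace, i.e. the host’s own. Here’s a small Go sketch that reads /proc/self/ns/user and checks for that value, so you can run it on the host and in a container and compare; nsInode and initialUserNSInode are names I made up.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// initialUserNSInode is the inode the kernel uses for the initial (host)
// user namespace: 0xEFFFFFFD == 4026531837.
const initialUserNSInode = 0xEFFFFFFD

// nsInode parses a namespace link target such as "user:[4026531837]"
// and returns the number between the brackets.
func nsInode(link string) (uint64, error) {
	start := strings.IndexByte(link, '[')
	end := strings.LastIndexByte(link, ']')
	if start < 0 || end <= start+1 {
		return 0, fmt.Errorf("unexpected namespace link %q", link)
	}
	return strconv.ParseUint(link[start+1:end], 10, 64)
}

func main() {
	// Run once on the host and once inside the container, then compare.
	link, err := os.Readlink("/proc/self/ns/user")
	if err != nil {
		fmt.Println("readlink failed:", err)
		return
	}
	ino, err := nsInode(link)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s (initial user namespace: %v)\n", link, ino == initialUserNSInode)
}
```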
hypothesis 3: the ptrace system call is being blocked by a seccomp-bpf rule
I also knew that Docker uses seccomp-bpf to stop container processes from running a lot of system calls. And ptrace is in the list of system calls blocked by Docker’s default seccomp profile! (Actually, the list of allowed system calls is a whitelist, so it’s just that ptrace is not in the default whitelist. But it comes out to the same thing.)
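A quick way to check whether a process is running under a seccomp filter at all is the Seccomp: line in /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). Here’s a small Go sketch that reads it for the current process; seccompMode and seccompModes are my own names. I’d expect “filter” inside a container started with the default profile and “disabled” with --security-opt seccomp=unconfined, though that’s an assumption about a typical setup rather than something tested here.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// seccompModes maps the numeric "Seccomp:" field of /proc/<pid>/status
// to the kernel's mode names.
var seccompModes = map[string]string{
	"0": "disabled", // SECCOMP_MODE_DISABLED
	"1": "strict",   // SECCOMP_MODE_STRICT
	"2": "filter",   // SECCOMP_MODE_FILTER (seccomp-bpf)
}

// seccompMode finds the "Seccomp:" line in the text of a status file.
func seccompMode(status string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		// Newer kernels also have "Seccomp_filters:", which this prefix skips.
		if !strings.HasPrefix(line, "Seccomp:") {
			continue
		}
		v := strings.TrimSpace(strings.TrimPrefix(line, "Seccomp:"))
		if name, ok := seccompModes[v]; ok {
			return name, nil
		}
		return "", fmt.Errorf("unknown seccomp mode %q", v)
	}
	return "", fmt.Errorf("no Seccomp line found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("could not read /proc/self/status:", err)
		return
	}
	mode, err := seccompMode(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("seccomp mode:", mode)
}
```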
The missing ptrace entry easily explains why strace wouldn’t work in a Docker container: if the ptrace system call is totally blocked, then of course you can’t call it at all and strace would fail.

Let’s verify this hypothesis: if we disable all seccomp rules, can we strace in a Docker container?

    $ docker run --security-opt seccomp=unconfined -it ubuntu: /bin/bash
    $ strace ls
    execve("/bin/ls", ["ls"], 0x7ffc... /* 8 vars */) = 0
    ... it works fine ...

Yes! It works! Great. Mystery solved, except...
why does --cap-add=SYS_PTRACE fix the problem?

What we still haven’t explained is: why does --cap-add=SYS_PTRACE fix the problem?

The man page for docker run explains the --cap-add argument this way:

    --cap-add=[]
       Add Linux capabilities

That doesn’t have anything to do with seccomp rules! What’s going on?
let’s look at the Docker source code

When the documentation doesn’t help, the only thing to do is go look at the source.

The nice thing about Go is that, because dependencies are often vendored in a Go repository, you can just grep the repository to figure out where the code that does a thing is. So I cloned github.com/moby/moby and grepped for some things, like rg CAP_SYS_PTRACE.
Here’s what I think is going on. In containerd’s seccomp implementation, in contrib/seccomp/seccomp_default.go, there’s a bunch of code that makes sure that if a process has a capability, then it’s also given access (through a seccomp rule) to use the system calls that go with that capability.

    case "CAP_SYS_PTRACE":
        s.Syscalls = append(s.Syscalls, specs.LinuxSyscall{
            Names: []string{
                "kcmp",
                "process_vm_readv",
                "process_vm_writev",
                "ptrace",
            },
            Action: specs.ActAllow,
            Args:   []specs.LinuxSeccompArg{},
        })
There’s some other code that seems to do something very similar in profiles/seccomp/seccomp.go in moby and the default seccomp profile, so it’s possible that that’s what’s doing it instead.
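To make the mechanism concrete, here’s a toy Go model of it: a base allowlist plus extra system calls unlocked per capability, with everything else denied. The names SyscallRule, baseAllowlist, capSyscalls, buildProfile and allowed are hypothetical, and the base list is a tiny made-up subset; only the CAP_SYS_PTRACE syscall list comes from the containerd snippet.

```go
package main

import "fmt"

// SyscallRule is a hypothetical stand-in for specs.LinuxSyscall.
type SyscallRule struct {
	Names  []string
	Action string
}

// baseAllowlist is a tiny, made-up subset of a default allowlist.
// Note that ptrace is not in it.
var baseAllowlist = []string{"read", "write", "openat", "execve"}

// capSyscalls: holding a capability unlocks the syscalls that go with it.
var capSyscalls = map[string][]string{
	"CAP_SYS_PTRACE": {"kcmp", "process_vm_readv", "process_vm_writev", "ptrace"},
}

// buildProfile returns the allow rules for a container with the given caps.
func buildProfile(caps []string) []SyscallRule {
	rules := []SyscallRule{{Names: baseAllowlist, Action: "SCMP_ACT_ALLOW"}}
	for _, c := range caps {
		if names, ok := capSyscalls[c]; ok {
			rules = append(rules, SyscallRule{Names: names, Action: "SCMP_ACT_ALLOW"})
		}
	}
	return rules
}

// allowed reports whether a syscall matches an allow rule; anything
// unlisted falls through to the default action (an error, e.g. EPERM).
func allowed(rules []SyscallRule, syscall string) bool {
	for _, r := range rules {
		for _, n := range r.Names {
			if n == syscall && r.Action == "SCMP_ACT_ALLOW" {
				return true
			}
		}
	}
	return false
}

func main() {
	plain := buildProfile(nil)
	withPtrace := buildProfile([]string{"CAP_SYS_PTRACE"})
	fmt.Println("ptrace allowed by default:", allowed(plain, "ptrace"))
	fmt.Println("ptrace allowed with CAP_SYS_PTRACE:", allowed(withPtrace, "ptrace"))
}
```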
So I think we have our answer!
--cap-add in Docker does a little more than what it says

The upshot seems to be that --cap-add doesn’t do exactly what it says it does in the man page; it’s more like --cap-add-and-also-whitelist-some-extra-system-calls-if-required. Which makes sense! If you have a capability like CAP_SYS_PTRACE, which is supposed to let you use the process_vm_readv system call, but that system call is blocked by a seccomp profile, that’s not going to help you much!

So allowing the process_vm_readv and ptrace system calls when you give the container CAP_SYS_PTRACE seems like a reasonable choice.

strace actually does work in newer versions of Docker
As of this commit (docker …), Docker does actually allow the ptrace system call for kernel versions newer than 4.8 (on older kernels, ptrace could be used to bypass seccomp filtering). But the Docker version on my laptop is 22. 7, so it predates that commit.
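The condition itself is just a kernel version comparison. Here’s a hedged Go sketch of deciding whether the running kernel is at least 4.8, reading the release string from /proc/sys/kernel/osrelease; the atLeast helper is hypothetical and is not how Docker actually expresses this rule in its profile.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// atLeast reports whether a kernel release string such as
// "5.4.0-42-generic" is at least major.minor.
func atLeast(release string, major, minor int) (bool, error) {
	fields := strings.SplitN(release, ".", 3)
	if len(fields) < 2 {
		return false, fmt.Errorf("unexpected release %q", release)
	}
	kMajor, err := strconv.Atoi(fields[0])
	if err != nil {
		return false, err
	}
	// The minor field can carry a suffix, as in "4.19-rc1".
	minorStr := fields[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	kMinor, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, err
	}
	if kMajor != major {
		return kMajor > major, nil
	}
	return kMinor >= minor, nil
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("could not read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(data))
	ok, err := atLeast(release, 4, 8)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("kernel %s is at least 4.8: %v\n", release, ok)
}
```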
That's all!
This was a fun small thing to investigate, and I think it’s a nice example of how containers are made of lots of moving pieces that work together in not-completely-obvious ways.