November 2019
One of the many responsibilities of the operating system is to help processes keep secrets from each other. Operating systems often fail in this regard, sometimes due to factors – such as hardware bugs and user-space vulnerabilities – that are beyond their direct control. It is thus unsurprising that there is an increasing level of interest in ways to improve the ability to keep data secret, perhaps even from the operating system itself. The MAP_EXCLUSIVE patch set from Mike Rapoport is one example of the work that is being done in this area; it also shows that the development community has not yet really begun to figure out how this type of feature should work.
MAP_EXCLUSIVE is a new flag for the mmap() system call; its purpose is to request a region of memory that is mapped only for the calling process and inaccessible to anybody else, including the kernel. It is a part of a larger address-space isolation effort underway in the memory-management subsystem, most of which is based on the idea that unmapped memory is much harder for an attacker to access.
Mapping a memory range with MAP_EXCLUSIVE has a number of effects. It automatically implies the MAP_LOCKED and MAP_POPULATE flags, meaning that the memory in question will be immediately faulted into RAM and locked there – it should never find its way to a swap area, for example. The MAP_PRIVATE and MAP_ANONYMOUS flags are required, and MAP_HUGETLB is not allowed. Pages that are mapped this way will not be copied if the process forks. They are also removed from the kernel’s direct mapping – the linear mapping of all of physical memory – making them inaccessible to the kernel in most circumstances.
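The flag rules above can be summarized in a few lines of C. What follows is only a sketch of the semantics as this article describes them, not code from the patch set: the numeric value given to MAP_EXCLUSIVE is a placeholder (the flag is not in mainline headers), the helper name exclusive_effective_flags() is hypothetical, and returning -EINVAL for rejected combinations is an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS, MAP_LOCKED, etc. */
#endif
#include <errno.h>
#include <sys/mman.h>

/*
 * Placeholder value for illustration only: MAP_EXCLUSIVE is not defined
 * by mainline kernel headers, and this sketch never passes it to mmap().
 */
#ifndef MAP_EXCLUSIVE
#define MAP_EXCLUSIVE 0x00200000
#endif

/*
 * Hypothetical helper modeling the flag rules described in the article.
 * Returns the flags the mapping would effectively have, or -EINVAL
 * (an assumed error code) if the combination would be rejected.
 */
int exclusive_effective_flags(int flags)
{
    if (!(flags & MAP_EXCLUSIVE))
        return flags;               /* not an exclusive mapping */

    /* MAP_PRIVATE and MAP_ANONYMOUS are both required. */
    if ((flags & MAP_TYPE) != MAP_PRIVATE || !(flags & MAP_ANONYMOUS))
        return -EINVAL;

    /* Huge pages are not allowed. */
    if (flags & MAP_HUGETLB)
        return -EINVAL;

    /* MAP_LOCKED and MAP_POPULATE are implied. */
    return flags | MAP_LOCKED | MAP_POPULATE;
}
```

Under these assumptions, a request for MAP_PRIVATE | MAP_ANONYMOUS | MAP_EXCLUSIVE comes back with MAP_LOCKED and MAP_POPULATE added, while a shared, file-backed, or huge-page request is refused.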
The goal behind MAP_EXCLUSIVE seems to have support within the community, but the actual implementation has raised a number of questions about how this functionality should work. One area of concern is the removal of the pages from the direct mapping. The kernel uses huge pages for that mapping, since that gives a significant performance improvement through decreased translation lookaside buffer (TLB) pressure. Carving specific pages out of that mapping requires splitting the huge pages into normal pages, slowing things down for every process in the system. The splitting of the direct mapping in another context caused a 2% performance regression at Facebook, according to Alexei Starovoitov in October; that is not a cost that everybody is willing to pay.
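To give a sense of scale for that split, the arithmetic can be sketched in C. The sizes below are the x86-64 page sizes (4KB base pages, 2MB and 1GB huge pages); that choice of architecture is an assumption not stated in the article, and the helper name is hypothetical.

```c
#define BASE_PAGE_SIZE  (4UL << 10)     /* 4KB, assumed x86-64 base page */
#define HUGE_PAGE_2M    (2UL << 20)     /* 2MB huge page */
#define HUGE_PAGE_1G    (1UL << 30)     /* 1GB huge page */

/*
 * Number of smaller mappings needed to cover the range that a single
 * huge-page mapping covered before it was split.
 */
unsigned long mappings_after_split(unsigned long huge_size,
                                   unsigned long small_size)
{
    return huge_size / small_size;
}
```

With these sizes, removing a single 4KB page from a 2MB mapping means that range is now described by 512 small entries rather than one large one, and every process that touches the surrounding memory competes for the extra TLB space.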
Elena Reshetova indicated that she has been working on similar functionality; rather than enhancing mmap(), her patch provides a new madvise() flag and requires that the secret areas be a multiple of the page size. Her version will eventually wipe any secret areas before returning the memory to general use in case the calling process doesn’t do that.
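Two of the ideas in that paragraph, page-multiple sizes and wiping before release, can be approximated by an application today with existing calls. The sketch below is not Reshetova's patch and provides none of the kernel-side isolation under discussion; it assumes Linux with glibc (for explicit_bzero()), and the function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and explicit_bzero() */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a multiple of page, which must be a power of two. */
size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) & ~(page - 1);
}

/*
 * Map a locked, private, anonymous region of at least len bytes.
 * Returns NULL on failure; on success *mapped_len holds the real size.
 */
void *secret_alloc(size_t len, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t n;
    void *p;

    if (page <= 0 || len == 0)
        return NULL;
    n = round_up_to_page(len, (size_t)page);

    p = mmap(NULL, n, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, n) != 0) {     /* keep the pages out of swap */
        munmap(p, n);
        return NULL;
    }
    *mapped_len = n;
    return p;
}

/* Wipe the region before handing the memory back to the kernel. */
void secret_free(void *p, size_t mapped_len)
{
    explicit_bzero(p, mapped_len);
    munmap(p, mapped_len);
}
```

The mlock() call can fail when the process is over its locked-memory limit, which is why secret_alloc() reports failure rather than silently returning swappable memory.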
Reshetova also raised the idea of mapping this memory uncached. The benefit of doing so would be to protect its contents from a whole range of speculative-execution attacks, known and unknown. On the other hand, the effect on application performance would be something between “painful” and “crippling”, depending on how often the memory is accessed. Some users would likely welcome the extra protection; many others may well find that the performance penalty rules out this feature’s use entirely. Andy Lutomirski said that uncached memory should only be provided if it is explicitly asked for, but Alan Cox responded that users generally do not know whether they want uncached memory or not.
More to the point, Cox continued, there may be any of a number of things that the system might do to protect the contents of secret memory; those things will vary from one system to the next and users will not be in a position to know what any specific system should use. That makes it all the more important to nail down what the MAP_EXCLUSIVE flag really means:
IMHO the question is what is the actual semantic here. What are you asking for? Does it mean “at any cost”, what does it guarantee (100% or statistically), what level of guarantee is acceptable, what level is -EOPNOTSUPP or similar?
James Bottomley took this argument even further, describing MAP_EXCLUSIVE as “a usability problem”. Protecting secret data might, on some systems, involve hardware technologies like TME and SEV, for example, but developers cannot know that in a general way. Somehow, Bottomley suggested, the kernel should make the best choice it can for how to protect secret memory; one such choice could be to make the memory uncached only on systems where the speculative-execution mitigations are not active. Lutomirski worried that this approach would not work, though; there are too many variables and ways in which things could go wrong.
There is only one truly clear conclusion from this discussion: a desire for memory with higher levels of secrecy exists, but the development community lacks a clear idea of how that secrecy should be implemented and how it should be presented to the user. That suggests that this feature will not be showing up in a mainline kernel anytime soon. Getting memory secrecy wrong risks saddling the community with the maintenance of a misdesigned interface and, possibly, giving application developers a false sense of security. It is better to go slow in the hope of getting things right.