Avoiding Retpolines with Static Calls, Hacker News

[LWN subscriber-only content]

Welcome to LWN.net

The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider accepting the trial offer on the right. Thank you for visiting LWN.net!

Free trial subscription

Try LWN for free for 1 month: no payment or credit card required. Activate your trial subscription now and see why thousands of readers subscribe to LWN.net.

By Jonathan Corbet

(March , 005164711

January 2018 was a sad time in the kernel community. The Meltdown and Specter vulnerabilities had finally been disclosed, and the required workarounds hurt kernel performance in a number of ways. One of those workarounds –
retpolines – continues to cause pain, with developers going out of their way to avoid indirect calls, since they must now be implemented with retpolines. In some cases, though, there may be a way to avoid retpolines and regain much of the lost performance; after a long gestation period, the “static calls” mechanism may finally be nearing the point where it can be merged upstream.

Indirect calls happen when the address of a function to be called is not known at compile time; instead, that address is stored in a pointer variable and used at run time. These indirect calls, as it turns out, are readily exploited by speculative-execution attacks. Retpolines defeat these attacks by turning an indirect call into a rather more complex (and expensive) code sequence that cannot be executed speculatively.

Retpolines solved the problem, but they also slow down the kernel, so Developers have been keenly interested in finding ways to avoid them. A number of approaches have been tried; a few of which were covered here (in late) . While some of Those techniques have been merged, static calls have remained outside of the mainline. They have recently returned in the form of this patch set posted by Peter Zijlstra; it contains the work of others as well, in particular Josh Poimboeuf, who posted the original static-call implementation.

An indirect call works from a location in writable memory where the destination of the jump can be found. Changing the destination of the call is a matter of storing a new address in that location. Static calls, instead, use a location in executable memory containing a jump instruction that points to the target function. Actually executing a static call requires “calling” to this special location, which will immediately jump to the real target. The static-call location is, in other words, a classic code trampoline. Since both jumps are direct – the target address is found directly in the executable code itself – no retpolines are needed and execution is fast.

Static calls must be declared before they can be used; There are two macros that can do that:

#include DEFINE_STATIC_CALL (name, target); DECLARE_STATIC_CALL (name, target);

DEFINE_STATIC_CALL () Creates a new static call with the given name that initially points at the function target () . DECLARE_STATIC_CALL () , instead, declares the existence of a static call that is defined elsewhere; in that case, target () is only used for type checking the calls.

Actually calling a static call is done with:

static_call (name) (args …);

Where name is the name used to define the call. This will cause a jump through the trampoline to the target function; if that function returns a value, static_call () will also return that value.

The target of a static call can be changed with:

static_call_update (name, target2);

Where target2 () is the new target for the static call. Changing the target of a static call requires patching the code of the running kernel, which is an expensive operation. That implies that static calls are only appropriate for settings where the target will change rarely.

One such setting can be found in the patch set: tracepoints. Activating a tracepoint itself requires code patching. Once that is done, the kernel responds to a hit on a tracepoint by iterating through a linked list of callback functions that have been attached there. In almost every case, though, there will only be one such function. This patch in the series optimizes that case by using a static call for the single-function case. Since the intent behind tracepoints is to minimize their overhead to the greatest extent possible, use of static calls makes sense there.

This patch set also contains a further optimization not found in the original. Jumping through the trampoline is much faster than using a retpoline, but it is still one more jump than is strictly necessary. So this patch

causes static calls to store the target address directly into the call site (s), eliminating the need for the trampoline entirely. Doing so may require changing multiple call sites, but most static calls are unlikely to have many of those. It also requires support in the objtool tool to locate those call sites during the kernel build process.

The end result of this work appears to be a significant reduction in the cost of the Specter mitigations when using tracepoints – a slowdown of just over 4% drops to about 1.6%. It has been through a number of revisions, as well as some improvements to the underlying text-patching code, and appears to be about ready. Chances are that static calls will go upstream in the near future.

Did you like this article? Please accept our trial subscription offer to be able to see more content like it and to participate in the discussion.

(