BPF at Facebook and beyond

It is no secret that much of the work on the in-kernel BPF virtual machine and associated user-space support code is being done at Facebook. But less is known about how Facebook is actually using BPF. At Kernel Recipes 2019, BPF developer Alexei Starovoitov described a bit of that work, though even he admitted that he didn’t know what most of the BPF programs running there were doing. He also summarized recent developments with BPF and some near-future work.

Kernels at Facebook

Facebook, he began, has an upstream-first philosophy, taken to an extreme; the company tries not to carry any out-of-tree patches at all. All work done at Facebook is meant to go upstream as soon as it practically can. The company also runs recent kernels, upgrading whenever possible. The company can move to a new kernel in a matter of days; this process could be faster, he said, except that it still takes some time to reboot thousands of servers. As of just before the talk, most of the Facebook fleet was running 4.16, with a few 4.11 machines hanging around and some at 5.2.

He pointed out the lack of long-term-support kernels in the above list. Facebook does not plan to stay with any given kernel for a long time, so the company doesn’t care about long-term support. Instead, machines are simply upgraded to whichever kernel is available. Within a given version, though, there can be a fair amount of variation across the fleet; the kernel team evidently backports features into older kernels when the need arises. That can create challenges for applications and, especially, BPF-based applications.

The first rule of kernel development is “don’t break user space”; anything that might cause a user-space program to fail becomes part of the kernel ABI. Performance regressions are included in this rule. Performance problems are easy to create, so Facebook needs a team to track them down. Often, it seems, BPF is fingered as the cause of these problems.

Starovoitov asked the audience to guess how many BPF programs were running on their laptops, then to run this command:

    ls -la /proc/*/fd | grep bpf-prog | wc -l

The answer on your editor’s system is six, all running from systemd. He was surprised by the answer at Facebook: there are about 40 BPF programs running on each server, with another 100 that are demand loaded. There are many teams within the company writing and deploying these programs; the kernel team doesn’t even know about all of these BPF programs. These programs are about evenly split between those attached to kprobes, those attached to tracepoints, and network scheduling-class helpers; about 10% fall into other categories.
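(Note that counting open file descriptors can undercount: a BPF program that has been pinned to a BPF filesystem, or that is kept alive only by its attachment point, need not have an fd open anywhere. On systems where the bpftool utility is available, running “bpftool prog show” as root lists the loaded programs directly.)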

He gave a few examples of performance issues that, at least on the surface, were caused by BPF:

  • Facebook runs a packet-capture daemon that makes use of a network scheduling-class BPF program; it occasionally spits out a packet for inspection. On new kernels, running that daemon regressed overall system performance by about 1%. It turns out that this daemon uses another BPF program, attached to a kprobe, for a different purpose. The function that probe attaches to didn’t exist in the newer kernel, causing the daemon to conclude that BPF as a whole was broken; it then fell back to an older, slower method for packet capture. Kprobes are not a stable ABI, Starovoitov said, but when kernel developers change a function, kprobe usage can still require somebody to investigate the resulting breakage.
  • The number-one performance-analysis tool at Facebook is a profiling daemon that attaches BPF programs to tracepoints and kprobes in the scheduler and beyond. On new kernels, it caused a 2% performance regression, manifesting as an increase in software-interrupt time. It turns out that, in the 5.2 kernel, setting a kprobe causes the text section to be remapped from 2MB huge pages to normal 4KB pages, with a resulting increase in TLB misses and decrease in performance.
  • There is a security-monitoring daemon that sets BPF programs on three kprobes and one kretprobe. It runs at low priority, waking up every few seconds and consuming about 0.01% of the CPU. This daemon was causing huge latency spikes for the high-priority database application. Some tracing work showed that, on occasion, a memcpy() call in the database could stall for as much as 1/4 second while this daemon was reading its /proc/PID/environ file. Much more tracing showed that this daemon was acquiring the mmap_sem lock when reading that /proc file, then being scheduled out for long periods of time, blocking page faults in the main application. The root cause was a basic priority-inversion issue; raising the security daemon’s priority prevents this problem.

The takeaway from all of these episodes – and especially the last one – is that the best tool for tracking down BPF-related performance regressions is BPF.
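As an illustration of that point, consider what finding the last problem requires: seeing how long a given task spends scheduled out. What follows is a minimal sketch of such a tool, not Facebook’s actual code; it assumes a libbpf-style build with a BTF-generated vmlinux.h header. The program attaches to the sched_switch tracepoint and records the longest off-CPU interval seen for each process:

    /* offcpu.bpf.c: track the longest time each task spends off the CPU. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, u32);     /* PID */
        __type(value, u64);   /* timestamp at switch-out (ns) */
    } start SEC(".maps");

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, u32);     /* PID */
        __type(value, u64);   /* longest off-CPU interval seen (ns) */
    } max_offcpu SEC(".maps");

    SEC("tracepoint/sched/sched_switch")
    int on_switch(struct trace_event_raw_sched_switch *ctx)
    {
        u64 ts = bpf_ktime_get_ns();
        u32 prev = ctx->prev_pid, next = ctx->next_pid;
        u64 *t0, *max, delta;

        /* The outgoing task goes off-CPU now. */
        bpf_map_update_elem(&start, &prev, &ts, BPF_ANY);

        /* The incoming task has been off-CPU since its recorded stamp. */
        t0 = bpf_map_lookup_elem(&start, &next);
        if (t0) {
            delta = ts - *t0;
            max = bpf_map_lookup_elem(&max_offcpu, &next);
            if (!max || delta > *max)
                bpf_map_update_elem(&max_offcpu, &next, &delta, BPF_ANY);
            bpf_map_delete_elem(&start, &next);
        }
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

A user-space loader that dumped max_offcpu periodically would have shown the quarter-second stalls suffered by the database threads, pointing toward the scheduling problem described above.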

Current and future BPF improvements

Another kind of problem results from how BPF programs are built. A user-space application will contain one or more BPF programs to be loaded into the kernel. These programs are written in C and compiled to the BPF virtual machine instruction set; this compilation happens on the target system. To ensure that the compilation can be done consistently, a version of the LLVM compiler is embedded in the application itself. This makes the applications big, and the compilation process can perturb the main workload on the target system. The compilation can also take a long time, since it is done at a low priority; several minutes to compile a 100-line program is not unusual. The system headers needed to understand kernel data structures may be missing from the target system, creating compilation failures. It is a pain point, he said.

The solution to this problem is to be found in the “compile once, run everywhere” work, which reached a milestone with the 5.4 kernel; the BPF type format (BTF) data describing kernel data structures was created for just this purpose. With BTF provided by the kernel, there is no longer any need to ship kernel headers around; instead, the bpftool utility just extracts the BTF data and creates a “monster header file” on the target system. An LLVM built-in function has been added to preserve the offsets into structures used by BPF programs; those offsets are then “relocated” at load time to match the version of the structure used in the target kernel.
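As a concrete (and hypothetical) example of what this enables, the short program below reads a field from a kernel structure without any kernel headers installed. It assumes a vmlinux.h generated with “bpftool btf dump file /sys/kernel/btf/vmlinux format c” and uses libbpf’s BPF_CORE_READ() macro, which is built on that offset-preserving LLVM intrinsic:

    /* core_demo.bpf.c: read a task_struct field portably across kernels. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include <bpf/bpf_core_read.h>

    SEC("kprobe/wake_up_new_task")
    int BPF_KPROBE(on_new_task, struct task_struct *p)
    {
        /* The offset of p->pid is recorded at compile time and
           relocated by libbpf at load time to match the running
           kernel's structure layout. */
        pid_t pid = BPF_CORE_READ(p, pid);

        bpf_printk("new task: pid %d\n", pid);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

The same compiled object can then be loaded, unmodified, on any kernel that exposes BTF, with no compiler on the target machine at all.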

A number of other interesting projects have made progress in 2019, he said. Support for bounded loops in the verifier was added to 5.3 after two years of work. BPF programs can now manage concurrency with spinlocks, with the verifier proving that these programs will not deadlock. Dead-code elimination has been added, and scalar precision tracking as well.
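As a made-up example of what these features look like in practice (using the same libbpf conventions as above; spinlocks are not available to tracing programs, so a traffic-control program is used here):

    /* features.bpf.c: a bounded loop and a spinlock-protected counter. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct counter {
        struct bpf_spin_lock lock;
        u64 value;
    };

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, u32);
        __type(value, struct counter);
    } counters SEC(".maps");

    SEC("tc")
    int count_words(struct __sk_buff *skb)
    {
        struct counter *c;
        u32 key = 0, i, words = 0;

        /* Since 5.3 the verifier accepts this loop without unrolling:
           it can prove that i is bounded and the loop terminates. */
        for (i = 0; i < 64; i++)
            if (i * 4 < skb->len)
                words++;

        c = bpf_map_lookup_elem(&counters, &key);
        if (!c)
            return 0;

        /* The verifier proves the lock is always released before the
           program returns, so it cannot deadlock the kernel. */
        bpf_spin_lock(&c->lock);
        c->value += words;
        bpf_spin_unlock(&c->lock);
        return 0;  /* TC_ACT_OK */
    }

    char LICENSE[] SEC("license") = "GPL";

Both checks happen entirely at load time; a program that could loop forever, or that could return with the lock held, is simply rejected.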

Starovoitov said that people often complain that the BPF verifier is painful to deal with. But, he said, it is far smarter than the LLVM compiler, and a number of advantages come from that, starting with the ability to prove that a program is safe to load into the kernel. The verifier is also able to perform far better dead-code elimination than LLVM can.

In the future, the verifier is set to get better by making more use of the available BTF data. Every program type, for example, must implement its own boilerplate functions to provide (and check) access to the context object passed to the programs themselves. This code bloats the kernel, he said, and tends to be prone to bugs. With BTF, those functions will no longer be necessary; the verifier can use the BTF data to check programs directly. That will enable the removal of 4,000 lines of code, he said.

He concluded by saying that BPF development is “100% driven by use cases”; the way to shape its future direction is to show the ways in which new features can be useful. Even better, of course, is to hack new extensions and to share them with the community.

[Your editor thanks the Linux Foundation, LWN’s travel sponsor, for supporting his travel to this event.]
