개요

BPF(Berkely packet filter)는 초기에는 패킷 필터의 기능을 하기 위한 기능으로 추가되었다. 그러나 그 강력함으로 인해서 후기에 점차 확대되어 eBPF(extended berkely packet filter)로 진화하였다. eBPF는 운영체제 level에서의 Isolation을 가능하게 해준다. 즉 user level에서 제공한 BPF프로그램을 VM을 이용하여 운영체제에서 안전하게 돌릴 수 있도록 해준다. 이는 JAVA의 JVM과 같이 machine instruction을 BPF instruction에서 보호하는 여러 기법을 통해서 분리시키게 된다.

커널에 설치되는 BPF프로그램은 어떤 Hook point에서 실행될지에 따라서 서로 다른 helper function을 가진다. helper function은 커널이 BPF프로그램에 제공하는 함수로써, BPF프로그램은 helper function을 통해서 커널의 critical한 영역에 안전하게 접근 할 수 있다.

Abstract

BPF (Berkeley Packet Filter) is an interface that allows users to offload a simple function to be executed by the kernel. Historically, BPF was introduced in 1992 to implement runtime for efficient user-defined packet filter~\cite{mccanne_bsd_1993}. However classic BPF suffers from restricted functionality and limited instructions and is used only in processing packet filters. To overcome this limitation, Linux kernel community develop extended BPF which extends the register sets and instruction sets from classic BPF with add additional features such as Map or helper function to control kernel context and IPC. As a safe and effective kernel extension, eBPF is usually used for networking, performance monitoring, kernel tracking, and security exceptions. Lots of recent works further innovate the usability of eBPF in various fields of kernel including storage~\cite{zhong_bpf_2021}, NIC~\cite{brunella_hxdp_2022}, and so on.

Overall Design

eBPF programs are executed in predefined kernel hook points including system calls, kernel and user functions, kernel tracepoints, and several others. eBPF also attaches to any execution points by dynamic hooking using kernel probe (kprobe) or user probe (uprobe). Various tools such as \texttt{bcc} or \texttt{bpftrace} provide an abstraction on top of the bytecode for simplifying the use of the BPF. After then, Low-Level Virtual Machine (llvm) compiles the abstracted BPF program to universal 64bit BPF instruction sets. BPF bytecode pass to the kernel by bpf system call. Kernel verify the bytecode is safe to run in kernel context without any harmful effects or concurrency issues. In static time, verifier checks every code path of the BPF and verifies the memory safety, helper functions, and logical errors. After static verification, Just in time (JIT) compiler translates the verified BPF bytecode to machine native-specific instructions. With static verification and JIT compilation, BPF could be run as fast as natively compiled kernel code like kernel module. Cause BPF is executed in kernel context, there is a need for sharing collected data at BPF with user space program. This mechanism is done by BPF Maps which are predefined data structures for storing and retrieving data between User and BPF or BPF to BPF. Maps are accessed from user space safely by BPF system call. Cause the operation for the map is protected by kernel synchronization methods like RCU, there is no concurrency issues from accessing Maps. To protect and regulate the kernel state access and update, BPF cannot access to arbitrary kernel memory or execute kernel functions. Instead, BPF user helper function is a method for safely executing kernel functions from BPF.

BPF Maps

To communicate with user and BPF each other, BPF provides various types of Maps for storing and retrieving data to kernel context. BPF Maps are usually defined as dictionary types. They have key and value and are accessed by key-value pairs. However, BPF Maps also provides an array or ring buffer semantic for various user needs. Example of BPF Maps includes hash, array, map of map, map of has, and so on. When BPF is executed in an event handler, sometime BPF generates data that should be read at user space or reused in the other context. This collected information is stored in the kernel for security. User and BPF could update and fetch data safely with kernel enforced synchronization mechanisms such as read copy update(RCU). To access Map data in user space, the user should fire a BPF system call with an appropriate file descriptor to the Map. Each Map has a lifetime and can be pinned to the debug file system (debugfs) to access from different user processes.

Helper function

Accessing kernel state is protected by helper function. Helper function is a safe and stable kernel API for reading and updating kernel components. BPF cannot call any other kernel functions without a helper function to guarantee the safety of the kernel. Depending on the BPF program type, the usable subset of the helper functions changed. For example, bpf\_printk which prints the message to the tracing pipe could be used in any BPF program type, but bpf\_redirect only can be used in the XDP BPF program type. The arguments and return value is predefined in the kernel. The type of arguments must be matched with the input registers. For instance, bpf\_map\_lookup\_elem requires the first argument as a pointer to the map value.

BPF supports 64-bit machine-independent instruction sets. Usually, BPF instruction sets are generated by llvm compiler from code written in high-level languages such as c (libbpf) or python (bcc). However, programmers also can write BPF byte code directly like classic BPF. The llvm compiler is not part of the TCB of BPF because llvm runs at user level; thus BPF needs verification. To protect from malicious usage of registers, BPF has eleven registers and each register has specific usages. For example, register 0 (R0) is only used for saving a returning, and exiting value of a helper function. R1 to R5 is used for scratch and saving arguments for helper functions. R6 to R10 are callee saved registers and R10 is a read-only frame pointer to access the BPF stack. Using a register out of its designated purposes rejects the verification of the BPF program. BPF instruction consists of 8-bit opcode, 4-bit dest, 4-bit src registers, 16-bit signed offset, and 32-bit signed immediate constant. Opcode specifies the type of operations like ALU operation with immediate constant. If opcode requires a specific register or constant values, then, llvm marks the required field to the BPF instruction. For example, to call a bpf\_trace\_printk, LLVM marks opcode to BPF\_CALL and BPF\_JMP and signed immediate constant to index of bpf\_trace\_printk.

Static verification

Static verification is the key concept of the BPF subsystem in Linux kernel. To execute a BPF program inside the kernel context, Linux kernel tries to verify the BPF byte code inside the kernel. If verifier rejects the execution of the BPF, then the BPF prints a fail message to the user space and rejects the execution. There exist three primary points for BPF safety checks, memory access, helper functions, and code paths. In these three points, static verifier tries to find harmful access to kernel memory, invalid access to kernel context, and logical error which could ruin kernel execution such as a deadlock or infinite loop.

Like Rust's Type system, BPF verifier specifies the type of each register and updates memory states such as allowed size and offset. Each BPF register is marked with bpf\_reg\_type by verifier. Register types determine the offset and size of kernel memory to which the register could allow access. If a register type is NOT\_INIT then access to the register is rejected. Memory access is checked by the type and metadata of the variable in verifier.

Every helper function requires a specific type of argument for input. BPF verifier verifies that the input argument matched with BPF registers. For example, if a helper function requires an argument for a stack pointer, then the input argument register must be a pointer to the stack. Also, BPF verifier checks the privileged level of the helper function and determines the sets that a user could use in a given context. Also, verifier tries to find the invalid access to the helper function with the wrong index.

BPF verifier also checks every code path of the BPF to check the termination condition of BPF. BPF has max instruction limits per BPF program that could be executed. This is due to BPF context should not preempt the kernel resources for execution. For example, if BPF falls into an infinite loop in the kernel interrupt handler, it will dramatically slow down all hand entire kernel. To stop this, BPF checks all instructions to ensure unreachable codes and unterminated loops are not included. This guarantees no infinite loop and finite execution time. The verifier checks this by constructing a Directed Acyclic Graph (DAG) of BPF and expanding all the loops inside BPF. Technically, all BPF loop is unrolled by verifier and linearized.

Just in time compilation

If verifier passes the BPF program, then the kernel compiles BPF bytecode to machine code by Just In Time Compiler (JIT). Depending on the variation of the CPU, Kernel decided what to do and makes native machine code from the bytecode. Kernel marks memory section holding BPF program as read-only and patches mitigation against Spectre for further hardening of BPF. By JIT compilation, BPF code can be run as fast as native code just like kernel modules.


같이 보기

  1. bcc


참고

  1. 한글 요약 문서: https://www.bhral.com/post/linux-kernel-bpf-%EC%8A%A4%ED%84%B0%EB%94%94-%EB%85%B8%ED%8A%B8
  2. eBPF의 단점: https://tmpout.sh/2/4.html
  3. 헬퍼 함수: https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
  4. eBPF perf: https://www.brendangregg.com/ebpf.html#perf
  5. eBPF 공식 문서: https://ebpf.io/what-is-ebpf
  6. eBPF PROG_TYPE 설명: https://blogs.oracle.com/linux/post/bpf-a-tour-of-program-types
  7. eBPF 튜토리얼: https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
  8. libbpf 설명 1: https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc-to-libbpf-howto-guide.html
  9. libbpf 설명 2: https://nakryiko.com/posts/libbpf-bootstrap/