(새 문서: 분류: eBPF == 개요 == eBPF에서 pointer에 Rust의 ownership모델을 더하여서, 포인터의 활용성을 높이는 kptr이라는 새로운 타입을 말한다. == Referenced pointer == eBPF는 기존의 Helper function을 넘어서 더욱 많은 커널에 존재하는 기능을 사용하기 위해서 kfunc라는 새로운 unstable한 helper function API를 커널에 만들 수 있는 기법을 도입하였다. 이 kfunc는 reference pointer라는 새로운 po...)
 
편집 요약 없음
 
1번째 줄: 1번째 줄:
[[분류: 운영체제 기술]]
[[분류: 운영체제 보안]]
[[분류: eBPF]]
[[분류: eBPF]]


== 개요 ==
== 개요 ==
eBPF에서 pointer에 Rust의 ownership모델을 더하여서, 포인터의 활용성을 높이는 kptr이라는 새로운 타입을 말한다.
BPF(Berkely packet filter)는 초기에는 패킷 필터의 기능을 하기 위한 기능으로 추가되었다. 그러나 그 강력함으로 인해서 후기에 점차 확대되어 eBPF(extended berkely packet filter)로 진화하였다. eBPF는 운영체제 level에서의 Isolation을 가능하게 해준다. 즉 user level에서 제공한 BPF프로그램을 VM을 이용하여 운영체제에서 안전하게 돌릴 수 있도록 해준다. 이는 JAVA의 JVM과 같이 machine instruction을 BPF instruction에서 보호하는 여러 기법을 통해서 분리시키게 된다.  


== Referenced pointer ==
커널에 설치되는 BPF프로그램은 어떤 Hook point에서 실행될지에 따라서 서로 다른 helper function을 가진다. helper function은 커널이 BPF프로그램에 제공하는 함수로써, BPF프로그램은 helper function을 통해서 커널의 critical한 영역에 안전하게 접근 할 수 있다.  
eBPF는 기존의 Helper function을 넘어서 더욱 많은 커널에 존재하는 기능을 사용하기 위해서 [[kfunc]]라는 새로운 unstable한 helper function API를 커널에 만들 수 있는 기법을 도입하였다. 이 kfunc는 reference pointer라는 새로운 pointer를 만들었는데, 이는 acquire와 release라는 조합으로 프로그래밍되어서, acquire과 release사이에 이 포인터가 dereference되지 않다는 보장만 해주면, verifier가 owner ship을 추적하면서 safety를 보장해주는 semantic이 발생하였다.


== Reference pointer and usability ==
== Abstract ==
Ref.P (Reference pointer)를 사용하게 되면서, bpf_probe_read_user와 같은 probe함수를 사용하지 않아도 되었다. 또한 Ref.P가 helper functions이나 kfunc와 같은 다른 함수의 Argument로 사용될때의 Safety를 쉽게 파악할 수 있게 되었다. 그러나 Ref.P를 MAP에 넣을 수 없었기 때문에 kfunc로 Ref.P를 얻고 release하는 작업을 매 Ref.P의 사용마다 하여야 하기 때문에 불편함과 성능상의 불이익이 있었다.
BPF (Berkeley Packet Filter) is an interface that allows users to offload a simple function to be executed by the kernel. Historically, BPF was introduced in 1992 to implement runtime for efficient user-defined packet filter~\cite{mccanne_bsd_1993}. However classic BPF suffers from restricted functionality and limited instructions and is used only in processing packet filters. To overcome this limitation, Linux kernel community develop extended BPF which extends the register sets and instruction sets from classic BPF with add additional features such as Map or helper function to control kernel context and IPC. As a safe and effective kernel extension, eBPF is usually used for networking, performance monitoring, kernel tracking, and security exceptions. Lots of recent works further innovate the usability of eBPF in various fields of kernel including storage~\cite{zhong_bpf_2021}, NIC~\cite{brunella_hxdp_2022}, and so on.


== kptr ==
== Overall Design ==
kptr은 이 문제를 해결하기 위해서 등장하였다. kptr은 eBPF MAP에 저장될 수 있는 포인터 타입으로써, kfunc나 helper functions을 통해서 획득 가능하다. kptr은 unreference혹은 reference타입을 가지는데, unreference는 기존의 eBPF pointer와 비슷하게 validate를 체크하지 않기 때문에 probe함수로만 접근하여야 한다. kptr의 reference type은 Rust의 ownership모델이나 c++의 unique ptr처럼 단 하나의 reference 소유자를 가지게 된다. kptr을 다른 context로 전송하기 위해서, bpf_kptr_xchg라는 helper function을 통해서 kptr의 ownership을 swap할 수 있게 하였다. 만약 프로그램이 kptr을 map에 저장하고 release를 명시적으로 호출 하지 않으면, 자동으로 연관된 release를 호출함으로써, kptr를 release하도록 한다.  
eBPF programs are executed in predefined kernel hook points including system calls, kernel and user functions, kernel tracepoints, and several others. eBPF also attaches to any execution points by dynamic hooking using kernel probe (kprobe) or user probe (uprobe). Various tools such as \texttt{bcc} or \texttt{bpftrace} provide an abstraction on top of the bytecode for simplifying the use of the BPF. After then, Low-Level Virtual Machine (llvm) compiles the abstracted BPF program to universal 64bit BPF instruction sets. BPF bytecode pass to the kernel by bpf system call. Kernel verify the bytecode is safe to run in kernel context without any harmful effects or concurrency issues. In static time, verifier checks every code path of the BPF and verifies the memory safety, helper functions, and logical errors. After static verification, Just in time (JIT) compiler translates the verified BPF bytecode to machine native-specific instructions. With static verification and JIT compilation, BPF could be run as fast as natively compiled kernel code like kernel module. Cause BPF is executed in kernel context, there is a need for sharing collected data at BPF with user space program. This mechanism is done by BPF Maps which are predefined data structures for storing and retrieving data between User and BPF or BPF to BPF. Maps are accessed from user space safely by BPF system call. Cause the operation for the map is protected by kernel synchronization methods like RCU, there is no concurrency issues from accessing Maps. To protect and regulate the kernel state access and update, BPF cannot access to arbitrary kernel memory or execute kernel functions. Instead, BPF user helper function is a method for safely executing kernel functions from BPF.


== References ==
== BPF Maps ==
# https://lwn.net/Articles/900749/
To communicate with user and BPF each other, BPF provides various types of Maps for storing and retrieving data to kernel context. BPF Maps are usually defined as dictionary types. They have key and value and are accessed by key-value pairs. However, BPF Maps also provides an array or ring buffer semantic for various user needs. Example of BPF Maps includes hash, array, map of map, map of has, and so on. When BPF is executed in an event handler, sometime BPF generates data that should be read at user space or reused in the other context. This collected information is stored in the kernel for security. User and BPF could update and fetch data safely with kernel enforced synchronization mechanisms such as read copy update(RCU). To access Map data in user space, the user should fire a BPF system call with an appropriate file descriptor to the Map. Each Map has a lifetime and can be pinned to the debug file system (debugfs) to access from different user processes.
== Helper function ==
Accessing kernel state is protected by helper function. Helper function is a safe and stable kernel API for reading and updating kernel components. BPF cannot call any other kernel functions without a helper function to guarantee the safety of the kernel. Depending on the BPF program type, the usable subset of the helper functions changed. For example, bpf\_printk which prints the message to the tracing pipe could be used in any BPF program type, but bpf\_redirect only can be used in the XDP BPF program type. The arguments and return value is predefined in the kernel. The type of arguments must be matched with the input registers. For instance, bpf\_map\_lookup\_elem requires the first argument as a pointer to the map value.
 
BPF supports 64-bit machine-independent instruction sets. Usually, BPF instruction sets are generated by llvm compiler from code written in high-level languages such as c (libbpf) or python (bcc). However, programmers also can write BPF byte code directly like classic BPF. The llvm compiler is not part of the TCB of BPF because llvm runs at user level; thus BPF needs verification. To protect from malicious usage of registers, BPF has eleven registers and each register has specific usages. For example, register 0 (R0) is only used for saving a returning, and exiting value of a helper function. R1 to R5 is used for scratch and saving arguments for helper functions. R6 to R10 are callee saved registers and R10 is a read-only frame pointer to access the BPF stack. Using a register out of its designated purposes rejects the verification of the BPF program. BPF instruction consists of 8-bit opcode, 4-bit dest, 4-bit src registers, 16-bit signed offset, and 32-bit signed immediate constant. Opcode specifies the type of operations like ALU operation with immediate constant. If opcode requires a specific register or constant values, then, llvm marks the required field to the BPF instruction. For example, to call a bpf\_trace\_printk, LLVM marks opcode to BPF\_CALL and BPF\_JMP and signed immediate constant to index of bpf\_trace\_printk.
 
== Static verification ==
Static verification is the key concept of the BPF subsystem in Linux kernel. To execute a BPF program inside the kernel context, Linux kernel tries to verify the BPF byte code inside the kernel. If verifier rejects the execution of the BPF, then the BPF prints a fail message to the user space and rejects the execution. There exist three primary points for BPF safety checks, memory access, helper functions, and code paths. In these three points, static verifier tries to find harmful access to kernel memory, invalid access to kernel context, and logical error which could ruin kernel execution such as a deadlock or infinite loop.
 
Like Rust's Type system, BPF verifier specifies the type of each register and updates memory states such as allowed size and offset. Each BPF register is marked with bpf\_reg\_type by verifier. Register types determine the offset and size of kernel memory to which the register could allow access. If a register type is NOT\_INIT then access to the register is rejected. Memory access is checked by the type and metadata of the variable in verifier.
 
Every helper function requires a specific type of argument for input. BPF verifier verifies that the input argument matched with BPF registers. For example, if a helper function requires an argument for a stack pointer, then the input argument register must be a pointer to the stack. Also, BPF verifier checks the privileged level of the helper function and determines the sets that a user could use in a given context. Also, verifier tries to find the invalid access to the helper function with the wrong index.
 
BPF verifier also checks every code path of the BPF to check the termination condition of BPF. BPF has max instruction limits per BPF program that could be executed. This is due to BPF context should not preempt the kernel resources for execution. For example, if BPF falls into an infinite loop in the kernel interrupt handler, it will dramatically slow down all hand entire kernel. To stop this, BPF checks all instructions to ensure unreachable codes and unterminated loops are not included. This guarantees no infinite loop and finite execution time. The verifier checks this by constructing a Directed Acyclic Graph (DAG) of BPF and expanding all the loops inside BPF. Technically, all BPF loop is unrolled by verifier and linearized.
 
== Just in time compilation ==
If verifier passes the BPF program, then the kernel compiles BPF bytecode to machine code by Just In Time Compiler (JIT). Depending on the variation of the CPU, Kernel decided what to do and makes native machine code from the bytecode. Kernel marks memory section holding BPF program as read-only and patches mitigation against Spectre for further hardening of BPF. By JIT compilation, BPF code can be run as fast as native code just like kernel modules.
 
 
== 같이 보기 ==
# bcc
 
 
== 참고 ==
# 한글 요약 문서: https://www.bhral.com/post/linux-kernel-bpf-%EC%8A%A4%ED%84%B0%EB%94%94-%EB%85%B8%ED%8A%B8
# eBPF의 단점: https://tmpout.sh/2/4.html
# 헬퍼 함수: https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
# eBPF perf: https://www.brendangregg.com/ebpf.html#perf
# eBPF 공식 문서: https://ebpf.io/what-is-ebpf
# eBPF PROG_TYPE 설명: https://blogs.oracle.com/linux/post/bpf-a-tour-of-program-types
# eBPF 튜토리얼: https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
# libbpf 설명 1: https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc-to-libbpf-howto-guide.html
# libbpf 설명 2: https://nakryiko.com/posts/libbpf-bootstrap/

2023년 2월 3일 (금) 06:23 기준 최신판


개요

BPF(Berkely packet filter)는 초기에는 패킷 필터의 기능을 하기 위한 기능으로 추가되었다. 그러나 그 강력함으로 인해서 후기에 점차 확대되어 eBPF(extended berkely packet filter)로 진화하였다. eBPF는 운영체제 level에서의 Isolation을 가능하게 해준다. 즉 user level에서 제공한 BPF프로그램을 VM을 이용하여 운영체제에서 안전하게 돌릴 수 있도록 해준다. 이는 JAVA의 JVM과 같이 machine instruction을 BPF instruction에서 보호하는 여러 기법을 통해서 분리시키게 된다.

커널에 설치되는 BPF프로그램은 어떤 Hook point에서 실행될지에 따라서 서로 다른 helper function을 가진다. helper function은 커널이 BPF프로그램에 제공하는 함수로써, BPF프로그램은 helper function을 통해서 커널의 critical한 영역에 안전하게 접근 할 수 있다.

Abstract

BPF (Berkeley Packet Filter) is an interface that allows users to offload a simple function to be executed by the kernel. Historically, BPF was introduced in 1992 to implement runtime for efficient user-defined packet filter~\cite{mccanne_bsd_1993}. However classic BPF suffers from restricted functionality and limited instructions and is used only in processing packet filters. To overcome this limitation, Linux kernel community develop extended BPF which extends the register sets and instruction sets from classic BPF with add additional features such as Map or helper function to control kernel context and IPC. As a safe and effective kernel extension, eBPF is usually used for networking, performance monitoring, kernel tracking, and security exceptions. Lots of recent works further innovate the usability of eBPF in various fields of kernel including storage~\cite{zhong_bpf_2021}, NIC~\cite{brunella_hxdp_2022}, and so on.

Overall Design

eBPF programs are executed in predefined kernel hook points including system calls, kernel and user functions, kernel tracepoints, and several others. eBPF also attaches to any execution points by dynamic hooking using kernel probe (kprobe) or user probe (uprobe). Various tools such as \texttt{bcc} or \texttt{bpftrace} provide an abstraction on top of the bytecode for simplifying the use of the BPF. After then, Low-Level Virtual Machine (llvm) compiles the abstracted BPF program to universal 64bit BPF instruction sets. BPF bytecode pass to the kernel by bpf system call. Kernel verify the bytecode is safe to run in kernel context without any harmful effects or concurrency issues. In static time, verifier checks every code path of the BPF and verifies the memory safety, helper functions, and logical errors. After static verification, Just in time (JIT) compiler translates the verified BPF bytecode to machine native-specific instructions. With static verification and JIT compilation, BPF could be run as fast as natively compiled kernel code like kernel module. Cause BPF is executed in kernel context, there is a need for sharing collected data at BPF with user space program. This mechanism is done by BPF Maps which are predefined data structures for storing and retrieving data between User and BPF or BPF to BPF. Maps are accessed from user space safely by BPF system call. Cause the operation for the map is protected by kernel synchronization methods like RCU, there is no concurrency issues from accessing Maps. To protect and regulate the kernel state access and update, BPF cannot access to arbitrary kernel memory or execute kernel functions. Instead, BPF user helper function is a method for safely executing kernel functions from BPF.

BPF Maps

To communicate with user and BPF each other, BPF provides various types of Maps for storing and retrieving data to kernel context. BPF Maps are usually defined as dictionary types. They have key and value and are accessed by key-value pairs. However, BPF Maps also provides an array or ring buffer semantic for various user needs. Example of BPF Maps includes hash, array, map of map, map of has, and so on. When BPF is executed in an event handler, sometime BPF generates data that should be read at user space or reused in the other context. This collected information is stored in the kernel for security. User and BPF could update and fetch data safely with kernel enforced synchronization mechanisms such as read copy update(RCU). To access Map data in user space, the user should fire a BPF system call with an appropriate file descriptor to the Map. Each Map has a lifetime and can be pinned to the debug file system (debugfs) to access from different user processes.

Helper function

Accessing kernel state is protected by helper function. Helper function is a safe and stable kernel API for reading and updating kernel components. BPF cannot call any other kernel functions without a helper function to guarantee the safety of the kernel. Depending on the BPF program type, the usable subset of the helper functions changed. For example, bpf\_printk which prints the message to the tracing pipe could be used in any BPF program type, but bpf\_redirect only can be used in the XDP BPF program type. The arguments and return value is predefined in the kernel. The type of arguments must be matched with the input registers. For instance, bpf\_map\_lookup\_elem requires the first argument as a pointer to the map value.

BPF supports 64-bit machine-independent instruction sets. Usually, BPF instruction sets are generated by llvm compiler from code written in high-level languages such as c (libbpf) or python (bcc). However, programmers also can write BPF byte code directly like classic BPF. The llvm compiler is not part of the TCB of BPF because llvm runs at user level; thus BPF needs verification. To protect from malicious usage of registers, BPF has eleven registers and each register has specific usages. For example, register 0 (R0) is only used for saving a returning, and exiting value of a helper function. R1 to R5 is used for scratch and saving arguments for helper functions. R6 to R10 are callee saved registers and R10 is a read-only frame pointer to access the BPF stack. Using a register out of its designated purposes rejects the verification of the BPF program. BPF instruction consists of 8-bit opcode, 4-bit dest, 4-bit src registers, 16-bit signed offset, and 32-bit signed immediate constant. Opcode specifies the type of operations like ALU operation with immediate constant. If opcode requires a specific register or constant values, then, llvm marks the required field to the BPF instruction. For example, to call a bpf\_trace\_printk, LLVM marks opcode to BPF\_CALL and BPF\_JMP and signed immediate constant to index of bpf\_trace\_printk.

Static verification

Static verification is the key concept of the BPF subsystem in Linux kernel. To execute a BPF program inside the kernel context, Linux kernel tries to verify the BPF byte code inside the kernel. If verifier rejects the execution of the BPF, then the BPF prints a fail message to the user space and rejects the execution. There exist three primary points for BPF safety checks, memory access, helper functions, and code paths. In these three points, static verifier tries to find harmful access to kernel memory, invalid access to kernel context, and logical error which could ruin kernel execution such as a deadlock or infinite loop.

Like Rust's Type system, BPF verifier specifies the type of each register and updates memory states such as allowed size and offset. Each BPF register is marked with bpf\_reg\_type by verifier. Register types determine the offset and size of kernel memory to which the register could allow access. If a register type is NOT\_INIT then access to the register is rejected. Memory access is checked by the type and metadata of the variable in verifier.

Every helper function requires a specific type of argument for input. BPF verifier verifies that the input argument matched with BPF registers. For example, if a helper function requires an argument for a stack pointer, then the input argument register must be a pointer to the stack. Also, BPF verifier checks the privileged level of the helper function and determines the sets that a user could use in a given context. Also, verifier tries to find the invalid access to the helper function with the wrong index.

BPF verifier also checks every code path of the BPF to check the termination condition of BPF. BPF has max instruction limits per BPF program that could be executed. This is due to BPF context should not preempt the kernel resources for execution. For example, if BPF falls into an infinite loop in the kernel interrupt handler, it will dramatically slow down all hand entire kernel. To stop this, BPF checks all instructions to ensure unreachable codes and unterminated loops are not included. This guarantees no infinite loop and finite execution time. The verifier checks this by constructing a Directed Acyclic Graph (DAG) of BPF and expanding all the loops inside BPF. Technically, all BPF loop is unrolled by verifier and linearized.

Just in time compilation

If verifier passes the BPF program, then the kernel compiles BPF bytecode to machine code by Just In Time Compiler (JIT). Depending on the variation of the CPU, Kernel decided what to do and makes native machine code from the bytecode. Kernel marks memory section holding BPF program as read-only and patches mitigation against Spectre for further hardening of BPF. By JIT compilation, BPF code can be run as fast as native code just like kernel modules.


같이 보기

  1. bcc


참고

  1. 한글 요약 문서: https://www.bhral.com/post/linux-kernel-bpf-%EC%8A%A4%ED%84%B0%EB%94%94-%EB%85%B8%ED%8A%B8
  2. eBPF의 단점: https://tmpout.sh/2/4.html
  3. 헬퍼 함수: https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
  4. eBPF perf: https://www.brendangregg.com/ebpf.html#perf
  5. eBPF 공식 문서: https://ebpf.io/what-is-ebpf
  6. eBPF PROG_TYPE 설명: https://blogs.oracle.com/linux/post/bpf-a-tour-of-program-types
  7. eBPF 튜토리얼: https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
  8. libbpf 설명 1: https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc-to-libbpf-howto-guide.html
  9. libbpf 설명 2: https://nakryiko.com/posts/libbpf-bootstrap/