Network Working Group N. Elkins
Internet-Draft Inside Products, Inc.
Intended status: Informational C. Sharma
Expires: 23 August 2024 A. Umesh
B. V
M. P. Tahiliani
NITK Surathkal
20 February 2024
Implementation and Performance Evaluation of PDM using eBPF
draft-elkins-ebpf-pdm-ebpf-00
Abstract
RFC 8250 describes an optional Destination Option (DO) header embedded
in each packet to provide sequence numbers and timing information as
a basis for measurements. As a kernel implementation can be complex
and time-consuming, this document describes an implementation of the
Performance and Diagnostic Metrics (PDM) extension header using eBPF
in the Linux kernel's Traffic Control (TC) subsystem. The document
also provides a performance analysis of the eBPF implementation in
comparison to a traditional kernel implementation.
About This Document
This note is to be removed before publishing as an RFC.
The latest revision of this draft can be found at
https://ChinmayaSharma-hue.github.io/pdm-ebpf-draft/draft-elkins-
ebpf-pdm-ebpf.html. Status information for this document may be
found at https://datatracker.ietf.org/doc/draft-elkins-ebpf-pdm-
ebpf/.
Source for this draft and an issue tracker can be found at
https://github.com/ChinmayaSharma-hue/pdm-ebpf-draft.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 23 August 2024.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
   1.  Introduction
     1.1.  Background
       1.1.1.  PDM
       1.1.2.  eBPF
   2.  Using tc-bpf to add IPv6 extension headers
     2.1.  tc-bpf
     2.2.  Adding IPv6 extension headers in tc
       2.2.1.  Ingress tc-bpf program
       2.2.2.  Egress tc-bpf program
   3.  Implementation of PDM extension header in tc-bpf
     3.1.  Egress tc-bpf program for PDM
     3.2.  Ingress tc-bpf program for PDM
     3.3.  Implementation of PDM initiation
     3.4.  Implementation of PDM termination
   4.  Advantages of using eBPF to add extension headers
   5.  Performance Analysis
     5.1.  Experiment Setup
     5.2.  CPU Performance
       5.2.1.  CPU Usage in cycles
       5.2.2.  CPU usage as a percentage of total CPU cycles
     5.3.  Memory Usage
     5.4.  Network Throughput
     5.5.  Packet Processing Latency
   6.  Security Considerations
   7.  IANA Considerations
   8.  Normative References
   Acknowledgments
   Authors' Addresses
1. Introduction
1.1. Background
1.1.1. PDM
The Performance and Diagnostic Metrics (PDM) Extension Header,
defined in [RFC8250], introduces a method to discern server
processing delays from round-trip network delays within IPv6
networks. This extension is carried as an option within the
Destination Options header, a component of the IPv6 protocol.
The PDM header incorporates several fields, notably Packet Sequence
Number This Packet (PSNTP), Packet Sequence Number Last Received
(PSNLR), Delta Time Last Received (DTLR), Delta Time Last Sent
(DTLS), and scaling factors for these delta times. These elements,
when correlated with a unique 5-tuple identifier, facilitate the
precise measurement of network and server delays. The PDM header's
utility lies in its ability to provide concrete data on network and
server performance. By differentiating between the delays caused by
network round trips and server processing, it enables quick
identification of performance bottlenecks.
Implementations of the PDM header must keep track of sequence numbers
and timestamps for both incoming and outgoing packets, associated
with each 5-tuple. The header's design emphasizes flexibility in its
activation, accuracy in timestamp recording, and configurable
parameters for information lifespan and memory allocation as detailed
in Section 3.5 of RFC 8250.
1.1.2. eBPF
eBPF, an extensible programming framework within the Linux kernel,
operates as a virtual machine allowing users to run isolated programs
in kernel space, thereby customizing network processing, monitoring,
and security without needing kernel recompilation. These user-
defined programs are first compiled into eBPF bytecode, followed by a
verification process that assures termination and checks for
potential errors such as invalid pointers or array bounds, adding an
extra layer of security. Due to their optimized bytecode, eBPF
programs run efficiently within the kernel's virtual machine. eBPF
offers various hook points within the kernel, such as in the
networking stack, enabling users to attach their programs based on
specific requirements, like network monitoring or packet
modification. This flexibility allows for a tailored kernel behavior
to suit different use cases, enhancing the system's functionality and
security.
2. Using tc-bpf to add IPv6 extension headers
2.1. tc-bpf
The cls_bpf component within tc is a classifier that uses BPF,
including both classic BPF (cBPF) and extended BPF (eBPF), for packet
filtering and classification. eBPF can be used to directly perform
actions on the socket buffer (skb), such as packet mangling or
updating checksums. One of the features of cls_bpf classifier is its
ability to facilitate efficient, non-linear classification. Unlike
traditional tc classifiers that may require multiple parsing passes
(one each per classifier), cls_bpf, with the help of eBPF, can tailor
a single program for diverse skb types, avoiding redundant parsing.
cls_bpf operates in two distinct modes: the original mode, which
calls into the full tc action engine via tcf_exts_exec, and a more
efficient 'direct action' (da) mode, which returns immediately after
the BPF program runs. The da mode allows cls_bpf to simply return a
tc opcode and perform tc actions without traversing multiple layers
of the tc action engine.
In direct-action (da) mode, eBPF can store class identifiers (classid)
in skb->tc_classid and return the action opcode, suitable even for
simple cBPF operations like drop actions. cls_bpf's flexibility also
allows administrators to use multiple classifiers in mixed modes (da
and non-da) based on specific use cases. However, for high-
performance workloads, a single tc eBPF cls_bpf classifier in da mode
is generally sufficient and recommended due to its efficiency.
2.2. Adding IPv6 extension headers in tc
Adding an extension header to the packet requires creating space for
the header followed by inserting the data and padding. This task
utilizes eBPF helper functions specific to packet manipulation with
skb, such as bpf_skb_adjust_room for creating space,
bpf_skb_load_bytes for loading data from skb, and bpf_skb_store_bytes
for storing bytes in the adjusted skb.
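The following fragment is a minimal, illustrative sketch (not the
reference code of this implementation) of how these helpers fit
together in an egress tc-bpf program: it opens 8 bytes of room
directly after the fixed IPv6 header and writes an empty Destination
Options header (a single PadN option) into it. Names and error
handling are simplified.

   #include <linux/bpf.h>
   #include <linux/if_ether.h>
   #include <linux/ipv6.h>
   #include <linux/pkt_cls.h>
   #include <bpf/bpf_helpers.h>
   #include <bpf/bpf_endian.h>

   SEC("tc")
   int egress_add_dstopts(struct __sk_buff *skb)
   {
       struct ipv6hdr ip6h;
       __u32 ip6_off = sizeof(struct ethhdr);

       if (skb->protocol != bpf_htons(ETH_P_IPV6))
           return TC_ACT_OK;              /* not IPv6: leave untouched */
       if (bpf_skb_load_bytes(skb, ip6_off, &ip6h, sizeof(ip6h)) < 0)
           return TC_ACT_OK;

       /* Grow the packet by 8 bytes just below the network header.   */
       if (bpf_skb_adjust_room(skb, 8, BPF_ADJ_ROOM_NET, 0) < 0)
           return TC_ACT_OK;

       /* Destination Options header: next header copied from IPv6,
        * Hdr Ext Len = 0 (8 bytes total), then one 4-byte PadN option. */
       __u8 dstopts[8] = { ip6h.nexthdr, 0, 1, 4, 0, 0, 0, 0 };

       ip6h.nexthdr     = 60;             /* IPPROTO_DSTOPTS           */
       ip6h.payload_len = bpf_htons(bpf_ntohs(ip6h.payload_len) + 8);

       bpf_skb_store_bytes(skb, ip6_off, &ip6h, sizeof(ip6h), 0);
       bpf_skb_store_bytes(skb, ip6_off + sizeof(struct ipv6hdr),
                           dstopts, sizeof(dstopts), 0);
       return TC_ACT_OK;
   }

   char LICENSE[] SEC("license") = "GPL";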
The tc-bpf hookpoint caters to both ingress and egress traffic, vital
in scenarios where measurements in ingress are needed or when packet
data in ingress is used for calculating extension headers in egress.
The traffic control subsystem is located in the lower levels of the
network stack, which implies minimal packet processing after this
stage. Adding an extension header after the packet is fully formed
can result in the packet exceeding the Maximum Transmission Unit
(MTU), leading to potential packet drops. It is therefore important
to verify that the packet, with the extension header added, does not
exceed the MTU. The packet size can be checked against the MTU of
the net device (identified by its ifindex) using the bpf_check_mtu
helper function.
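As an illustration, the check could be performed before growing the
packet, along the lines of the fragment below. This is a sketch
under the assumption that an ifindex of zero selects the device the
skb is already associated with.

   /* Skip the insertion when the packet, grown by 16 bytes for the
    * Destination Options header carrying PDM, would exceed the MTU.
    * bpf_check_mtu is available since Linux 5.12. */
   __u32 mtu_len = 0;

   if (bpf_check_mtu(skb, 0 /* ifindex: skb's own device */, &mtu_len,
                     16 /* planned size increase */, 0) != 0)
       return TC_ACT_OK;   /* would exceed the MTU: send it unmodified */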
tc-bpf programs can also utilize the bpf_redirect helper to redirect
packets to the ingress or egress TC hook points of any interface in
the host, useful for routing purposes. An additional benefit of
using TC or any other eBPF hook point is the simplicity in exporting
data received in extension headers for logging and monitoring. This
is facilitated through eBPF maps, accessible from both kernel and
user space. BPF maps like BPF_MAP_TYPE_PERF_EVENT_ARRAY and
BPF_MAP_TYPE_RINGBUF are used for streaming real-time data from the
extension headers, providing precise control over poll/epoll
notifications to userspace about new data in the buffers.
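A sketch of such an export path is shown below; the map name
pdm_events and the struct pdm_event layout are illustrative and not
part of RFC 8250 or of this implementation's code.

   #include <linux/bpf.h>
   #include <bpf/bpf_helpers.h>

   struct pdm_event {
       __u64 ts_ns;                        /* reception timestamp      */
       __u16 psntp;                        /* PSN This Packet          */
       __u16 psnlr;                        /* PSN Last Received        */
   };

   struct {
       __uint(type, BPF_MAP_TYPE_RINGBUF);
       __uint(max_entries, 1 << 20);       /* 1 MiB ring buffer        */
   } pdm_events SEC(".maps");

   static __always_inline void report_event(__u16 psntp, __u16 psnlr)
   {
       struct pdm_event *e;

       e = bpf_ringbuf_reserve(&pdm_events, sizeof(*e), 0);
       if (!e)
           return;                         /* buffer full: drop event  */
       e->ts_ns = bpf_ktime_get_ns();
       e->psntp = psntp;
       e->psnlr = psnlr;
       bpf_ringbuf_submit(e, 0);           /* wakes poll/epoll readers */
   }

On the userspace side, libbpf's ring_buffer__new and
ring_buffer__poll can be used to consume these events.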
2.2.1. Ingress tc-bpf program
A BPF program can be attached to the ingress of the clsact qdisc for
a specific network interface. This program executes for every packet
received on this interface. The purpose of attaching a BPF program
at the ingress is to conduct specific measurements necessary for
calculating certain fields in the extension header. Should the need
arise to categorize information from incoming packets based on the
5-tuple, a hashmap BPF map can be employed. The ability to access
BPF maps across different eBPF programs is beneficial, particularly
for utilizing data recorded in the ingress BPF program within the
egress BPF program.
It's possible to define actions at ingress based on data from
incoming packets in direct action mode. For instance, the ingress
BPF program might decide to drop a packet based on its received
extension header, returning TC_ACT_SHOT, or to forward the packet by
returning TC_ACT_OK. Additional actions in the classifier-action
subsystem, like TC_ACT_REDIRECT, are available for use with
bpf_redirect and other relevant functions.
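A minimal ingress skeleton in direct-action mode might look like the
sketch below; the interface, object, and section names used when
attaching it are placeholders.

   #include <linux/bpf.h>
   #include <linux/if_ether.h>
   #include <linux/pkt_cls.h>
   #include <bpf/bpf_helpers.h>
   #include <bpf/bpf_endian.h>

   SEC("tc")
   int pdm_ingress(struct __sk_buff *skb)
   {
       if (skb->protocol != bpf_htons(ETH_P_IPV6))
           return TC_ACT_OK;       /* not IPv6: pass it up unchanged   */

       /* ... parse the Destination Options header here; a malformed
        * or unwanted packet could be dropped with TC_ACT_SHOT ...     */

       return TC_ACT_OK;           /* deliver the packet up the stack  */
   }

   char LICENSE[] SEC("license") = "GPL";

With iproute2, such a program is typically attached with something
like "tc qdisc add dev eth0 clsact" followed by "tc filter add dev
eth0 ingress bpf da obj pdm.o sec tc", where the device and object
names are examples only.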
2.2.2. Egress tc-bpf program
A BPF program is attachable to the egress point of the clsact qdisc
designated for a specific network interface, functioning for every
packet exiting this interface. The role of this egress BPF program
includes preparing space for the extension header in the skb,
assembling the extension header tailored for the particular outbound
packet, and appending the extension header to the packet.
In cases where the extension header is stateless, an egress BPF
program alone might be adequate, as no flow-related measurements are
required. The data to be integrated into the extension header solely
depends on the current outgoing packet. If the extension header
fields depend on the data from incoming packets or previously sent
packets, utilizing BPF maps becomes necessary to store and
subsequently utilize this data for computing specific fields in the
extension headers.
The egress BPF program also has access to a similar set of actions.
For instance, if a packet is discovered to be malformed, the program
has the capacity to drop the packet using TC_ACT_SHOT before it is
transmitted. Successful addition of the extension header
necessitates the return of TC_ACT_OK, propelling the packet to the
subsequent phase in the network stack.
3. Implementation of PDM extension header in tc-bpf
PDM is implemented using both ingress and egress tc-bpf programs.
The ingress program's chief responsibility lies in the interpretation
of incoming packets adorned with the PDM extension header and
recording the reception time of these packets. The egress program
assumes the role of appending the extension header, leveraging the
ingress timestamp to compute the elapsed time since the last packet
was received and sent within the same flow. These timestamps are
effectively communicated and preserved between the two programs via a
BPF map, specifically of the BPF_MAP_TYPE_HASH variety. The map key
is the 5-tuple of the flow: the IPv6 source and destination
addresses, the TCP/UDP source and destination ports, and the
transport layer protocol. For ICMP packets, the source and
destination ports are set to zero.
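A sketch of this shared state is given below. The structure and map
names are illustrative; the LRU hash variant mentioned in
Section 5.3 is assumed, and the value is packed to the 20-byte
layout described there.

   #include <linux/bpf.h>
   #include <linux/in6.h>
   #include <bpf/bpf_helpers.h>

   struct pdm_flow_key {
       struct in6_addr saddr;       /* IPv6 source address              */
       struct in6_addr daddr;       /* IPv6 destination address         */
       __u16 sport;                 /* transport source port (0: ICMP)  */
       __u16 dport;                 /* transport destination port       */
       __u8  proto;                 /* transport layer protocol         */
   } __attribute__((packed));

   struct pdm_flow_state {
       __u16 psnls;                 /* Packet Sequence Number Last Sent */
       __u16 psnlr;                 /* PSN Last Received                */
       __u64 tls;                   /* Time Last Sent (ns)              */
       __u64 tlr;                   /* Time Last Received (ns)          */
   } __attribute__((packed));

   struct {
       __uint(type, BPF_MAP_TYPE_LRU_HASH);
       __uint(max_entries, 65536);
       __type(key, struct pdm_flow_key);
       __type(value, struct pdm_flow_state);
   } pdm_flows SEC(".maps");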
3.1. Egress tc-bpf program for PDM
The egress eBPF program should first conduct essential validations on
the sizes of the ethernet and IP headers, and ascertain whether the
packet in question is IPv6. Should the packet be non-IPv6, it
returns with the action TC_ACT_OK and the packet proceeds unaltered.
The program subsequently examines if the packet's next header field
indicates the presence of an extension header. In instances where
any form of extension header exists, the addition of PDM is withheld.
This restraint stems from the complexity involved in integrating an
extension header, requiring the parsing of existing ones and
accurately positioning the PDM header. The challenge is compounded
by the limitation of bpf_skb_adjust_room, which permits augmenting
the packet size only subsequent to the fixed-length IPv6 header, thus
necessitating a reorganization of the other extension headers within
the eBPF program.
The egress eBPF program extracts the IPv6 source and destination
addresses, and in cases involving TCP/UDP, it also parses the source
and destination ports from the transport layer. This data is used in
the formulation of a 5-tuple key utilized for accessing the eBPF Map.
The program retrieves the timestamps and packet sequence numbers of
the last received and last sent packets from the eBPF map.
The extension header fields are then computed using the current
timestamp, acquired through bpf_ktime_get_ns. This timestamp is also
stored back in the eBPF map as the new Time Last Sent, for future
reference. The Delta Time Last Received (DTLR) field is calculated
as the difference between this new Time Last Sent and the Time Last
Received stored in the map. The Delta Time Last Sent (DTLS) is
computed as the difference between the stored Time Last Received and
the previous Time Last Sent. The Packet Sequence Number This Packet
(PSNTP) is calculated by incrementing the sequence number of the
last sent packet. The Packet Sequence Number Last Received (PSNLR)
is taken directly from the map.
These methodologies are in accordance with Section 3.2.1 of RFC 8250.
Given that PDM is categorized as a destination options extension
header, the next header is set accordingly. The space requirement
for storing PDM stands at 12 bytes, with an additional 2 bytes for
the destination options header and another 2 bytes for padding.
Following the execution of bpf_skb_adjust_room to augment the skb
size by 16 bytes, the program employs bpf_skb_store_bytes to store
the structured destination options header and the PDM header. Upon
successful insertion of the header, the egress BPF program finishes
its operation by returning TC_ACT_OK.
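The core of such an egress program, after the checks above have
passed, the IPv6 header has been loaded into ip6h at offset ip6_off,
and the 5-tuple key has been built, might look like the sketch
below. It reuses the map and structure definitions from the earlier
sketches, takes nanosecond deltas from bpf_ktime_get_ns and scales
them to fit the 16-bit fields (glossing over the exact time-base
conventions of RFC 8250), and abbreviates error handling; it is an
illustration, not the reference code of this implementation.

   /* PDM option layout as described in RFC 8250 (12 bytes).           */
   struct pdm_option {
       __u8  opt_type;              /* 0x0F: PDM option type            */
       __u8  opt_len;               /* 10 bytes of option data          */
       __u8  scale_dtlr;
       __u8  scale_dtls;
       __u16 psntp;                 /* PSN This Packet                  */
       __u16 psnlr;                 /* PSN Last Received                */
       __u16 deltatlr;              /* Delta Time Last Received         */
       __u16 deltatls;              /* Delta Time Last Sent             */
   } __attribute__((packed));

   /* Fit a nanosecond delta into a 16-bit base plus a power-of-two
    * scale factor. */
   static __always_inline __u16 scale_delta(__u64 delta, __u8 *scale)
   {
       __u8 s = 0;

       for (int i = 0; i < 48; i++) {
           if (delta <= 0xFFFF)
               break;
           delta >>= 1;
           s++;
       }
       *scale = s;
       return (__u16)delta;
   }

   /* ... inside the egress program body, after parsing and key setup: */
       __u64 now = bpf_ktime_get_ns();
       struct pdm_flow_state *st = bpf_map_lookup_elem(&pdm_flows, &key);

       if (!st)
           return TC_ACT_OK;        /* initiation handled separately    */

       struct pdm_option pdm = {
           .opt_type = 0x0F,
           .opt_len  = 10,
           .psntp    = bpf_htons(st->psnls + 1),
           .psnlr    = bpf_htons(st->psnlr),
       };

       /* DTLR: time from receiving the last packet of this flow to
        * sending this one.  DTLS: time from the previous send to that
        * receive.  Both stay zero until a packet has been received.    */
       if (st->tlr) {
           pdm.deltatlr =
               bpf_htons(scale_delta(now - st->tlr, &pdm.scale_dtlr));
           if (st->tls && st->tlr > st->tls)
               pdm.deltatls =
                   bpf_htons(scale_delta(st->tlr - st->tls,
                                         &pdm.scale_dtls));
       }

       st->psnls += 1;              /* this packet becomes "last sent"  */
       st->tls    = now;

       /* 16 bytes: 2-byte Destination Options header, the 12-byte PDM
        * option, and a 2-byte PadN option.  Hdr Ext Len = 1.           */
       __u8 block[16] = { ip6h.nexthdr, 1 };

       __builtin_memcpy(&block[2], &pdm, sizeof(pdm));
       block[14] = 1;               /* PadN, zero bytes of pad data     */

       if (bpf_skb_adjust_room(skb, 16, BPF_ADJ_ROOM_NET, 0) < 0)
           return TC_ACT_OK;

       ip6h.nexthdr     = 60;       /* Destination Options              */
       ip6h.payload_len = bpf_htons(bpf_ntohs(ip6h.payload_len) + 16);
       bpf_skb_store_bytes(skb, ip6_off, &ip6h, sizeof(ip6h), 0);
       bpf_skb_store_bytes(skb, ip6_off + sizeof(struct ipv6hdr),
                           block, sizeof(block), 0);
       return TC_ACT_OK;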
3.2. Ingress tc-bpf program for PDM
The ingress eBPF program should first conduct essential validations
on the sizes of the ethernet and IP headers, and ascertain whether
the packet in question is IPv6. Should the packet be non-IPv6, it
returns with the action TC_ACT_OK and the packet proceeds unaltered.
It also checks whether the packet carries a Destination Options
header and, if so, whether that header contains a PDM option.
The calculation of the fields "Delta Time Last Sent" and "Delta Time
Last Received," along with their respective scaling factors, is
contingent on the "Time Last Received" field located in the BPF map,
pertaining to the relevant 5-tuple. The ingress BPF program is
responsible for capturing the timestamp when a packet, corresponding
to a specific 5-tuple, is received. This capture is executed using
the function bpf_ktime_get_ns, and the result is subsequently stored
in the map.
The "Packet Sequence Number Last Received" that egress will use for
subsequent outgoing packets is derived from the "Packet Sequence
Number This Packet" field located in the PDM header of the received
packet. After the successful storage of both these values in the BPF
map, the ingress BPF program finishes its operation by returning
TC_ACT_OK.
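The corresponding core of the ingress program, assuming the 5-tuple
key has been built, the Destination Options header starts at offset
dopt_off, and the PDM option is the first option in that header
(types and map as in the earlier sketches), could look like this
sketch:

   /* ... inside the ingress program body, after parsing and key setup: */
   struct pdm_option pdm;

   /* The first option begins 2 bytes into the Destination Options
    * header, after the Next Header and Hdr Ext Len fields.             */
   if (bpf_skb_load_bytes(skb, dopt_off + 2, &pdm, sizeof(pdm)) < 0)
       return TC_ACT_OK;
   if (pdm.opt_type != 0x0F)
       return TC_ACT_OK;            /* first option is not PDM          */

   struct pdm_flow_state *st = bpf_map_lookup_elem(&pdm_flows, &key);

   if (!st)
       return TC_ACT_OK;            /* initiation handled separately    */

   st->tlr   = bpf_ktime_get_ns();   /* Time Last Received              */
   st->psnlr = bpf_ntohs(pdm.psntp); /* becomes PSN Last Received       */
   return TC_ACT_OK;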
3.3. Implementation of PDM initiation
The process of adding Performance and Diagnostic Metrics (PDM)
involves verifying the existence of an entry for the corresponding
5-tuple within the BPF map. If no such entry exists, the program
initiates PDM for this flow by creating a new one. This check is
performed each time an IPv6 packet is either received or transmitted.
The structure of the entries in the BPF map consists of the 5-tuple
serving as the key and the value encompassing various elements such
as the Packet Sequence Number Last Sent (PSNLS), Packet Sequence
Number Last Received (PSNLR), Time Last Received (TLR), and Time Last
Sent (TLS).
During the initial phase, the Packet Sequence Number Last Sent
(PSNLS) is assigned a random value, achieved through the use of the
helper function bpf_get_prandom_u32, which generates a random 32-bit
integer. Additionally, for the first packet, the Packet Sequence
Number Last Received (PSNLR) and Time Last Received (TLR) are set to
zero, as the ingress BPF program has not yet been executed for the
specific 5-tuple.
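Using the map and structures from the earlier sketches, the
initiation step might look like the fragment below; the use of the
BPF_NOEXIST flag is an assumption made here so that a racing
insertion from the other direction is not overwritten.

   /* Create a fresh entry for a 5-tuple that has no state yet.         */
   struct pdm_flow_state *st = bpf_map_lookup_elem(&pdm_flows, &key);

   if (!st) {
       struct pdm_flow_state init = {
           .psnls = (__u16)bpf_get_prandom_u32(), /* random initial PSN,
                                                     truncated to 16 bits */
           .psnlr = 0,                            /* nothing received yet */
           .tls   = 0,
           .tlr   = 0,
       };

       bpf_map_update_elem(&pdm_flows, &key, &init, BPF_NOEXIST);
       st = bpf_map_lookup_elem(&pdm_flows, &key);
   }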
3.4. Implementation of PDM termination
Stale entries corresponding to a flow are to be removed after a
certain amount of time; otherwise, a new flow that reuses the same
5-tuple could pick up stale data stored for that 5-tuple long ago.
This should be controlled through a configurable maximum lifetime
for the entries.
One way to remove stale entries is to poll the map periodically from
a userspace program, which is possible because BPF maps are
accessible from both kernel space and user space. All entries in the
map are checked, entries that have not been updated within the
configured period are identified as stale, and stale entries are
removed using the bpf_map_delete_elem function.
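A hedged sketch of such a userspace cleanup pass, using libbpf's
syscall wrappers and the key/value structures from the earlier
sketches, is shown below. The map file descriptor is assumed to
have been obtained elsewhere (for example from a pinned map), and
the next key is fetched before a deletion so the walk is not
restarted.

   #include <bpf/bpf.h>
   #include <time.h>

   static void expire_stale_flows(int map_fd, __u64 max_age_ns)
   {
       struct pdm_flow_key cur, next;
       struct pdm_flow_state st;
       struct timespec ts;
       __u64 now;
       int err;

       /* bpf_ktime_get_ns() is based on CLOCK_MONOTONIC, so the
        * comparison must use the same clock.                           */
       clock_gettime(CLOCK_MONOTONIC, &ts);
       now = (__u64)ts.tv_sec * 1000000000ULL + ts.tv_nsec;

       err = bpf_map_get_next_key(map_fd, NULL, &cur);
       while (!err) {
           err = bpf_map_get_next_key(map_fd, &cur, &next);
           if (bpf_map_lookup_elem(map_fd, &cur, &st) == 0 &&
               now - st.tls > max_age_ns && now - st.tlr > max_age_ns)
               bpf_map_delete_elem(map_fd, &cur);
           cur = next;
       }
   }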
Another way is to handle this mechanism entirely in eBPF by
comparing the Time Last Sent (TLS) and Time Last Received (TLR)
against the current timestamp for every packet in both ingress and
egress. If both differences exceed a configured maximum limit, the
map entry fields are reset and the PDM flow for that 5-tuple is
reinitialized.
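The in-eBPF variant can be sketched as a check performed before the
normal per-packet update; PDM_MAX_AGE_NS stands in for the
configured maximum lifetime and is not a name from this
implementation.

   /* Reset a stale entry in-line before the usual per-packet update.
    * Only entries that have both sent and received at least once are
    * considered. */
   __u64 now = bpf_ktime_get_ns();

   if (st && st->tls && st->tlr &&
       now - st->tls > PDM_MAX_AGE_NS &&
       now - st->tlr > PDM_MAX_AGE_NS) {
       st->psnls = (__u16)bpf_get_prandom_u32(); /* reinitialize flow   */
       st->psnlr = 0;
       st->tls   = 0;
       st->tlr   = 0;
   }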
4. Advantages of using eBPF to add extension headers
eBPF allows BPF programs to be loaded and unloaded dynamically,
making it easy to activate or deactivate the insertion of extension
headers into outgoing packets. The
utilization of tc and xdp hook points enhances the precision of
timestamps for wire arrival time, due to their location at the lower
layers of the network stack. Additionally, eBPF simplifies memory
management in high traffic scenarios, as it allows for the
configuration of the maximum number of entries in eBPF maps via its
API.
eBPF programs are also portable and can be used across different
kernel versions, as long as the program is compatible with the
kernel. This allows the PDM implementation to be migrated easily
between kernel versions while remaining consistent.
Implementing extension header insertion directly within the kernel
can introduce development challenges, such as potential memory leaks
due to inadequate memory deallocation. In eBPF, map memory is
allocated and freed by the kernel, and the configurable maximum
number of entries in a BPF map bounds memory usage and prevents
unbounded growth. The presence of the BPF verifier is
instrumental in ensuring both security and simplicity of
implementation. It conducts essential checks, including pointer
validation, buffer overflow prevention, and loop avoidance in the
code, thereby mitigating the risks of crashes or security
vulnerabilities. To safeguard against misuse, eBPF imposes resource
constraints on programs, such as limits on the number of executable
instructions, thereby upholding system stability and integrity.
5. Performance Analysis
5.1. Experiment Setup
Two virtual machines with 8 cores, 16 GB of RAM, and 64 GB of disk
space were used to run the following tests. The virtual machines run
Ubuntu 22.04 Server with Linux kernel 5.15.148, compiled using the
same kernel configuration as the prepackaged kernel 5.15.94. Both
VMs run on the same physical server, using QEMU/KVM as the
hypervisor. We compared the performance of the eBPF implementation
of PDM with a traditional kernel implementation of PDM (add
reference). The performance metrics used for comparison are CPU
performance, memory usage, network throughput, and packet processing
latency.
5.2. CPU Performance
Profiling of CPU cycles consumed by eBPF programs and the kernel
implementation has been performed to evaluate the computational
overhead introduced by these functions. The perf tool was used to
capture CPU cycle events, configured with a sampling frequency of
10,000 Hz.
Each experiment was structured to run an iperf3 server session using
TCP for a duration of 600 seconds, simulating a consistent and
controlled traffic load. iperf3 was also configured to use an MSS of
1000 bytes across all tests, while the MTU of the interface and path
was 1500 bytes. This kept packets well below the MTU even after the
extension header was added, so the eBPF program did not need to
handle packets exceeding the MTU.
This procedure was replicated across fifty individual trials per
implementation. The repetition of these trials under uniform
conditions and for a long duration allowed for the collection of a
comprehensive profile of CPU cycle usage, which is useful for
evaluating the efficiency and scalability of the eBPF processing in
real-world networking scenarios.
For the eBPF implementation, perf is able to record data for the
egress and ingress programs separately. For the kernel
implementation, the CPU cycles spent in the pdm_insert function call
were measured for each iperf3 server session; this represents the
egress overhead of the kernel implementation.
5.2.1. CPU Usage in cycles
+===================+==============+==============+=============+
| CPU Usage(cycles) | Mean | Median | St. Dev. |
+===================+==============+==============+=============+
| eBPF Egress | 8.60e10 cyc. | 8.54e10 cyc. | 9.08e9 cyc. |
+-------------------+--------------+--------------+-------------+
| eBPF Ingress | 1.53e10 cyc. | 1.57e10 cyc. | 8.71e9 cyc. |
+-------------------+--------------+--------------+-------------+
| PDM Kernel Egress | 2.29e9 cyc. | 2.13e9 cyc. | 6.49e8 cyc. |
+-------------------+--------------+--------------+-------------+
Table 1
5.2.2. CPU usage as a percentage of total CPU cycles
+===================+=========+=========+==========+
| CPU Usage(%) | Mean | Median | St. Dev. |
+===================+=========+=========+==========+
| eBPF Egress | 0.41% | 0.40% | 0.10% |
+-------------------+---------+---------+----------+
| eBPF Ingress | 0.07% | 0.07% | 0.03% |
+-------------------+---------+---------+----------+
| PDM Kernel Egress | 0.0110% | 0.0100% | 0.0030% |
+-------------------+---------+---------+----------+
Table 2
The CPU cycles consumed by the PDM kernel implementation are
considerably lower than those of the eBPF counterpart, indicating a
measurably higher computational demand for the eBPF operations. The
kernel approach, despite its limited flexibility compared to eBPF,
therefore has a lower overhead.
On a test run with call-stack recording enabled in perf, the
percentage overheads of some of the symbols invoked by the eBPF
egress program were obtained. The major portion of the egress
overhead comes from BPF map read/write operations and from the
memcpy operations that copy packet data to and from kernel memory.
It would be interesting to examine the effect of lowering the number
of bpf_skb_store_bytes and bpf_skb_load_bytes calls by loading the
entire packet into the eBPF program, modifying it there, and then
storing the modified packet back into the skb. The current
implementation invokes bpf_skb_store_bytes and bpf_skb_load_bytes
many times for disjoint parts of the packet, so this could be a
potential optimization for the eBPF program.
5.3. Memory Usage
The eBPF implementation of PDM uses memory to store per-flow state
for each 5-tuple, managed through eBPF maps. Each map entry stores a
value of 20 bytes: 2 bytes each for the Packet Sequence Number Last
Sent (PSNLS) and Packet Sequence Number Last Received (PSNLR), and 8
bytes each for the Time Last Sent (TLS) and Time Last Received
(TLR).
The BPF map has been configured with a maximum of 65,536 entries, so
the implementation can track 65,536 flows at once. At this maximum,
the values stored in the eBPF map amount to 65,536 * 20 = 1,310,720
bytes, or about 1.3 MB. The eBPF map structures themselves add some
overhead, but its effect on this total is not very large.
If more than 65,536 flows are encountered, new flows replace older
entries in the map. The BPF_MAP_TYPE_LRU_HASH variant of the BPF
hash map is used in the implementation, so older flows are evicted
in a least-recently-used fashion.
5.4. Network Throughput
+===========================+============+============+===========+
| Network Throughput | Mean | Median | St. Dev. |
+===========================+============+============+===========+
| Without PDM | 18.80 Gbps | 18.58 Gbps | 2.19 Gbps |
+---------------------------+------------+------------+-----------+
| PDM Kernel Implementation | 18.52 Gbps | 18.33 Gbps | 2.21 Gbps |
+---------------------------+------------+------------+-----------+
| eBPF Implementation | 18.03 Gbps | 17.22 Gbps | 2.51 Gbps |
+---------------------------+------------+------------+-----------+
Table 3
Network throughput was measured with and without the PDM extension
header attached in order to determine the throughput overhead. Each
experiment was structured to run an iperf3 server session using TCP
for a duration of 600 seconds, simulating a consistent and
controlled traffic load. perf was not running during any of these
tests.
This procedure was replicated across twenty-five individual trials,
all conducted under uniform conditions. The network throughput was
measured for three cases: PDM not attached, PDM attached using the
kernel implementation, and PDM attached using the eBPF
implementation.
When PDM is not attached, the network throughput is the highest as
expected. A slight decrease is observed in the kernel
implementation, with a further decrease in the eBPF implementation.
This indicates that while both methods impact network performance,
the eBPF implementation has a slightly more pronounced effect, which
is worth considering when implementing extension headers in eBPF.
The standard deviation across these measurements suggests some
variability in the test network conditions.
+===========================+========+========+==========+
| TCP Retransmits | Mean | Median | St. Dev. |
+===========================+========+========+==========+
| Without PDM | 2.125 | 2.0 | 1.832 |
+---------------------------+--------+--------+----------+
| PDM Kernel Implementation | 44.125 | 41.5 | 13.531 |
+---------------------------+--------+--------+----------+
| eBPF Implementation | 37.565 | 36.0 | 10.133 |
+---------------------------+--------+--------+----------+
Table 4
TCP retransmit counts were extracted from the test runs conducted
for network throughput. The number of TCP retransmits is noticeably
higher when PDM is attached, with either the kernel implementation
or the eBPF implementation. This might be due to a fault in the
implementations themselves or to packet drops caused by the addition
of the extension header.
5.5. Packet Processing Latency
+===============================+==========+==========+==========+
| Packet Processing Latency | Mean | Median | St. Dev. |
+===============================+==========+==========+==========+
| PDM Kernel Implementation | 0.707 µs | 0.641 µs | 0.414 µs |
+-------------------------------+----------+----------+----------+
| eBPF Egress Program Attached | 5.808 µs | 6.142 µs | 0.986 µs |
+-------------------------------+----------+----------+----------+
| eBPF Egress Program Detached | 4.528 µs | 4.668 µs | 0.785 µs |
+-------------------------------+----------+----------+----------+
| eBPF Ingress Program Attached | 3.634 µs | 3.977 µs | 0.906 µs |
+-------------------------------+----------+----------+----------+
| eBPF Ingress Program Detached | 3.082 µs | 3.321 µs | 1.246 µs |
+-------------------------------+----------+----------+----------+
Table 5
Functions within the kernel involved in packet processing can be
profiled using ftrace to determine the exact duration taken in
processing packets. The call duration of the PDM insertion function
(part of the PDM kernel implementation) was measured for 15 minutes
while running an iperf3 server session.
For the egress eBPF program, the duration of the dev_queue_xmit()
function call in the kernel was measured with and without the eBPF
egress program attached, again for 15 minutes while running an
iperf3 server session. Similarly, for the ingress eBPF program, the
duration of the netif_receive_skb_list_internal() function call was
measured with and without the eBPF ingress program attached, under
the same conditions.
The packet processing latency attributable to the eBPF egress
program is estimated as the difference in the duration of the
dev_queue_xmit() function call with and without the program
attached, approximately 1.280 µs. Similarly, the latency
attributable to the eBPF ingress program is the difference in the
duration of the netif_receive_skb_list_internal() function call with
and without the program attached, approximately 0.552 µs. It should
be noted, however, that ftrace measurements are affected by context
switches and scheduling latencies in the kernel, as well as by the
scheduling of the VM itself on the host.
6. Security Considerations
BPF utilizes maps to store various data elements, including 5-tuple
information about network flows. These maps have a configurable
limit on the number of entries they can hold, which is crucial for
efficient memory usage and performance optimization. However, this
characteristic also opens up a potential vulnerability to resource
exhaustion attacks.
An attacker could overrun the BPF maps by intentionally sending
packets with numerous distinct 5-tuples. As the maps reach their
maximum capacity, legitimate new entries either cannot be added or
displace existing entries, potentially leading to incorrect packet
processing or denial of service as critical flows go untracked or
are misclassified. This scenario is particularly concerning in
high-throughput environments where the rate of new flow creation is
significant.
To mitigate such attacks, it is essential to implement a robust
mechanism that not only monitors the usage of BPF maps but also
employs intelligent strategies to handle map overruns. This could
include techniques like early eviction of least-recently-used
entries, dynamic resizing of maps based on traffic patterns, or even
alert mechanisms for anomalous growth in map entries.
Additionally, rate-limiting strategies could be enforced at the
network edge to prevent an overwhelming number of new flows from
entering the network, thus offering a first line of defense against
such resource exhaustion attacks.
7. IANA Considerations
This document has no IANA actions.
8. Normative References
[RFC8250] Elkins, N., Hamilton, R., and M. Ackermann, "IPv6
Performance and Diagnostic Metrics (PDM) Destination
Option", RFC 8250, DOI 10.17487/RFC8250, September 2017,
<https://www.rfc-editor.org/rfc/rfc8250>.
Acknowledgments
The Authors extend their gratitude to Ameya Deshpande for providing
the kernel implementation of PDM, which served as a basis for
comparison with the eBPF implementation.
Authors' Addresses
Nalini Elkins
Inside Products, Inc.
United States
Email: nalini.elkins@insidethestack.com
Chinmaya Sharma
NITK Surathkal
India
Email: chinmaysharma1020@gmail.com
Amogh Umesh
NITK Surathkal
India
Email: amoghumesh02@gmail.com
Balajinaidu V
NITK Surathkal
India
Email: balajinaiduhanur@gmail.com
Mohit P. Tahiliani
NITK Surathkal
India
Email: tahiliani@nitk.edu.in