Introduction to bpftrace

bpftrace is a high-level tracing language for Linux eBPF. It provides a concise way to write tracing programs that can efficiently extract information from the Linux kernel and userspace. Some key features of bpftrace include:

Simple and familiar tracing language based on C and awk
Low overhead probes with little performance impact
Support for probing the Linux kernel dynamically (kprobes) and statically (tracepoints)
User-level dynamic tracing (uprobes)
Built-in support for common tracing functionality like histograms, counting, aggregations, etc.
Easy interoperation with other Linux tracing/monitoring tools

bpftrace programs consist of probe definitions, actions, and variables. Let’s look at each part in more detail.

Probes

Probes define the points in userspace/kernel execution where bpftrace programs can be attached. Some examples:

kprobe: Attach to kernel functions
kretprobe: Attach to kernel function returns
uprobe: Attach to userspace functions
tracepoint: Attach to static kernel tracepoints
profile: Time-based sampling

For example:

kprobe:do_sys_open 
{
  // actions
}

This attaches a probe to the do_sys_open kernel function. Multiple probes can be defined in the same program.

Actions

Actions make up the probe’s body, and are executed when the probe is hit. For example:

kprobe:do_sys_open
{
  @opens = count();  // count number of calls
  printf("%s opened %s\n", comm, str(arg1)); // print file open details (arg1 is the filename pointer)
}

Actions can call built-in functions, modify state with variables, print information, and more.

Variables

bpftrace includes various built-in variables that are available for use in actions:

pid – Process ID
tid – Thread ID
uid – User ID
cpu – CPU number
comm – Process name

For example:

printf("Process %s (PID %d) called do_sys_open()\n", comm, pid);

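You can also define your own variables. Global map variables are prefixed with @ (for example @opens), scratch variables local to a single probe invocation are prefixed with $, and per-thread state is kept by keying a map with the thread ID (for example @start[tid]).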
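
As a sketch of how per-thread state and scratch variables fit together, the program below times calls to a kernel function. The choice of vfs_read and the names @start, @read_us, and $duration_us are assumptions made for illustration, not part of the original text.

```bpftrace
kprobe:vfs_read
{
  // map keyed by thread ID: each thread stores its own start timestamp
  @start[tid] = nsecs;
}

kretprobe:vfs_read
/@start[tid]/
{
  // scratch variable: exists only for this probe invocation
  $duration_us = (nsecs - @start[tid]) / 1000;
  // global map holding a histogram of call durations in microseconds
  @read_us = hist($duration_us);
  // remove this thread's entry so the map does not keep growing
  delete(@start[tid]);
}
```

The /@start[tid]/ filter skips returns whose entry was not seen, for example calls already in flight when tracing started.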

Maps

Special variables called maps are used for aggregations and histograms:

@opens = count(); // count calls 
@bytes = hist(arg2); // histogram of bytes
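
Maps can also take a key in square brackets, which gives one aggregation per key. A minimal sketch, assuming the raw_syscalls:sys_enter tracepoint is available on the running kernel and using the illustrative map name @syscalls:

```bpftrace
tracepoint:raw_syscalls:sys_enter
{
  // one counter per process name
  @syscalls[comm] = count();
}
```

Maps that still hold data when bpftrace exits (for example after Ctrl-C) are printed automatically, so no explicit print is needed here.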

Built-in Functions

Many helpful functions are built-in for formatting, output, kernel data structures, and more:

printf(): formatted output
time(): formatted time
join(): print a string array (such as argv) joined with spaces
str(): convert pointers to string
ksym(), kaddr(): kernel symbol translation
reg(): read registers
system(): run shell command

For example, in a return probe such as a kretprobe, where retval is available:

printf("%s read %d bytes\n", comm, retval);
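
Several of these functions can be combined in one action. The sketch below logs each execve call with a timestamp; it assumes the syscalls:sys_enter_execve tracepoint exists on the running kernel. The args->argv spelling is the one used by older bpftrace releases, while newer releases document args.argv, so adjust it if your version rejects it.

```bpftrace
tracepoint:syscalls:sys_enter_execve
{
  // time() prints the current wall-clock time using a strftime-style format
  time("%H:%M:%S ");
  // printf() prints the PID and name of the process calling execve
  printf("%-6d %s -> ", pid, comm);
  // join() prints the argv string array separated by spaces, then a newline
  join(args->argv);
}
```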

Examples

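Here’s a simple bpftrace one-liner that traces reads through the kernel’s vfs_read function and prints the value each one returned (the number of bytes read on success):

bpftrace -e 'kretprobe:vfs_read { printf("%s read %d bytes\n", comm, retval); }'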

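And a program file example that sums the bytes read by each process:

# cat disks.bt
kretprobe:vfs_read
/(int64)retval > 0/
{
  // add the bytes returned by this read to the per-process total
  @bytes[comm] = sum(retval);
}

END
{
  // print the totals, then clear the map so it is not printed a second time on exit
  print(@bytes);
  clear(@bytes);
}

This summarizes the number of bytes read per process. Because vfs_read sits above the individual filesystems, the totals include reads served from the page cache, pipes, and sockets, not only reads that reach the disk.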

Advanced Usage

bpftrace can also be used to build more complex tools and custom instrumentation. Some examples:

Custom userspace static tracing with USDT probes
Kernel-level histogram snapshots on interval timers
Custom tools with multiple probe types and complex functionality
Interoperation with other tracing tools like perf via kernel probes
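
As one illustration of the interval-timer item above, the following sketch prints a histogram of read sizes every five seconds and then resets it. The traced function (vfs_read), the five-second period, and the map name @read_bytes are assumptions chosen for the example.

```bpftrace
kretprobe:vfs_read
/(int64)retval > 0/
{
  // power-of-two histogram of bytes returned by successful reads
  @read_bytes = hist(retval);
}

interval:s:5
{
  // print a timestamped snapshot, then start a fresh histogram
  time("%H:%M:%S\n");
  print(@read_bytes);
  clear(@read_bytes);
}
```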

bpftrace provides a high-level abstraction on top of eBPF and the Linux tracing capabilities. The language is easy to use, yet it allows efficient programs to be written that extract key information. Hopefully this guide provides a good overview of how to use bpftrace for tracing Linux systems.