bpftrace is a high-level tracing language for Linux eBPF. It provides a concise way to write tracing programs that can efficiently extract information from the Linux kernel and userspace. Some key features of bpftrace include:
- Simple and familiar tracing language based on C and awk
- Low overhead probes with little performance impact
- Support for probing the Linux kernel dynamically (kprobes) and statically (tracepoints)
- User-level dynamic tracing (uprobes)
- Built-in support for common tracing functionality like histograms, counting, aggregations, etc.
- Easy interoperation with other Linux tracing/monitoring tools
bpftrace programs consist of probe definitions, actions, and variables. Let’s look at each part in more detail.
Probes
Probes define the points in userspace/kernel execution where bpftrace programs can be attached. Some examples:
- kprobe: Attach to kernel functions
- kretprobe: Attach to kernel function returns
- uprobe: Attach to userspace functions
- tracepoint: Attach to static kernel tracepoints
- profile: Time-based sampling
For example:
kprobe:do_sys_open
{
// actions
}
This attaches a probe to the do_sys_open kernel function. Multiple probes can be defined in the same program.
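As a rough sketch of several probes living in one program, the following combines the special BEGIN probe with a profile probe. The 99 Hz sampling rate and the @samples map name are illustrative choices, not values taken from this guide.

```bpftrace
// Runs once, when the program starts.
BEGIN
{
    printf("Sampling on-CPU processes at 99 Hz... hit Ctrl-C to stop.\n");
}

// Fires 99 times per second on each CPU; comm is the name of the
// process that is running on that CPU at the moment of the sample.
profile:hz:99
{
    @samples[comm] = count();
}
```

When the program exits, bpftrace prints the contents of @samples, so no explicit END probe is needed just to see the result.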
Actions
Actions make up the probe’s body, and are executed when the probe is hit. For example:
kprobe:do_sys_open
{
@opens = count(); // count number of calls
printf("%s opened %s\n", comm, str(arg1)); // print process name and filename (second argument of do_sys_open)
}
Actions can call built-in functions, update variables and maps, print information, etc.
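Actions can also be made conditional by placing a filter between slashes in front of the body. A minimal sketch, assuming the syscalls:sys_enter_openat tracepoint is present on the running kernel and using "bash" purely as an illustrative process name:

```bpftrace
tracepoint:syscalls:sys_enter_openat
/comm == "bash"/                     // run the body only for processes named bash
{
    @bash_opens = count();           // update state
    printf("%s (pid %d) opening %s\n",
        comm, pid, str(args->filename));  // print details of this call
}
```

The body runs only when the filter expression is non-zero, which keeps both the output and the overhead down for busy probes.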
Variables
bpftrace includes various built-in variables that are available for use in actions:
- pid: Process ID
- tid: Thread ID
- uid: User ID
- cpu: CPU number
- comm: Process name
For example:
printf("Process %s (PID %d) called do_sys_open()\n", comm, pid);
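You can also define your own variables: global variables are prefixed with @ (for example, @opens), and per-thread state can be kept by keying a global variable on the thread ID (for example, @thread[tid]).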
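One common use of per-thread variables is timing a function: store a timestamp on entry, keyed by tid, and read it back on return. The sketch below assumes the vfs_read kernel function can be probed on the running kernel. It also uses two things not introduced above: the nsecs built-in (a nanosecond timestamp) and a scratch variable, which is prefixed with $ and exists only within a single action.

```bpftrace
kprobe:vfs_read
{
    @start[tid] = nsecs;                 // entry timestamp for this thread
}

kretprobe:vfs_read
/@start[tid]/                            // skip returns whose entry was not seen
{
    $elapsed_us = (nsecs - @start[tid]) / 1000;
    @read_latency_us = hist($elapsed_us);
    delete(@start[tid]);                 // free the per-thread entry
}
```

Deleting the entry on return keeps @start from growing as threads come and go.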
Maps
Special variables called maps are used for aggregations and histograms:
@opens = count(); // count calls
@bytes = hist(arg2); // histogram of bytes
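Maps can also take a key in square brackets, which gives one aggregation per key value. A small sketch, assuming the raw_syscalls:sys_enter and syscalls:sys_exit_read tracepoints are available (the map names are made up for illustration):

```bpftrace
// One counter per process name.
tracepoint:raw_syscalls:sys_enter
{
    @syscalls[comm] = count();
}

// Power-of-two histogram of the sizes returned by successful read() calls.
tracepoint:syscalls:sys_exit_read
/args->ret > 0/
{
    @read_bytes = hist(args->ret);
}
```

Both maps are printed when the program exits: @syscalls as a list of per-process counts, and @read_bytes as an ASCII histogram.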
Built-in Functions
Many helpful functions are built-in for formatting, output, kernel data structures, and more:
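- printf(): formatted output
- time(): print the formatted current time
- join(): print the elements of a string array (such as an argv array) joined by spaces
- str(): read a string from a pointer
- ksym(), kaddr(): kernel symbol translation (address to symbol name, and symbol name to address)
- reg(): read registers
- system(): run a shell command (requires running bpftrace with --unsafe)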
For example, in a kretprobe or uretprobe action, where retval holds the function's return value:
printf("%s read %d bytes\n", comm, retval);
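Several of these functions are often combined in a single action. The following sketch, assuming the syscalls:sys_enter_execve tracepoint is available, prints a timestamp, the calling process, and the argument list of every program being executed:

```bpftrace
tracepoint:syscalls:sys_enter_execve
{
    time("%H:%M:%S ");                  // wall-clock time, strftime-style format
    printf("%-6d %-16s ", pid, comm);   // caller's PID and process name
    join(args->argv);                   // argv elements separated by spaces, then a newline
}
```

Note that comm here is the process calling execve(), so it shows the old process name rather than the program being started.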
Examples
Here’s a simple bpftrace one-liner that traces read() calls and prints the number of bytes each one returned:
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { printf("%s read %d bytes\n", comm, args->ret); }'
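And a program file example that summarizes bytes read by process:
# cat disks.bt
// Fires when a read() syscall returns; args->ret is its return value.
tracepoint:syscalls:sys_exit_read
/args->ret > 0/
{
    // Add the bytes returned by this read() to the process's running total.
    @bytes[comm] = sum(args->ret);
}
This summarizes the number of bytes returned by read() per process, whether they came from disk, a pipe, or a socket. bpftrace prints the @bytes map automatically when the program exits, so it must not be cleared in an END probe.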
Advanced Usage
bpftrace can also be used to build more complex tools and custom instrumentation. Some examples:
- Custom userspace static tracing with USDT probes
- Kernel-level histogram snapshots on interval timers
- Custom tools with multiple probe types and complex functionality
- Interoperation with other tracing tools like perf via kernel probes
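As an illustration of histogram snapshots on an interval timer, here is a minimal sketch that prints and resets a histogram of write() sizes every 10 seconds. The tracepoint, the interval length, and the @write_bytes map name are assumptions chosen for the example.

```bpftrace
// Record the size of every successful write().
tracepoint:syscalls:sys_exit_write
/args->ret > 0/
{
    @write_bytes = hist(args->ret);
}

// Every 10 seconds: print a timestamp and the histogram, then start over.
interval:s:10
{
    time("%H:%M:%S\n");
    print(@write_bytes);
    clear(@write_bytes);
}
```

Clearing the map after each print makes every snapshot cover only the last interval instead of accumulating since the start of tracing.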
bpftrace provides a high-level abstraction on top of eBPF and the Linux tracing infrastructure. The language is easy to use while still allowing efficient programs that extract key information. Hopefully this guide gives a good overview of how to use bpftrace for tracing Linux systems.