Introduction to bpftrace

bpftrace is a high-level tracing language for Linux eBPF. It provides a concise way to write tracing programs that can efficiently extract information from the Linux kernel and userspace. Some key features of bpftrace include:

  • Simple and familiar tracing language based on C and awk
  • Low overhead: probes run as eBPF programs in the kernel, and aggregations are computed there instead of copying every event to user space
  • Support for probing the Linux kernel dynamically (kprobes) and statically (tracepoints)
  • User-level dynamic tracing (uprobes)
  • Built-in support for common tracing functionality such as counts, sums, and histograms
  • Works alongside other Linux tracing and monitoring tools, since it uses the same kernel instrumentation (kprobes, uprobes, tracepoints, perf events)

bpftrace programs consist of probes, each followed by an optional filter and a block of actions, plus variables that hold state. Let’s look at each part in more detail.

Probes

A probe names the event in kernel or user-space execution that triggers a block of bpftrace actions. Some commonly used probe types:

  • kprobe: Attach to kernel functions
  • kretprobe: Attach to kernel function returns
  • uprobe: Attach to userspace functions
  • tracepoint: Attach to static kernel tracepoints
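  • profile: Time-based sampling on every CPU
  • interval: Fire at a fixed time interval on one CPU
  • BEGIN/END: Run once when the program starts and when it exits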

For example:

kprobe:do_sys_open 
{
  // actions
}

This attaches a kprobe to the do_sys_open() kernel function; the actions inside the braces run every time the function is called. Multiple probes can be defined in the same program.
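As a minimal sketch of several probes sharing one program, the script below combines BEGIN and END (which fire once at start-up and at exit), a kprobe, and an interval probe that stops tracing after five seconds. It assumes your kernel exposes do_sys_open as a probeable function; kernel function names differ between versions, and bpftrace -l 'kprobe:do_sys_open*' lists what is available.

```bpftrace
// Several probes in one program (assumes do_sys_open is probeable on this kernel)
BEGIN
{
  printf("Tracing do_sys_open() for 5 seconds...\n");
}

kprobe:do_sys_open
{
  @opens = count();   // number of calls
}

interval:s:5
{
  exit();             // stop tracing after five seconds
}

END
{
  printf("Done.\n");
}
```

Run it as root with bpftrace followed by the script's file name. Non-empty maps such as @opens are printed automatically after END runs.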

Actions

Actions make up the probe’s body and run each time the probe fires. For example:

kprobe:do_sys_open
{
  @opens = count();  // count number of calls
  printf("%s opened %s\n", comm, str(arg1)); // arg1 is the filename pointer
}

Actions can call built-in functions, read and update variables and maps, print output, and so on.
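bpftrace also accepts an optional filter (a predicate) between the probe and its action block, written between slashes; the actions run only when the filter is true. A minimal sketch, assuming do_sys_open is probeable and takes the filename as its second argument (arg1):

```bpftrace
// Print opens only for processes named "bash".
// The filter between the slashes is checked first; the block runs only if it is true.
kprobe:do_sys_open
/comm == "bash"/
{
  printf("%s (PID %d) opened %s\n", comm, pid, str(arg1));
}
```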

Variables

bpftrace includes various built-in variables that are available for use in actions:

  • pid – Process ID
  • tid – Thread ID
  • uid – User ID
  • cpu – CPU number
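  • comm – Process name
  • nsecs – Timestamp in nanoseconds
  • arg0, arg1, ... – Arguments of the probed function (kprobe, uprobe)
  • retval – Return value of the probed function (kretprobe, uretprobe)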

For example:

kprobe:do_sys_open { printf("Process %s (PID %d) called do_sys_open()\n", comm, pid); }

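You can also define your own variables. Scratch variables ($name) exist only while one action block runs, while maps (@name) are global and keep their values across probe firings. Per-thread state is conventionally stored in a map keyed by thread ID, such as @start[tid].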
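The sketch below measures how long each vfs_read() call takes. It uses a map keyed by tid to carry the entry timestamp from the kprobe to the matching kretprobe on the same thread, and a scratch variable ($duration, local to one action block) for the computed latency. It assumes vfs_read is probeable on your kernel; the map names @start and @read_ns are arbitrary.

```bpftrace
// Record the entry time of vfs_read() for this thread
kprobe:vfs_read
{
  @start[tid] = nsecs;
}

// On return, compute the latency; the filter skips threads whose entry we missed
kretprobe:vfs_read
/@start[tid]/
{
  $duration = nsecs - @start[tid];   // scratch variable, gone after this block
  @read_ns = hist($duration);        // power-of-two histogram of latency in ns
  delete(@start[tid]);               // free this thread's slot
}

// Drop leftover entry timestamps so only the histogram is printed on exit
END
{
  clear(@start);
}
```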

Maps

Special variables called maps are used for aggregations and histograms:

kprobe:do_sys_open { @opens = count(); }  // count calls
kprobe:vfs_read { @bytes = hist(arg2); }   // histogram of requested read sizes (bytes)
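Maps can also be keyed, giving one aggregation per key, and count() and hist() are only two of the available aggregation functions. A minimal sketch, assuming the raw_syscalls:sys_enter tracepoint and a probeable vfs_read(), whose third argument (arg2) is the requested byte count:

```bpftrace
// One counter per process name: system calls made by each command
tracepoint:raw_syscalls:sys_enter
{
  @syscalls[comm] = count();
}

// Other aggregations over the requested read size
kprobe:vfs_read
{
  @read_total = sum(arg2);                 // total bytes requested
  @read_avg = avg(arg2);                   // mean request size
  @read_max = max(arg2);                   // largest request
  @read_lin = lhist(arg2, 0, 4096, 512);   // linear histogram: 0 to 4096 in steps of 512
}
```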

Built-in Functions

Many helpful functions are built-in for formatting, output, kernel data structures, and more:

  • printf(): formatted output
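  • time(): print the current time (strftime-style format)
  • join(): print an array of strings, such as argv, joined by spaces
  • str(): convert pointers to string
  • ksym(), kaddr(): translate kernel addresses to symbol names and back
  • reg(): read registers
  • system(): run a shell command (only allowed with the --unsafe option)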

For example:

kretprobe:vfs_read { printf("%s read %d bytes\n", comm, retval); }
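A small sketch combining several of these functions, assuming an x86_64 kernel (where the instruction pointer register is named ip) with a probeable do_nanosleep() function. Inside a kprobe, ksym(reg("ip")) resolves to the name of the probed function:

```bpftrace
// Print a timestamp, the process, and the probed kernel function on each call
kprobe:do_nanosleep
{
  time("%H:%M:%S ");                                              // current time
  printf("%s (PID %d) entered %s\n", comm, pid, ksym(reg("ip"))); // ip is x86_64-specific
}
```

Running sleep 1 in another terminal should produce a line for the sleep process, provided do_nanosleep is on its code path in your kernel.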

Examples

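Here’s a simple bpftrace one-liner that traces read requests through vfs_read(), the kernel function behind read(), and prints how many bytes each one asks for:

bpftrace -e 'kprobe:vfs_read { printf("%s requested %d bytes\n", comm, arg2); }'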

And a program file that sums the bytes read by each process:

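# cat disks.bt
// Sum the bytes returned by successful vfs_read() calls, per process name
kretprobe:vfs_read
/(int64)retval > 0/
{
  @bytes[comm] = sum(retval);
}

// Non-empty maps such as @bytes are printed automatically after END runs
END
{
  printf("Bytes read per process:\n");
}
# bpftrace disks.bt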

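This sums the bytes returned by vfs_read() for each process name. Because vfs_read() also serves pipes, sockets, and page-cache hits, the totals measure reads through the VFS layer rather than physical disk I/O.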
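If the distribution of read sizes matters more than the total, the same probe can feed one histogram per process instead. A sketch under the same assumption that vfs_read() is the function of interest; @read_size is an arbitrary map name, and retval is cast to int64 so negative error codes fail the filter:

```bpftrace
// One power-of-two histogram of successful read sizes per process name
kretprobe:vfs_read
/(int64)retval > 0/
{
  @read_size[comm] = hist(retval);
}
```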

Advanced Usage

bpftrace can also be used to build more complex tools and custom instrumentation. Some examples:

  • Custom userspace static tracing with USDT probes
  • Kernel-level histogram snapshots on interval timers
  • Custom tools with multiple probe types and complex functionality
  • Working alongside tools such as perf, which build on the same kernel probe and perf event interfaces
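As an example of histogram snapshots on an interval timer, this sketch prints a timestamped histogram of successful vfs_read() sizes every second and then resets it, so each snapshot covers only the last interval. It assumes vfs_read() is probeable; @bytes is an arbitrary map name.

```bpftrace
// Collect sizes of successful reads (negative error codes fail the filter)
kretprobe:vfs_read
/(int64)retval > 0/
{
  @bytes = hist(retval);
}

// Every second: print a timestamp and the histogram, then start a fresh one
interval:s:1
{
  time("%H:%M:%S\n");
  print(@bytes);
  clear(@bytes);
}
```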

bpftrace provides a high-level abstraction on top of eBPF and the Linux tracing facilities. The language is easy to use while still producing efficient programs that extract key information. I hope this guide gives you a good overview of how to use bpftrace for tracing Linux systems!

Varnesh Gawde