Beyla uses eBPF to collect trace information from applications automatically, without code changes.

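For Go programs, Beyla also supports trace context propagation, that is, passing the trace context between microservices. This links the calls between services into a single chain and gives the same result as conventional intrusive (in-code) tracing.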

Using Go's net/http as an example, this article explains how Beyla implements trace context propagation.

I. Overall principle

Trace context propagation hooks outgoing HTTP calls to external services and adds a traceparent field to the HTTP headers.

The traceparent field in the HTTP header is what carries the trace context to the downstream service.

In a Go application, an outgoing HTTP call to an external service invokes, in order:

  • net/http.(*Transport).roundTrip
  • net/http.Header.writeSubset

When roundTrip is probed:

  • an entry key=goroutine_addr, value=trace is written to the ongoing_http_client_requests map;
  • an entry key=header_addr, value=goroutine_addr is written to the header_req_map map;

When header_writeSubset is probed:

  • header_addr is looked up in header_req_map to find goroutine_addr;
  • goroutine_addr is looked up in ongoing_http_client_requests to find the trace information;
  • finally, the BPF helper bpf_probe_write_user() writes the traceparent from the trace information into the HTTP header;

II. Probing uprobe/roundTrip

Processing flow:

  • first, the goroutine and its trace information are written to the ongoing_http_client_requests map;
  • then:

    • if the current request header has no traceparent, the entry key=header_addr, value=goroutine_addr is written to the header_req_map map;
    • if the current request header already has a traceparent, nothing else is done, because the HTTP header already carries the trace information;

// bpf/go_nethttp.c
SEC("uprobe/roundTrip")
int uprobe_roundTrip(struct pt_regs *ctx) {
    roundTripStartHelper(ctx);
    return 0;
}

// bpf/go_nethttp.c
/* HTTP Client. We expect to see HTTP client in both HTTP server and gRPC server calls.*/
static __always_inline void roundTripStartHelper(struct pt_regs *ctx) {
    ....
    // Store the goroutine and its trace information in ongoing_http_client_requests
    if (bpf_map_update_elem(&ongoing_http_client_requests, &goroutine_addr, &invocation, BPF_ANY)) {
        bpf_dbg_printk("can't update http client map element");
    }

// Only when header propagation is supported
#ifndef NO_HEADER_PROPAGATION
    if (!existing_tp) { // the request has no traceparent yet
        void *headers_ptr = 0;
        bpf_probe_read(&headers_ptr, sizeof(headers_ptr), (void*)(req + req_header_ptr_pos));
        bpf_dbg_printk("goroutine_addr %lx, req ptr %llx, headers_ptr %llx", goroutine_addr, req, headers_ptr);

        if (headers_ptr) {
            // Record header_addr -> goroutine_addr in header_req_map
            bpf_map_update_elem(&header_req_map, &headers_ptr, &goroutine_addr, BPF_ANY);
        }
    }
#endif
}

Definition of the header_req_map map:

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, void *); // key: pointer to the request header map
    __type(value, u64); // the goroutine of the transport request
    __uint(max_entries, MAX_CONCURRENT_REQUESTS);
} header_req_map SEC(".maps");

III. Probing uprobe/header_writeSubset

Processing flow:

  • first, header_addr is looked up in header_req_map to obtain goroutine_addr;
  • then, goroutine_addr is looked up in ongoing_http_client_requests to obtain the trace information;
  • finally, the trace information is rendered into a traceparent string and written into the HTTP header with bpf_probe_write_user();

// beyla/go_nethttp.c
#ifndef NO_HEADER_PROPAGATION
// Context propagation through HTTP headers
SEC("uprobe/header_writeSubset")
int uprobe_writeSubset(struct pt_regs *ctx) {
    void *header_addr = GO_PARAM1(ctx);
    void *io_writer_addr = GO_PARAM3(ctx);

    // First, look up header_req_map by header_addr to get goroutine_addr
    u64 *request_goaddr = bpf_map_lookup_elem(&header_req_map, &header_addr);

    // Then, look up ongoing_http_client_requests by goroutine_addr to get the trace information
    u64 parent_goaddr = *request_goaddr;
    http_func_invocation_t *func_inv = bpf_map_lookup_elem(&ongoing_http_client_requests, &parent_goaddr);
    ...
    unsigned char buf[TRACEPARENT_LEN];
    make_tp_string(buf, &func_inv->tp); // render the trace into buf
    ...
    // Finally, write the contents of buf into the HTTP header with bpf_probe_write_user()
    if (len < (size - TP_MAX_VAL_LENGTH - TP_MAX_KEY_LENGTH - 4)) { // 4 = strlen(":_") + strlen("\r\n")
        char key[TP_MAX_KEY_LENGTH + 2] = "Traceparent: ";
        char end[2] = "\r\n";
        bpf_probe_write_user(buf_ptr + (len & 0x0ffff), key, sizeof(key));
        len += TP_MAX_KEY_LENGTH + 2;
        bpf_probe_write_user(buf_ptr + (len & 0x0ffff), buf, sizeof(buf));
        len += TP_MAX_VAL_LENGTH;
        bpf_probe_write_user(buf_ptr + (len & 0x0ffff), end, sizeof(end));
        len += 2;
        bpf_probe_write_user((void *)(io_writer_addr + io_writer_n_pos), &len, sizeof(len));
    }
    return 0;
}
#else
...
#endif

For net/http, the bytes end up in the buf field of bufio.Writer, and the n field is updated to the new length:

// go/src/bufio/bufio.go
type Writer struct {
    err error
    buf []byte
    n   int
    wr  io.Writer
}

IV. The BPF helper bpf_probe_write_user

Because bpf_probe_write_user() modifies user-space memory, it places some requirements on the kernel. The Beyla documentation states:

In order to write the traceparent value in outgoing HTTP/gRPC request headers, Beyla needs to write to the process memory using the bpf_probe_write_user eBPF helper.
Since kernel 5.14 (with fixes backported to the 5.10 series) this helper is protected (and unavailable to BPF programs) if the Linux Kernel is running in integrity lockdown mode. Kernel integrity mode is typically enabled by default if the Kernel has Secure Boot enabled, but it can also be enabled manually.

On the kernel lockdown configuration, it states:

Beyla will automatically check if it can use the bpf_probe_write_user helper, and enable context propagation only if it's allowed by the kernel configuration. Verify the Linux Kernel lockdown mode by running the following command:

cat /sys/kernel/security/lockdown

If that file exists and the mode is anything other than [none], Beyla will not be able to perform context propagation and distributed tracing will be disabled.

In the code:

  • if the kernel version is below 5.10 (major version below 5, or 5.x with x < 10), context propagation is supported;
  • otherwise:

    • the kernel lockdown file /sys/kernel/security/lockdown is read to check whether lockdown is enabled;
    • if lockdown is not enabled (KernelLockdownNone), context propagation is supported; in any other mode it is not;

// pkg/internal/ebpf/common/common.go
func SupportsContextPropagation(log *slog.Logger) bool {
    kernelMajor, kernelMinor := KernelVersion()
    if kernelMajor < 5 || (kernelMajor == 5 && kernelMinor < 10) {
        log.Debug("Found Linux kernel earlier than 5.10, trace context propagation is supported", "major", kernelMajor, "minor", kernelMinor)
        return true
    }

    // Read /sys/kernel/security/lockdown
    lockdown := KernelLockdownMode()
    // If the mode is none, context propagation is supported
    if lockdown == KernelLockdownNone {
        log.Debug("Kernel not in lockdown mode, trace context propagation is supported.")
        return true
    }
    return false
}
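KernelLockdownMode itself is not shown above. As a rough sketch of how the lockdown file could be interpreted, the code below assumes the kernel's usual format, in which all modes are listed and the active one is bracketed (for example "none [integrity] confidentiality"). parseLockdownMode and allowsPropagation are hypothetical helpers, not Beyla's functions, and the kernel-version shortcut is deliberately left out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseLockdownMode returns the bracketed (active) mode from the content of
// the lockdown file, or "" if no bracketed entry is found.
func parseLockdownMode(content string) string {
	start := strings.IndexByte(content, '[')
	if start < 0 {
		return ""
	}
	end := strings.IndexByte(content[start:], ']')
	if end < 0 {
		return ""
	}
	return content[start+1 : start+end]
}

// allowsPropagation applies the rule quoted from the documentation: a missing
// file, or an active mode of "none", allows context propagation.
func allowsPropagation(content string, fileExists bool) bool {
	if !fileExists {
		return true
	}
	return parseLockdownMode(content) == "none"
}

func main() {
	data, err := os.ReadFile("/sys/kernel/security/lockdown")
	// Assumption for this sketch: any read error is treated as "no lockdown file".
	exists := err == nil
	fmt.Println("context propagation allowed:", allowsPropagation(string(data), exists))
}
```

Content that cannot be parsed is treated as "not none" here, so the sketch errs on the side of disabling propagation.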

References

1. https://github.com/grafana/beyla/issues/521
2. https://github.com/grafana/beyla/blob/main/docs/sources/distributed-traces.md