Implement trace-log correlation#942
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #942 +/- ##
==========================================
- Coverage 46.49% 46.07% -0.42%
==========================================
Files 267 270 +3
Lines 28044 28410 +366
==========================================
+ Hits 13040 13091 +51
- Misses 14127 14433 +306
- Partials 877 886 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
38b68a9 to
2da8e85
Compare
1cf9a17 to
46ecdcc
Compare
9ca4e7a to
080bacc
Compare
rafaelroquetto
left a comment
There was a problem hiding this comment.
Another pass. We should run some scalability tests here before being able to merge this (e.g. deploy it in a big cluster and gather performance data). I am worried about a few points:
- pressure on the ringbuffer: the shared ringbuffer will now be flooded by lines of logs of every tracked process - that alone is concerning, and it gets worse if we take into account the same ring buffer is used for other kinds of events
- overhead of context-switching for each
_writekprobe / CPU usage and ksoftirq handlers - you can gather performance data usingperffor that
The handler for the logs should probably grab the event, and put it into a processing queue to be picked up by another thread or thread pool - doing heavy work like deserialising json or opening and closing files (i.e. I/O) is very slow and will block the ring buffer. That handler should be doing the minimum work possible and return (e.g. copy the event, enqueue it and return).
We should also track the target process stdout (or log sink), open it once for writing, and keep the file descriptor open until either the process or OBI itself terminates, as @grcevski suggested.
| // Loop 1, follow the dentry chain (up to a maximum of | ||
| // k_path_resolver_max_components) and store pointers to each dentry in | ||
| // dentry_arr | ||
| for (int i = 0; i < k_path_resolver_max_components; i++) { |
There was a problem hiding this comment.
I wonder if we could simplify this into a single loop by simply walking backwards and write directly to buf. Something like this (pseudo-code):
static __always_inline size_t
resolve_path(char *buf, size_t buflen, struct path *path, const struct task_struct *task) {
/* ... setup task_struct, etc ... */
/* these are the main pointers, we iterate backwards into dentry, and then
* once we hit the current mnt_root, we swap the pointers, rinse and
* repeat
*/
struct vfsmount *mnt;
struct dentry *dentry;
char *end = buf + buflen;
*--end = '\0';
/* so here we start from the leaf path part */
mnt = path->mnt;
dentry = path->dentry;
for (u32 i = 0; i < k_path_resolver_max_components; ++i) {
struct dentry *parent;
/* while we are in the samen mnt, we keep prepending the path parts
* for each dentry
*/
while (dentry != mnt->mnt_root) {
int len = dentry->d_name.len;
/* account for '/' + name */
if (end - buf < len + 1) {
/* fail returning 0 bytes written */
return 0u;
}
/* copy name backwards - this is pseudo / regular C, need to read
* the len, qstr, etc... with bpf_probe_read_kernel() or
* BPF_CORE_READ()
*/
end -= len;
/* read the path part and preprent '/' */
bpf_probe_read_kernel(end, len, dentry->d_name.name);
*--end = '/';
/* move to parent dentry */
parent = dentry->d_parent;
if (parent == dentry)
break;
dentry = parent;
}
/* if we getting here, we got to the current's dentry chain mnt_root -
* so all we do is rinse and repeat and switch the dentry and mnt
* pointers to the next level
*/
if (mnt == mnt->mnt_parent)
break;
dentry = mnt->mnt_mountpoint;
mnt = mnt->mnt_parent;
}
// If we didn't write anything but the terminator, path is "/"
if (*end == '\0') {
if (end == buf)
return 0;
*--end = '/';
}
/* return the number of bytes read, you can use 0 to check if anything was
* written
*/
return buflen - end + buf;
}
or something like that - then in userspace you can simply iterate bytes_written event from the end of the buffer to extract the path. This makes it simpler and does ithout the need for the map to collect dentry pointers
There was a problem hiding this comment.
I'd like to avoid changing this code too much, as written on top of the file it's been taken from elastic/ebpf and ideally I'd use it as-is as I have the confidence it just works. This function has a lot of edge-cases handling and I don't wanna risk breaking it.
If the code is too much/too complex I can use the fd-based path construction the way I do for pipe_write, I preferred this as there are less moving parts (and less probes)
There was a problem hiding this comment.
I don't know... the original code loops over the N components twice, and requires additional loading/storing from an array per part (plus a map lookup every time this is invoked) - whereas the approach I am proposing does without the array altogether and loops once (and no map lookups) in a single pass. This is something that we can test for.
The pseudo-code is lacking a few corner cases (truncation, root path / when no other parts exist, for instance) but these could be addressed.
Having said that, I understand this takes (human) time and see where you are coming from. I'd personally give it a shot (and potentially fail miserably) but ultimately it's up to you - so feel free to leave it as is.
An aside point, since this code comes from elastic/ebpf and is dual-licensed (GPL/BSD2) you may want to confirm whether we need to keep the copyright and license notice or somenthing like that.
There was a problem hiding this comment.
For the first part, I'll give it a shot after load testing. For the second part, I think we should keep copyright, but I don't know how to do it. I added a comment at the top of the file with the source, should I copy something else? /cc @MrAlias
There was a problem hiding this comment.
@mmat11 yeah that sounds good - come to think of it, you are right, let's do the testing first and see if that even matters.
I am using a dedicated ring buffer (see
Will do that 👍🏻
I didn't want to add additional complexity without having measured some metrics or observed some bottlenecks, maybe we can do this after that step?
👍🏻 |
54af8d8 to
a89aba2
Compare
| } | ||
|
|
||
| static __always_inline bool tty_driver_is_pty(struct tty_struct *tty) { | ||
| if (bpf_core_enum_value_exists(enum tty_driver_type__new, TTY_DRIVER_TYPE_PTY)) { |
There was a problem hiding this comment.
This is actualy not enough, you need
if (!bpf_core_field_exists(struct tty_driver, type)) {
return false;
}
otherwise it will fail in kernels that don't have that.
There was a problem hiding this comment.
the type field is always present
| } | ||
|
|
||
| static __always_inline bool tty_driver_is_master(struct tty_struct *tty) { | ||
| if (bpf_core_enum_value_exists(enum tty_driver_subtype__new, PTY_TYPE_MASTER)) { |
There was a problem hiding this comment.
same here for subtype (if that also can vary amongst kernel versions)
4f7b4c3 to
5e8e0f1
Compare
5e8e0f1 to
6311be5
Compare
|
@rafaelroquetto @grcevski I did add a couple of optimizations, did some tests on a real cluster and applied all the other suggestions (I just left out the pathresolver.h changes). Let me know if I'm missing something |
Thanks @mmat11 ! Reviewing this one more time is on my todo for today. I just noticed there are some conflicts with main. |
rebased! |
grcevski
left a comment
There was a problem hiding this comment.
This looks great @mmat11!
I had one question remaining and it's probably me not being able to see how this is done. I see we have another criteria definition for further filtering of the services we want to include for logging, how is this wired in?
I would've thought we needed maybe an extension to the pipeline to have Matcher filter out pids we don't want.
| ebpfcommon.ForwardRingbuf( | ||
| &p.cfg.EBPF, | ||
| p.bpfObjects.LogEvents, | ||
| eventCtx.CommonPIDsFilter, |
There was a problem hiding this comment.
Do we need to pass the log pids filter here to the forward ring buff? The common pids filter is for the other bpf programs. Maybe that makes sense actually, you want to filter all logs for non instrumented services. In a sense, tracking logs doesn't make sense if the pid is not also instrumented for tracing.
There was a problem hiding this comment.
IIRC I passed the common pids filter there because that filtering is happening at span level after data gets to userspace -- all I needed was to keep the ebpf map with pids aligned with the tracked processes (see AllowPID/BlockPID). Maybe I can add another check in ebpf which just calls valid_pid() on the pids map?
There was a problem hiding this comment.
Yeah I think it's fine, I think it will work, since we require that the logged pids are also traced. I think the main issue is that I still don't see how the logs filtering works, how the additional rules affect certain apps. If you can point me to the code that does the logs criteria matching, I must be missing it. Perhaps I need to download your branch.
There was a problem hiding this comment.
It works similarly to how generictracer's filter works, I add the pids key (ns + pid) to a map and check via ebpf the current pid is in that map (see pid_tracked() in logenricher.c)
the userspace tracer is responsible to reconcile this map (log_enricher_pids) with the discovered processes based on config
There was a problem hiding this comment.
Sorry I wasn't clear, so imagine we have instrumented two ports for tracing, OTEL_EBPF_OPEN_PORT="8080,8081". I see you have LogEnricherServicesConfig, so let's say I have a config:
log_enricher:
open_port: 8080
I'm wondering where is that extra discovery config used to avoid enrichment of logs for 8081?
There was a problem hiding this comment.
I think I'm using the main discovery pids for filtering 🤦🏻♂️ , I'll fix it
There was a problem hiding this comment.
Heh no problem, I just thought I was missing something. I think it looks great to be honest, with adding the correct filtering we are good to go IMO.
There was a problem hiding this comment.
I had to change it at "matcher" level and pass the info all along the pipeline, the changes are in the last commit
5cf2592 to
8dcb895
Compare
grcevski
left a comment
There was a problem hiding this comment.
LGTM! Thanks for making this happen @mmat11 ! I just have one minor comment about adding a log statement if a user picks logging criteria broader than the instrumentation criteria. We can document as much as we want, but there will always be user errors and it would be good to expose them.
|
|
||
| return &ProcessMatch{Criteria: criteria, Process: proc} | ||
| return &ProcessMatch{Criteria: criteria, LogEnricherCriteria: logEnricherCriteria, Process: proc} | ||
| } |
There was a problem hiding this comment.
I would just ask one last thing :), if you don't mind. Before returning nil here can you add an if statement to check if the logEnricherCriteria is not empty and with info log tell us about it.
Something like:
if len(logEnricherCriteria) > 0 {
m.Log.Info("avoiding log instrumentation for process, since it wasn't matched for instrumentation. Log enrichment criteria must be a subset of instrumentation matching criteria", "pid", proc.Pid...
}
There was a problem hiding this comment.
yes, good suggestion!
Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
8dcb895 to
e9bcde0
Compare
Checklist