-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Add "prepare" and "check" watchers #793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes and questions inline for reviewers. Thanks in advance!
cc @htuch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this looks good, but requires some modifications:
-
comments
-
windows build is broken
-
the test fails on ci
test-prepare-check: /home/travis/build/libevent/libevent/test/test-prepare-check.c:73: prepare_callback_1: Assertion `info->timeout->tv_sec == 1' failed.
can you relax this check? i.e. allow small difference like 50ms or so (take a look at
test_timeval_diff_eq
)also maybe it is better to move this into
regress_watchers.c
to use tinytest asserts there (to know the expected/actual values)
Also it will be great to have some good example (under sample/
), like the histograms that had been mentioned in the #710 (and then you refer to it from the patches to the book)
@azat I still haven't added the example under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments after a brief look in the review (I will take a closer look before merge)
Thinking a bit about overhead... I'm wondering if we should gate this new feature behind a compilation flag, like:
The overhead should be neglible (one more lock/unlock, BTW we can get
rid off the impact by checking if there are any watchers and if not we
can skip lock/foreack/unlock, but again I don't think that even this
simple condition worth it) since event_base_loop() should not be called
tons of times (by those who cares about performance).
Also introducing yet another compilation option increases coverage
matrix (and it is already huge enough).
|
Hmmm. I don't feel strongly about this, but let me argue a little bit: I was thinking that the `const` provides enough safety -- if somebody casts away the `const`, it's their own fault. And maybe it's not actually a bad thing for users to modify the timeout? I don't know :)
I'm not sure that we should allow changing this time, and even if we
should, then explicitly via some API, like
event_prepare_set_{loop/base}_timeout() not via hackish casts
(personally I, don't care about architectures other then x86/amd64, but
this const-cast-away can break something for them).
But maybe I'm too suspicious about this...
int
event_prepare_get_timeout(const struct event_prepare_cb_info *info, struct timeval *timeout) {
if (info->timeout) {
*timeout = *(info->timeout);
return 1;
}
return 0;
}
Looks good.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So everything looks good!
Can you do the final bits then:
- squash all of them into one
- write description for the commit
- add entry to the
whatsnews-2.2.txt
Thanks for rapid reaction!
Ok... I've added an example of how we might use prepare and check watchers to collect timing statistics. There's a fun implementation of constant-space histograms in there... it might be nice to turn that into a proper library at some point. @htuch, @mattklein123 - maybe you'd be interested in having a look at the example as well. I haven't finished addressing all the style comments yet. I'll make another pass tomorrow. |
Hi @azat, thanks for all your feedback! I think I've addressed all the style issues, I've added a "what's new" entry, and I've updated the issue description. I just want to double-check -- you want me to |
I think I've addressed all the style issues, I've added a "what's new" entry, and I've updated the issue description.
Thanks! Will take a closer look today or at least tomorrow
I just want to double-check -- you want me to git rebase -i locally to squash all the commits, using this issue description as the commit message, and then force-push the branch -- is that right?
Yep git rebase locally, squash all the commits, update commit
description and git push --force
|
7b1755b
to
0a4e5e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at the comments, thanks!
+#pragma GCC diagnostic ignored "-Wfloat-equal"
I get the following warning without it:
sample/watcher-timing.c:43:11: warning: comparing floating point with == or != is unsafe [-Wfloat-equal]
return a == b;
~ ^ ~
Got it, thanks! (yet another difference between autotools and cmake,
will fix this later).
|
0a4e5e0
to
ec1b31b
Compare
@azat updated and rebased. The lock/unlock was a good catch. Thanks again for the thorough review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick reaction!
There some minor issues with the sample but other then this everything looks pretty good!
ec1b31b
to
71a7898
Compare
Hi @azat, everything's updated. Let me know if there's anything else! |
@mergeconflict everything looks good! And while thinking about this one more time, I think that user-passed argument will not be superfluous, i.e. |
Sure, no problem. I was also actually thinking it might be handy. I'll put the |
71a7898
to
956251d
Compare
So everything looks great, will apply this once the CI will be passed (or almost passed) |
One more thing pops up
Maybe this can be suppressed with some kind of pragma, but I think that better option will be to add one field there instead, like the time the dispatch took |
And commit message should be formatted appropriate, i.e.:
since otherwise git will think that the whole commit message is the subject (without empty line) (first link from google https://chris.beams.io/posts/git-commit/#seven-rules) |
Interesting, that's actually very helpful for us -- it would mean that we don't have to save the prepare and check timestamps externally. I'm thinking to rename the prepare timeout |
I like the idea in general, but don't like such long fields. I guess it is fine here |
Ok... I've implemented timing, but I haven't squashed the commits yet, since I'm not sure if you'll be ok with it. Specifically, to get accurate timing, I've had to add two Of course I'd be very happy to keep this implementation, since it supports my use case directly, but I understand if it's not good for others. If that's the case, then we'll solve the Win32 empty struct issue by simply adding a dummy unused field instead. |
Hmm... I just noticed I have a link error on Windows:
Trying a fix... |
Well if we are talking about modern linux (but maybe I missed something, since this is 5min experience) then I don't think that cached version is worth it, and just out of curiosity I wrote simple test and this assumption is correct (actually non cached version is a little bit faster) But we have lots of other supported architectures, so I let's use cached version here and this will be addressed later (because since we do have this |
The issue with using the cached time is that it's too inaccurate, at least for the main use case we care about in Envoy. The goal of measuring the actual vs. expected polling duration is that if you compare the two, the actual duration should be very close to the expected duration - within one kernel timeslice usually. If the delay is larger, it's an indication that your kernel is swamped. Using the cached time will give the misleading impression that something's wrong... My suggestion is, let's leave this out for now. In the future, if we get rid of the cached time (since your experimental results seem good), then we can put the accurate timing here. Does that sound ok? |
My suggestion is, let's leave this out for now. In the future, if we get rid of the cached time (since your experimental results seem good), then we can put the accurate timing here. Does that sound ok?
Agree
|
Adds two new callbacks: "prepare" watchers, which fire immediately before we poll for I/O, and "check" watchers, which fire immediately after we finish polling and before we process events. This allows other event loops to be embedded into libevent's, and enables certain performance monitoring. Closes: libevent#710
8d8a8f1
to
2f184f8
Compare
Ok, cool. I've updated:
|
Applied, thanks for the effort (endless rebases and so forth)! |
Thanks @azat! |
Add per-thread dispatcher statistics for loop duration and poll delay, based on new "prepare" and "check" watchers added in libevent (libevent/libevent#793). See discussion in #4952. Risk Level: medium Testing: Added unit test, all existing tests pass, and running locally yields sane results. Docs Changes: Added a new page on "performance" discussing event loop. Release Notes: Added an entry noting new stats. Signed-off-by: Dan Rosen <mergeconflict@google.com>
@htuch discovered a race condition in my libevent watcher implementation in the process of enabling TSAN for dependencies (#6610). Update libevent to pull in the fix (libevent/libevent#793). Risk Level: low Testing: bazel test //test/server:worker_impl_test -c dbg --config=clang-tsan --runs_per_test=1000 (with @htuch's patch applied). Signed-off-by: Dan Rosen <mergeconflict@google.com>
The previous attempt (https://crrev.com/c/2844573) to fix deadlocks when the browser is using Wayland EGL was not successful. While it did fix the deadlock, it brough other problems (see crbug.com/1280424 and crbug.com/1287664). It was decided to revisit the implementation and discuss this problem with Wayland developers (see the "Wayland fd watching" thread in the Wayland dev ML at https://bit.ly/3nBagQW). The presented solution still fixes the deadlocks, and also presents a better and faster integration as it avoids constant context switches. TL;DR: Prepare/read/dispatch dance[1] must be integrated in the libglib and libevent in such a way that they are called in the correct order whenever they enter into a new loop cycle. That is, Libglib provides several callbacks via GSourceFuncs (such as prepare/check/dispatch/finalize), which are all called in a defined order. Failing to do correct calls might result in deadlocks if there are other clients like Wayland EGL who also want to read (please note Wayland EGL is only used when there are either no drm render nodes or no libgbm support or Wayland does not support necessary protocols, and --in-process-gpu flag is passed). Libevent also provides watcher callbacks (see [2]), but given Chromium still has been using old version of libevent, it is impossible to do correct integration now (see crbug.com/1288181). Instead, I have made it use old integration that we have been using for several years before adding a polling thread. Please note that LaCros uses Libevent and Linux uses Libglib by default. As a bonus, this CL also fixes 1172305 [1] https://bit.ly/3tCjobF [2] libevent/libevent#793 Bug: 1280424, 1287664, 1172305 Change-Id: I79d061efe64b89c14c8c898e4e61dbf8821efa4c Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3386963 Reviewed-by: Robert Kroeger <rjkroege@chromium.org> Commit-Queue: Maksim Sisov <msisov@igalia.com> Cr-Commit-Position: refs/heads/main@{#961827}
Adds two new callbacks: "prepare" watchers, which fire immediately before we poll for I/O, and "check" watchers, which fire immediately after we finish polling and before we process events. This allows other event loops to be embedded into libevent's, and enables certain performance monitoring.
Closes: #710