Skip to content

Conversation

mergeconflict
Copy link
Contributor

@mergeconflict mergeconflict commented Mar 26, 2019

Adds two new callbacks: "prepare" watchers, which fire immediately before we poll for I/O, and "check" watchers, which fire immediately after we finish polling and before we process events. This allows other event loops to be embedded into libevent's, and enables certain performance monitoring.

Closes: #710

Copy link
Contributor Author

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes and questions inline for reviewers. Thanks in advance!

@mergeconflict
Copy link
Contributor Author

cc @htuch

@coveralls
Copy link

coveralls commented Mar 26, 2019

Coverage Status

Coverage increased (+0.06%) to 81.326% when pulling 2f184f8 on mergeconflict:prepare_and_check_watchers into 47d348a on libevent:master.

Copy link
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this looks good, but requires some modifications:

  • comments

  • windows build is broken

  • code and declaration mixes

  • the test fails on ci

    test-prepare-check: /home/travis/build/libevent/libevent/test/test-prepare-check.c:73: prepare_callback_1: Assertion `info->timeout->tv_sec == 1' failed.

    can you relax this check? i.e. allow small difference like 50ms or so (take a look at test_timeval_diff_eq)

    also maybe it is better to move this into regress_watchers.c to use tinytest asserts there (to know the expected/actual values)

Also it will be great to have some good example (under sample/), like the histograms that had been mentioned in the #710 (and then you refer to it from the patches to the book)

@mergeconflict
Copy link
Contributor Author

@azat I still haven't added the example under sample/ you asked for -- I'll work on that next...

Copy link
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments after a brief look in the review (I will take a closer look before merge)

@azat
Copy link
Member

azat commented Mar 28, 2019 via email

@azat
Copy link
Member

azat commented Mar 28, 2019 via email

Copy link
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So everything looks good!

Can you do the final bits then:

  • squash all of them into one
  • write description for the commit
  • add entry to the whatsnews-2.2.txt

Thanks for rapid reaction!

@mergeconflict
Copy link
Contributor Author

Ok... I've added an example of how we might use prepare and check watchers to collect timing statistics. There's a fun implementation of constant-space histograms in there... it might be nice to turn that into a proper library at some point. @htuch, @mattklein123 - maybe you'd be interested in having a look at the example as well.

I haven't finished addressing all the style comments yet. I'll make another pass tomorrow.

@mergeconflict
Copy link
Contributor Author

Hi @azat, thanks for all your feedback! I think I've addressed all the style issues, I've added a "what's new" entry, and I've updated the issue description.

I just want to double-check -- you want me to git rebase -i locally to squash all the commits, using this issue description as the commit message, and then force-push the branch -- is that right?

@azat
Copy link
Member

azat commented Mar 29, 2019 via email

@mergeconflict mergeconflict force-pushed the prepare_and_check_watchers branch 3 times, most recently from 7b1755b to 0a4e5e0 Compare March 29, 2019 16:38
Copy link
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the comments, thanks!

@azat
Copy link
Member

azat commented Apr 1, 2019 via email

@mergeconflict mergeconflict force-pushed the prepare_and_check_watchers branch from 0a4e5e0 to ec1b31b Compare April 1, 2019 17:13
@mergeconflict
Copy link
Contributor Author

@azat updated and rebased. The lock/unlock was a good catch. Thanks again for the thorough review!

Copy link
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick reaction!
There some minor issues with the sample but other then this everything looks pretty good!

@mergeconflict mergeconflict force-pushed the prepare_and_check_watchers branch from ec1b31b to 71a7898 Compare April 1, 2019 21:01
@mergeconflict
Copy link
Contributor Author

Hi @azat, everything's updated. Let me know if there's anything else!

@azat
Copy link
Member

azat commented Apr 2, 2019

@mergeconflict everything looks good!

And while thinking about this one more time, I think that user-passed argument will not be superfluous, i.e. evwatch_prepare_new(struct event_base *base, evwatch_prepare_cb callback, void *arg), can you add this? (and I think that this is the last bit, sorry for not noticing this before)

@mergeconflict
Copy link
Contributor Author

And while thinking about this one more time, I think that user-passed argument will not be superfluous, i.e. evwatch_prepare_new(struct event_base *base, evwatch_prepare_cb callback, void *arg), can you add this? (and I think that this is the last bit, sorry for not noticing this before)

Sure, no problem. I was also actually thinking it might be handy. I'll put the *arg as an additional argument in the callbacks, after the evwatch_*_cb_info to stay consistent with how the evbuffer callbacks are designed.

@mergeconflict mergeconflict force-pushed the prepare_and_check_watchers branch from 71a7898 to 956251d Compare April 2, 2019 19:56
@azat
Copy link
Member

azat commented Apr 2, 2019

So everything looks great, will apply this once the CI will be passed (or almost passed)
Thanks for the quick reaction!

@azat
Copy link
Member

azat commented Apr 2, 2019

One more thing pops up evwatch_check_cb_info is empty, and while this is accepted by gcc it is not strictly valid and VS do not pass this:

c:\projects\libevent\event-internal.h(221): error C2016: C requires that a struct or union has at least one member [C:\projects\libevent\build-cmake\event_core_shared.vcxproj]

Maybe this can be suppressed with some kind of pragma, but I think that better option will be to add one field there instead, like the time the dispatch took

@azat
Copy link
Member

azat commented Apr 2, 2019

And commit message should be formatted appropriate, i.e.:

subject
<   blank line   >
body

since otherwise git will think that the whole commit message is the subject (without empty line)

(first link from google https://chris.beams.io/posts/git-commit/#seven-rules)

@mergeconflict
Copy link
Contributor Author

Maybe this can be suppressed with some kind of pragma, but I think that better option will be to add one field there instead, like the time the dispatch took

Interesting, that's actually very helpful for us -- it would mean that we don't have to save the prepare and check timestamps externally. I'm thinking to rename the prepare timeout expected_timeout and call the check timeout actual_timeout, so that it's obvious to users that you can compare the actual and expected values. What do you think?

@azat
Copy link
Member

azat commented Apr 2, 2019

Interesting, that's actually very helpful for us -- it would mean that we don't have to save the prepare and check timestamps externally. I'm thinking to rename the prepare timeout expected_timeout and call the check timeout actual_timeout, so that it's obvious to users that you can compare the actual and expected values. What do you think?

I like the idea in general, but don't like such long fields. I guess it is fine here

@mergeconflict
Copy link
Contributor Author

Ok... I've implemented timing, but I haven't squashed the commits yet, since I'm not sure if you'll be ok with it. Specifically, to get accurate timing, I've had to add two evutil_gettimeofday calls before and after evsel->dispatch, which might be unacceptable overhead in some cases. I tried using the cached time, but it was too inaccurate (like 3000us compared to about 1000us expected).

Of course I'd be very happy to keep this implementation, since it supports my use case directly, but I understand if it's not good for others. If that's the case, then we'll solve the Win32 empty struct issue by simply adding a dummy unused field instead.

@mergeconflict
Copy link
Contributor Author

Hmm... I just noticed I have a link error on Windows:

LINK : fatal error LNK1104: cannot open file 'm.lib' [C:\projects\libevent\build-cmake\watch-timing.vcxproj]

Trying a fix...

@azat
Copy link
Member

azat commented Apr 3, 2019

to get accurate timing, I've had to add two evutil_gettimeofday calls before and after evsel->dispatch, which might be unacceptable overhead in some cases

Well if we are talking about modern linux (but maybe I missed something, since this is 5min experience) then I don't think that cached version is worth it, and just out of curiosity I wrote simple test and this assumption is correct (actually non cached version is a little bit faster)

But we have lots of other supported architectures, so I let's use cached version here and this will be addressed later

(because since we do have this event_base_gettimeofday_cached() now we should use it, if we do not want it, it should be removed completely in one pass)

@mergeconflict
Copy link
Contributor Author

But we have lots of other supported architectures, so I let's use cached version here and this will be addressed later

The issue with using the cached time is that it's too inaccurate, at least for the main use case we care about in Envoy. The goal of measuring the actual vs. expected polling duration is that if you compare the two, the actual duration should be very close to the expected duration - within one kernel timeslice usually. If the delay is larger, it's an indication that your kernel is swamped. Using the cached time will give the misleading impression that something's wrong...

My suggestion is, let's leave this out for now. In the future, if we get rid of the cached time (since your experimental results seem good), then we can put the accurate timing here. Does that sound ok?

@azat
Copy link
Member

azat commented Apr 3, 2019 via email

Adds two new callbacks: "prepare" watchers, which fire immediately
before we poll for I/O, and "check" watchers, which fire immediately
after we finish polling and before we process events. This allows other
event loops to be embedded into libevent's, and enables certain
performance monitoring.

Closes: libevent#710
@mergeconflict mergeconflict force-pushed the prepare_and_check_watchers branch from 8d8a8f1 to 2f184f8 Compare April 3, 2019 16:45
@mergeconflict
Copy link
Contributor Author

Ok, cool. I've updated:

  • Revert the timing thing
  • Fix the empty struct issue by adding a dummy unused field
  • Fix the git commit message
  • Squashed and rebased against the latest changes in master

@azat azat merged commit 2f184f8 into libevent:master Apr 3, 2019
@azat
Copy link
Member

azat commented Apr 3, 2019

Applied, thanks for the effort (endless rebases and so forth)!

@mergeconflict
Copy link
Contributor Author

Thanks @azat!

@mergeconflict mergeconflict deleted the prepare_and_check_watchers branch April 3, 2019 18:35
htuch pushed a commit to envoyproxy/envoy that referenced this pull request Apr 15, 2019
Add per-thread dispatcher statistics for loop duration and poll delay, based on new "prepare" and "check" watchers added in libevent (libevent/libevent#793). See discussion in #4952.

Risk Level: medium
Testing: Added unit test, all existing tests pass, and running locally yields sane results.
Docs Changes: Added a new page on "performance" discussing event loop.
Release Notes: Added an entry noting new stats.

Signed-off-by: Dan Rosen <mergeconflict@google.com>
htuch pushed a commit to envoyproxy/envoy that referenced this pull request Apr 18, 2019
 @htuch discovered a race condition in my libevent watcher implementation in the process of enabling TSAN for dependencies (#6610). Update libevent to pull in the fix (libevent/libevent#793).

Risk Level: low
Testing: bazel test //test/server:worker_impl_test -c dbg --config=clang-tsan --runs_per_test=1000 (with @htuch's patch applied).

Signed-off-by: Dan Rosen <mergeconflict@google.com>
aarongable pushed a commit to chromium/chromium that referenced this pull request Jan 21, 2022
The previous attempt (https://crrev.com/c/2844573) to
fix deadlocks when the browser is using Wayland EGL was
not successful. While it did fix the deadlock, it brough
other problems (see crbug.com/1280424 and crbug.com/1287664).

It was decided to revisit the implementation and discuss this
problem with Wayland developers (see the "Wayland fd watching"
thread in the Wayland dev ML at https://bit.ly/3nBagQW).

The presented solution still fixes the deadlocks, and also
presents a better and faster integration as it avoids constant
context switches.

TL;DR:

Prepare/read/dispatch dance[1] must be integrated in the libglib
and libevent in such a way that they are called in the correct
order whenever they enter into a new loop cycle. That is,
Libglib provides several callbacks via GSourceFuncs (such as
prepare/check/dispatch/finalize), which are all called in a
defined order. Failing to do correct calls might result in
deadlocks if there are other clients like Wayland EGL who also
want to read (please note Wayland EGL is only used when there
are either no drm render nodes or no libgbm support or Wayland
does not support necessary protocols, and --in-process-gpu flag
is passed).

Libevent also provides watcher callbacks (see [2]), but given
Chromium still has been using old version of libevent, it is
impossible to do correct integration now (see crbug.com/1288181).
Instead, I have made it use old integration that we have been using
for several years before adding a polling thread.

Please note that LaCros uses Libevent and Linux uses Libglib by
default.

As a bonus, this CL also fixes 1172305

[1] https://bit.ly/3tCjobF
[2] libevent/libevent#793

Bug: 1280424, 1287664, 1172305
Change-Id: I79d061efe64b89c14c8c898e4e61dbf8821efa4c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3386963
Reviewed-by: Robert Kroeger <rjkroege@chromium.org>
Commit-Queue: Maksim Sisov <msisov@igalia.com>
Cr-Commit-Position: refs/heads/main@{#961827}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants