Skip to content

Conversation

mostrows2
Copy link
Contributor

Make the logic of pollset_set_del_fd mirror that of
polsset_set_add_fd. The latter has 3 distinct stanzas where an FD can
be referenced (in fds, pollset, pollset_sets). Prior to this change
pollset_set_del_fd only removed references from fds and pollset_sets.

@nicolasnoble

@linux-foundation-easycla
Copy link

CLA Check
One or more committers are not authorized under a signed CLA as indicated below. Please click here to be authorized.

Copy link
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to the changes. Pending on the test result.

@mostrows2
Copy link
Contributor Author

@lidizheng I've looked into those Kokoro failures, at least some of them. I found one test in particular that failed (https://source.cloud.google.com/results/invocations/909fa8cf-f935-467f-b1dc-c044706c9eda/targets/github%2Fgrpc%2Frun_tests%2Fc_macos_dbg_native/tests;group=c_macos_dbg_native;test=bins%2Fdbg%2Fserver_test%20GRPC_POLL_STRATEGY%3Dpoll;row=89).

I was able to run this test myself, and it timed out.

I then reverted my change .... and the test still timed out.

So... that leads me to conclude one of 2 things is true:

  1. There's a problem with the tests overall.
  2. A lot of code is dependent on the buggy behavior being fixed here.

Can you or somebody else please help make sense out of this?

@mostrows2 mostrows2 force-pushed the issue-19208 branch 2 times, most recently from a0a598f to 26cdf29 Compare October 7, 2019 17:23
@mostrows2
Copy link
Contributor Author

This change has been reworked. The original version accounted for the extra references by the poller, but those were due to be cleaned up after notification post-orphaning. The problem there was that orphaning is supposed to notify the poller to release its references. This change now takes a different approach of deferring the file-descriptor close until the ref count reaches 0.

@lidizheng
Copy link
Contributor

@yashykt Can you take a look at the failed C++ test? @mostrows2 wasn't able to make it pass locally, is it a known issue in master branch?

@mostrows2
Copy link
Contributor Author

@yashykt Can you take a look at the failed C++ test? @mostrows2 wasn't able to make it pass locally, is it a known issue in master branch?

I'd hold off on that. I can reasonably expect that failure given what prompted me to redo the fix. Let's wait for Kokoro runs to run and I'll re-triage those first.

@mostrows2
Copy link
Contributor Author

Re-triaged the Kokoro test runs. Put out another version of this change... somebody please re-kick those test runs.

After closing a file descriptor honor the "closed" flag to avoid
re-closing it in post-fork processing.

File descriptors must be closed during an orphan operation, because
the closing of the file-descriptor is necessary for the correct function
of code that is polling on the descriptor.  Thus even if there are active
references, the close() call is necessary.  But this means that post-fork
code may close the file-descriptor, since it is only unregistered from
post-fork after the ref-count reaches 0.  All of this can be handled by
ensuring that the post-fork code honors the "close" flag.
@mostrows2
Copy link
Contributor Author

New code version pushed. Automated tests appear stuck and Kokoro run needs to be triggered.

@lidizheng
Copy link
Contributor

@mostrows2 CI for this repo needs special tag to run the full suite of tests. Test triggered.

@mostrows2
Copy link
Contributor Author

I've taken a look through the failing Kokoro runs, and I'm seeing some failures that clearly are not related ( 502 bad gateway errors, errors in list splicing tests, npm command failures). I'd appreciate a set of eyes on these and suggestions on how to proceed (I don't think I'm even in a position where I'm able to run some of these tests.)

@lidizheng
Copy link
Contributor

@mostrows2 I think the failed tests are not related to this PR.
Known failures: #20502 #20545 #20468 #20547

@yashykt When you got some time can you take a look at this PR? It fixes forking on macOS.

@mostrows2
Copy link
Contributor Author

Ping? Any comments to unblock this?

@lidizheng
Copy link
Contributor

@gnossen Can you take a look at this PR?
@mostrows2 We need one additional maintainer's approval to merge this PR.

@lidizheng lidizheng requested review from gnossen and removed request for yashykt October 16, 2019 17:56
@lidizheng lidizheng assigned gnossen and unassigned yashykt Oct 16, 2019
@mostrows2
Copy link
Contributor Author

@gnossen Please take a look

@lidizheng lidizheng merged commit 7b7f38a into grpc:master Oct 30, 2019
@mostrows2
Copy link
Contributor Author

Any chance of getting this into a 1.25 release?

lidizheng added a commit that referenced this pull request Nov 4, 2019
Backport #20452 to fix the forking issue on macOS
@lock lock bot locked as resolved and limited conversation to collaborators Jan 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lang/c++ lang/Python release notes: yes Indicates if PR needs to be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants