@mattwidmann mattwidmann commented Nov 25, 2017

This fixes #7632 for me. It's tricky to test, since it relies on a race between a subprocess exiting and waiting for cscope's output.

  • Should we instead change all of these to use vim_fgets? I didn't here because vim_fgets has some additional handling for lines that are too long, but that might be alright for all or some of these cases.
  • The linter is complaining about missing braces on if statements I didn't change -- is that expected?

@marvim marvim added the RFC label Nov 25, 2017
@justinmk
Member

Thanks!

> Should we instead change all of these to use vim_fgets

Better to change one thing at a time.

> The linter is complaining about missing braces on if statements I didn't change -- is that expected?

Yes, it "cascades" to nearby lines, so legacy code gets updated incrementally.

ignoredp = fgets((char *)tbuf, FGETS_SIZE, fp);
if (ignoredp == NULL && errno == EINTR) {
  goto retry_ignore;
Member

technically we're not ignoring the retval now, so ignoredp should not be used.

Author

Yeah, I guess we can just rely on the existing loop to retry the fgets here.

Author

@mattwidmann mattwidmann Nov 25, 2017

Wait, the existing code won't properly handle an EOF, either...

Member

just to be clear, i was only speaking about the specific ignoredp global which is used as a dummy assignment when Vim doesn't care about some return value.

@justinmk
Member

would be good to include your explanation and debugging steps from #7632 , in the commit message.

@justinmk justinmk added this to the 0.2.3 milestone Nov 25, 2017
@mattwidmann
Author

mattwidmann commented Nov 25, 2017

I had to rewrite the vim_fgets function to make it safe (it could spin forever if one of its calls to fgets hit EOF). I could put this in a separate commit when I squash, since it's a separate (though overlapping) fix.

I also pushed a linter fix and an empty commit with the proposed commit message.

@justinmk
Member

LGTM, just leave a comment when ready.

The calls to `fgets` in `src/nvim/if_cscope.c` (and elsewhere) can show
communication errors to the user if a signal is delivered during their
underlying system calls. For plugins that proxy subprocess output into
cscope requests, a `SIGCHLD` might *always* interfere with calls into
`fgets`.

To see this in a debugger, put a breakpoint on `cs_reading_emsg` and
watch signals come in (with lldb, using `process handle --notify true
--pass true`).  Next, run a subcommand from neovim that calls through
cscope when it returns.  A tag picker plugin, like vim-picker and fzy,
with `cscopetag` and `cscopetagorder=0` set, reproduced this reliably.
The breakpoint will hit after a `SIGCHLD` is delivered, and `errno` will
be set to 4, `EINTR`.

The caller of `fgets` should retry when `NULL` is returned with `errno`
set to `EINTR`.
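
A minimal sketch of the retry described in this commit message, not the code that was merged. `fgets_retry` is a hypothetical helper name, and the `clearerr` call is a cautious assumption: it resets the stream's error indicator before the next attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

// Hypothetical helper (not a Neovim function): behaves like fgets(), but
// retries when the call fails because a signal such as SIGCHLD interrupted
// the underlying read.
static char *fgets_retry(char *buf, int size, FILE *fp)
{
  assert(buf != NULL && size > 0 && fp != NULL);
  for (;;) {
    errno = 0;  // So a stale EINTR from earlier code is not misread.
    char *ret = fgets(buf, size, fp);
    if (ret != NULL || errno != EINTR) {
      return ret;  // Success, EOF, or an error other than EINTR.
    }
    // Interrupted by a signal: reset the error indicator and try again.
    clearerr(fp);
  }
}
```

A caller would use it wherever it previously called `fgets` directly, and would still need to check `feof(fp)` or `ferror(fp)` when `NULL` comes back.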
@mattwidmann
Author

Looking at the test failure here: http://neovim-qb.szakmeister.net/build/15079/tap_report/by_test?reportset=functionaltest-freebsd-64-debug

Is this test flaky? This change could have caused issues here, but the test looks somewhat racy. The failure is:

test/functional/eval/timer_spec.lua @ 41
Failure message: test/functional/eval/timer_spec.lua:44: Expected objects to be the same.
Passed in:
(number) 1
Expected:
(number) 0
stack traceback:
test/functional/eval/timer_spec.lua:44: in function <test/functional/eval/timer_spec.lua:41>

test/functional/eval/timer_spec.lua @ 49: timers works with zero timeout

And the test is:

  it('are triggered during sleep', function()
    command("call timer_start(50, 'MyHandler', {'repeat': 2})")
    nvim_async("command", "sleep 10")
    eq(0,eval("g:val"))
    run(nil, nil, nil, 300)
    eq(2,eval("g:val"))
  end)

There's nothing preventing the timer_start command and the check that g:val equals 0 from being separated by 50 ms or more. This didn't reproduce on the other test harnesses, either.

If `fgets` returns `NULL` at end-of-file, `vim_fgets` might spin
forever, as it tries to consume the rest of the current line.

A `NULL` return value from `fgets` should break out of the function
(unless `errno` is `EINTR`), and then `feof` should be used to check for
the EOF condition on the stream.
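
The control flow described above could look roughly like the following sketch. It is not Neovim's `vim_fgets`: the name `read_line_truncated`, the return convention (true when nothing could be read), and the 64-byte scratch buffer are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical sketch (not Neovim's vim_fgets): read one line into buf,
// truncating lines that do not fit and discarding the remainder.
// Returns true if nothing could be read because of EOF or an error.
static bool read_line_truncated(char *buf, int size, FILE *fp)
{
  assert(buf != NULL && size > 1 && fp != NULL);
  char *ret;
  for (;;) {
    errno = 0;
    ret = fgets(buf, size, fp);
    if (ret != NULL || errno != EINTR) {
      break;
    }
    clearerr(fp);  // Interrupted by a signal: retry the read.
  }
  if (ret == NULL) {
    buf[0] = '\0';
    return true;  // Caller can use feof(fp) or ferror(fp) to tell which.
  }
  if (strchr(buf, '\n') != NULL) {
    return false;  // The whole line fit into buf.
  }
  // The line was too long (or the file ended without a newline):
  // consume the rest of the line.
  char tail[64];
  for (;;) {
    errno = 0;
    if (fgets(tail, (int)sizeof tail, fp) == NULL) {
      if (errno == EINTR) {
        clearerr(fp);
        continue;
      }
      break;  // EOF or a real error: stop here instead of spinning.
    }
    if (strchr(tail, '\n') != NULL) {
      break;  // Reached the end of the over-long line.
    }
  }
  return false;
}
```

The important part is the `break` on a `NULL` return inside the discard loop: without it, a stream that has reached end-of-file would keep returning `NULL` and the loop would never terminate.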
@justinmk
Member

Yes, that test is flaky. Can ignore it here.

@mattwidmann
Author

Alright -- I've pushed two commits here and tested them locally. One of the fileio tests is failing on 32-bit:

[ RUN      ] file_read can read small chunks of input until eof:

That's an alarming test to have fail against this branch. I didn't change that fileio code -- it uses its own buffering scheme instead of FILE streams, so it makes the read or readv syscalls itself.

Have you seen that test fail before? Is there any way I can test 32-bit on macOS? I ran that test on my machine and it passed, though it took about a second to complete. Is it timing out, maybe?

@justinmk
Member

@mattwidmann That is a very weird failure, it looks like a timeout (build log stopped abruptly), but travis didn't flag it as a timeout. Have not seen that before.

I restarted the test since the logs don't show anything useful, to see if the failure is consistent.

@mattwidmann
Author

Looks like the build failed this time, trying to fetch dependencies:

E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
apt-get.diagnostics
apt-get install failed
[...]
The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends --force-yes install autoconf automake apport build-essential clang-4.0 cmake cscope g++-5-multilib g++-multilib gcc-5-multilib gcc-multilib gdb language-pack-tr libc6-dev-i386 libtool llvm-4.0-dev locales pkg-config unzip valgrind xclip" failed and exited with 100 during .

Might need to be restarted again.

@justinmk
Member

restarted again. definitely looks like travis issue:

oci runtime error: exec failed: container_linux.go:265: starting container process caused "could not create session key: disk quota exceeded"

@mattwidmann
Author

Well, I'm done with this PR -- feel free to merge it when you're ready. Let me know if anything should be changed.

@justinmk justinmk merged commit a043899 into neovim:master Nov 26, 2017
@justinmk justinmk removed the RFC label Nov 26, 2017
@mattwidmann mattwidmann deleted the fgets-retry branch November 26, 2017 21:07
@justinmk justinmk mentioned this pull request Feb 8, 2018
@justinmk justinmk added the system OS resources, pipes, streams label Mar 13, 2018
justinmk added a commit that referenced this pull request Jun 11, 2018
FEATURES:
3cc7ebf #7234 built-in VimL expression parser
6a7c904 #4419 implement <Cmd> key to invoke command in any mode
b836328 #7679 'startup: treat stdin as text instead of commands'
58b210e :digraphs : highlight with hl-SpecialKey #2690
7a13611 #8276 'startup: Let `-s -` read from stdin'
1e71978 events: VimSuspend, VimResume #8280
1e7d5e8 #6272 'stdpath()'
f96d99a #8247 server: introduce --listen
e8c39f7 #8226 insert-mode: interpret unmapped META as ESC
98e7112 msg: do not scroll entire screen (#8088)
f72630b #8055 let negative 'writedelay' show all redraws
5d2dd2e win: has("wsl") on Windows Subsystem for Linux #7330
a4f6cec cmdline: CmdlineEnter and CmdlineLeave autocommands (#7422)
207b7ca #6844 channels: support buffered output and bytes sockets/stdio

API:
f85cbea #7917 API: buffer updates
418abfc #6743 API: list information about all channels/jobs.
36b2e3f #8375 API: nvim_get_commands
273d2cd #8329 API: Make nvim_set_option() update `:verbose set …`
8d40b36 #8371 API: more reliable/descriptive VimL errors
ebb1acb #8353 API: nvim_call_dict_function
9f994bb #8004 API: nvim_list_uis
3405704 #7520 API/UI: forward option updates to UIs
911b1e4 #7821 API: improve nvim_command_output

WINDOWS OS:
9cefd83 #8084, #8516 build/win: support MSVC
ee4e1fd win: Fix reading content from stdin (#8267)

TUI:
ffb8904 #8309 TUI: add support for mouse release events in urxvt
8d5a46e #8081 TUI: implement "standout" attribute
6071637 TUI: support TERM=konsole-256color
67848c0 #7653 TUI: report TUI info with -V3 ('verbose' >= 3)
3d0ee17 TUI/rxvt: enable focus-reporting
d109f56 #7640 TUI: 'term' option: reflect effective terminal behavior

FIXES:
ed6a113 #8273 'job-control: avoid kill-timer race'
4e02f1a #8107 'jobs: separate process-group'
451c48a terminal: flush vterm output buffer on pty output #8486
5d6732f :checkhealth fixes #8335
53f11dc #8218 'Fix errors reported by PVS'
d05712f inccommand: pause :terminal redraws (#8307)
51af911 inccommand: do not execute trailing commands #8256
84359a4 terminal: resize to the max dimensions (#8249)
d49c1dd #8228 Make vim_fgets() return the same values as in Vim
60e96a4 screen: winhl=Normal:Background should not override syntax (#8093)
0c59ac1 #5908 'shada: Also save numbered marks'
ba87a2c cscope: ignore EINTR while reading the prompt (#8079)
b1412dc #7971 ':terminal Enter/Leave should not increment jumplist'
3a5721e TUI: libtermkey: force CSI driver for mouse input #7948
6ff13d7 #7720 TUI: faster startup
1c6e956 #7862 TUI: fix resize-related segfaults
a58c909 #7676 TUI: always hide cursor when flushing, never flush buffers during unibilium output
303e1df #7624 TUI: disable BCE almost always
249bdb0 #7761 mark: Make sure that jumplist item will not have zero lnum
6f41ce0 #7704 macOS: Set $LANG based on the system locale
a043899 #7633 'Retry fgets on EINTR'

CHANGES:
ad60927 #8304 default to 'nofsync'
f3f1970 #8035 defaults: 'fillchars'
a6052c7 #7984 defaults: sidescroll=1
b69fa86 #7888 defaults: enable cscopeverbose
7c4bb23 defaults: do :filetype stuff unless explicitly "off"
2aa308c #5658 'Apply :lmap in macros'
8ce6393 terminal: Leave 'relativenumber' alone (#8360)
e46534b #4486 refactor: Remove maxmem, maxmemtot options
131aad9 win: defaults: 'shellcmdflag', 'shellxquote' #7343
c57d315 #8031 jobwait(): return -2 on interrupt also with timeout
6452831 clipboard: macOS: fallback to tmux if pbcopy is broken #7940
300d365 #7919 Make 'langnoremap' apply directly after a map
ada1956 #7880 'lua/executor: Remove lightuserdata'

INTERNAL:
de0a954 #7806 internal statistics for list impl
dee78a4 #7708 rewrite internal list impl

Successfully merging this pull request may close these issues.

Calls to fgets should be retried when error is EINTR
3 participants