@ysbaddaden ysbaddaden commented Jun 26, 2025

This is the last missing bit from #15685.

The difference from the PoC is that opening a pipe, fifo or chardev file will still block until another thread or process has also opened the file. That part was extracted separately in #15768, then obsoleted by #15871, which provides a reusable solution, so the blocking arg no longer serves any purpose.

  • The blocking args of File constructors now default to nil to let the event loops decide how to configure the fd / handle.

  • The polling evloops now set the fd to non-blocking by default. There's no downside for regular disk files to have O_NONBLOCK set (the OS ignores it), while not having it for the other kinds of file is an issue (it blocks).

    Errata: on macOS the polling evloops set the fd to blocking because polling a non-blocking fifo seems incorrect (at the OS level).

  • The IOCP evloop uses OVERLAPPED IO by default. Reading and writing files is now fully async on Windows 🎉
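
To illustrate the O_NONBLOCK point above, here is a minimal Python sketch (not Crystal, and not code from this PR) that opens a regular file and a fifo with `O_NONBLOCK` on a POSIX system. The function name `nonblocking_demo` and the result keys are made up for the example. The regular file reads as usual, while the empty fifo reports that the read would block instead of suspending the thread.

```python
import os
import tempfile


def nonblocking_demo():
    """Compare O_NONBLOCK on a regular file and on a fifo (POSIX only)."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Regular file: O_NONBLOCK is accepted and the read behaves as usual.
        regular = os.path.join(tmp, "regular.txt")
        with open(regular, "wb") as f:
            f.write(b"hello")
        fd = os.open(regular, os.O_RDONLY | os.O_NONBLOCK)
        try:
            results["regular"] = os.read(fd, 16)
        finally:
            os.close(fd)

        # Fifo: open the reader first (a non-blocking read-only open returns
        # immediately), then the writer, and read while the fifo is empty.
        fifo = os.path.join(tmp, "fifo")
        os.mkfifo(fifo)
        rfd = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(fifo, os.O_WRONLY | os.O_NONBLOCK)
        try:
            try:
                os.read(rfd, 16)
                results["fifo_would_block"] = False
            except BlockingIOError:
                # EAGAIN / EWOULDBLOCK: nothing to read yet, the caller is
                # expected to wait for readiness and retry.
                results["fifo_would_block"] = True
        finally:
            os.close(wfd)
            os.close(rfd)
    return results


if __name__ == "__main__":
    print(nonblocking_demo())
```

Without `O_NONBLOCK`, the same read on the empty fifo would block the calling thread until a writer sends data, which is the situation the polling evloops want to avoid.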

- The `blocking` args of File constructors now default to `nil`.
- The polling evloops now set the fd to non blocking by default.
- The IOCP evloop uses OVERLAPPED IO by default.
@straight-shoota straight-shoota added this to the 1.17.0 milestone Jun 26, 2025
@straight-shoota straight-shoota added the kind:breaking Intentional breaking change with significant impact. Shows up on top of the changelog. label Jun 26, 2025

ysbaddaden commented Jun 28, 2025

There seems to be something wrong on macOS... Maybe the spec is blocking, somehow.

EDIT: I disabled the spec on darwin, and apparently it passes.

EDIT: yes, one fiber fails not_nil for some reason 😢

  can read/write fifo file without blocking

Unhandled exception in spawn: Nil assertion failed (NilAssertionError)
  from src/nil.cr:113:7 in 'not_nil!'
  from src/nil.cr:109:3 in 'not_nil!'
  from spec/std/file_spec.cr:51:3 in '->'
  from src/wait_group.cr:68:13 in '->'
  from src/fiber.cr:170:11 in 'run'
  from src/fiber.cr:105:3 in '->'

The error is easy to fix: wait for the thread to terminate, so that either the writer has been opened or the exception is re-raised.

The problem is that with fewer than 4 reads/writes the spec terminates, but with 4 or more read/write loops it eventually blocks 😭

Here's the evloop tracing, with execution contexts enabled:

evloop.write 2257866931450 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer fd=9
evloop.write 2257866943212 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer fd=9
evloop.wait_writable 2257866948573 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer fd=9
evloop.kevent 2257866953322 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer op=add fd=9 index=38654705664
evloop.read 2257866967971 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3540:reader fd=10
evloop.read 2257866980312 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3540:reader fd=10
evloop.wait_readable 2257866986096 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3540:reader fd=10
evloop.kevent 2257866992343 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3540:reader op=add fd=10 index=42949672960
evloop.run 2257867003458 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3540:reader blocking=0
evloop.event 2257867025557 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3540:reader fd=9 index=38654705664 filter=-2 flags=33 fflags=0
evloop.event 2257867043832 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3540:reader fd=10 index=42949672960 filter=-2 flags=33 fflags=0
evloop.done_writable 2257867050735 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer fd=9
evloop.write 2257867057484 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer fd=9
evloop.wait_writable 2257867062706 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer fd=9
evloop.run 2257867067933 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3600:writer blocking=0
evloop.run 2257867646682 thread=0x7ff859141a00:DEFAULT fiber=0x1103c3cc0:DEFAULT:loop blocking=1

@straight-shoota straight-shoota removed this from the 1.17.0 milestone Jun 30, 2025

ysbaddaden commented Jun 30, 2025

It might be an issue with the kqueue event loop: there's one event for fd=9 (writer) and one for fd=10 (reader), but only the writer seems to be resumed.

EDIT: it doesn't reproduce on FreeBSD.

EDIT: there's definitely something wrong on macOS:

  • writer fiber: waits for writable (fd=9)
  • writer fiber: runs the evloop
  • writer fiber: gets a kevent for fd=9
  • ... nothing is enqueued 🤨

EDIT: the kevent.filter is... the opposite of what it should be: it reports EVFILT_READ for the writer (fd=9) and EVFILT_WRITE for the reader (fd=10), so nothing is resumed because there's no waiter for the reported readiness :feelsgood:
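
A toy model of why swapped filters resume nothing, written in Python for illustration only. It is not the Crystal kqueue event loop: the `dispatch` function and the `(fd, filter)` lookup table are assumptions made for the sketch. Only the filter values (`EVFILT_READ = -1`, `EVFILT_WRITE = -2`) match the system headers.

```python
# kqueue filter values as defined in <sys/event.h>.
EVFILT_READ = -1
EVFILT_WRITE = -2


def dispatch(waiters, events):
    """Resume the fiber waiting on each (fd, filter) pair, if there is one.

    `waiters` maps (fd, filter) to a fiber name (hypothetical structure).
    `events` is a list of (fd, filter) pairs as reported by the kernel.
    """
    resumed = []
    for fd, filt in events:
        fiber = waiters.pop((fd, filt), None)
        if fiber is not None:
            resumed.append(fiber)
    return resumed


# The writer waits for fd=9 to be writable, the reader for fd=10 to be readable.
waiters = {(9, EVFILT_WRITE): "writer", (10, EVFILT_READ): "reader"}

# Events with the expected filters resume both fibers.
expected = dispatch(dict(waiters), [(9, EVFILT_WRITE), (10, EVFILT_READ)])

# Events with the filters swapped match no waiter, so nothing is resumed.
swapped = dispatch(dict(waiters), [(9, EVFILT_READ), (10, EVFILT_WRITE)])

if __name__ == "__main__":
    print(expected, swapped)
```

In this model both fibers stay parked after the swapped events, which matches the "nothing is enqueued" observation.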

EDIT: If I reverse the reader and the writer spawns, the fibers start talking, then the writer waits for write. We get a kevent saying that fd=9 and fd=10 are ready for write, which resumes the writer, which ends up waiting again... then we never get the read readiness kevent for fd=10 and it hangs forever.

AFAICT: Non-blocking fifo read/write is broken on macOS for some reason. Maybe it gets confused by the two EVFILT_READ + EVFILT_WRITE kevent registrations... but that doesn't happen for sockets, so that's very weird.

Maybe it gets confused because the fifo file is opened by the same thread in the same process, which is maybe an odd case?


ysbaddaden commented Jun 30, 2025

I tried always resuming a reader along with a writer, but to no avail: it does resume both fibers but both read and write will fail with EAGAIN 😭

EDIT: I tried libevent: the fifo read/write also hangs forever on macOS, aka it's macOS that's broken 🤷

@ysbaddaden ysbaddaden force-pushed the rework/file-non-blocking-behavior branch from cc20573 to 855aa85 Compare June 30, 2025 17:07

ysbaddaden commented Jun 30, 2025

Files now default to blocking on macOS because polling of non-blocking fifo files appears to be incorrect (at the OS level). This reinforces the argument that we should never assume any specific configuration for the system fd 😓

@ysbaddaden
Contributor Author

I investigated some more, using two threads as well as two processes (one reader, one writer) and sending/reading enough data to overflow the kernel buffers: the FIFO communication always gets stuck whenever I set O_NONBLOCK on macOS, but always succeeds when I keep the fd blocking.

I.e. non-blocking FIFO doesn't work on macOS.
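
For anyone who wants to try this kind of experiment, here is a rough Python sketch of a two-thread variant. It is not the test program used for the investigation above: the names (`run_fifo_transfer`, `wait_ready`), the 1 MiB payload and the 5 second timeout are all assumptions for the example. Both ends of the fifo are non-blocking, each side waits for readiness with the platform's default selector (kqueue on macOS, epoll on Linux) when it hits EAGAIN, and the timeout keeps a stalled transfer from hanging forever.

```python
import os
import selectors
import tempfile
import threading

TOTAL = 1 << 20   # 1 MiB, assumed to be larger than the kernel fifo buffer
CHUNK = 4096
TIMEOUT = 5.0     # seconds to wait for readiness before giving up


def wait_ready(fd, event):
    """Wait until fd is ready for `event`; return False on timeout."""
    sel = selectors.DefaultSelector()
    try:
        sel.register(fd, event)
        return bool(sel.select(TIMEOUT))
    finally:
        sel.close()


def writer(wfd, state):
    payload = b"x" * CHUNK
    sent = 0
    while sent < TOTAL:
        try:
            sent += os.write(wfd, payload[: min(CHUNK, TOTAL - sent)])
        except BlockingIOError:
            # Buffer full: wait for writability, stop if nothing happens.
            if not wait_ready(wfd, selectors.EVENT_WRITE):
                break
    state["sent"] = sent


def reader(rfd, state):
    received = 0
    while received < TOTAL:
        try:
            data = os.read(rfd, CHUNK)
        except BlockingIOError:
            # Buffer empty: wait for readability, stop if nothing happens.
            if not wait_ready(rfd, selectors.EVENT_READ):
                break
            continue
        if not data:
            break  # EOF: the writer end was closed
        received += len(data)
    state["received"] = received


def run_fifo_transfer():
    """Push TOTAL bytes through a non-blocking fifo; report both byte counts."""
    state = {"sent": 0, "received": 0}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")
        os.mkfifo(path)
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
        try:
            threads = [
                threading.Thread(target=reader, args=(rfd, state)),
                threading.Thread(target=writer, args=(wfd, state)),
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
        finally:
            os.close(wfd)
            os.close(rfd)
    return state


if __name__ == "__main__":
    print(run_fifo_transfer())
```

If both counts reach `TOTAL`, the transfer completed. Counts that stop short mean one side timed out while waiting for readiness, which is the symptom described above for macOS.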

Co-authored-by: Johannes Müller <straightshoota@gmail.com>
@straight-shoota straight-shoota added this to the 1.17.0 milestone Jul 1, 2025
@straight-shoota straight-shoota merged commit 1ca6be6 into crystal-lang:master Jul 2, 2025
38 checks passed
@straight-shoota straight-shoota changed the title File: let the event loop decide the blocking mode Let the event loop decide the blocking mode of File Jul 2, 2025
@ysbaddaden ysbaddaden deleted the rework/file-non-blocking-behavior branch July 3, 2025 08:47
Labels
kind:breaking Intentional breaking change with significant impact. Shows up on top of the changelog. topic:stdlib:runtime