I wrote to the author of readdirp a few weeks ago because, after tracking down the source of my Electron application's lockup at startup, I am fairly convinced that its approach of blasting the filesystem with a huge number of parallel requests is flawed and does not scale. I also think this approach is the source of many bugs and performance issues (such as #229) and has led to workarounds such as graceful-fs and backpressure handling.
In my personal project I have replaced readdirp in chokidar with a simple walk implementation that meets my needs, and things are smoother and much faster for me (I have no numbers yet; I first want to float the idea).
Basically, my approach was to make it massively serial (async.eachSeries) so that it plays nicely at any scale, and to simplify it, since readdirp contains some funky async logic. Obviously, some parallelism could improve throughput, and it is probably still desirable, but it adds complexity when managing performance for large scans.
Proposal: replace readdirp
I could extract my filtered walk into a small library. My current API is simply:
walk(rootPath, function(path, stat) { /* filter */ }, done)
.on('file', function(path, stat) {})
.on('directory', function(path, stat) {});
It would call lstat before every filter call, as chokidar relies on, but things like globbing and depth would be left up to the library user.
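To make the shape of that API concrete, here is a minimal sketch of a serial walk, not the actual implementation from my project. It assumes the filter returns a truthy value to keep an entry, that `done` is a Node-style callback, and that anything that is not a directory (including symlinks, since lstat is used) is reported as a `'file'`. It uses a plain `for` loop with `await` instead of async.eachSeries, which gives the same one-request-at-a-time behavior.

```javascript
const fs = require('fs');
const path = require('path');
const { EventEmitter } = require('events');

// Hypothetical serial walk: only one readdir/lstat is in flight at a time.
// filter(entryPath, stat) is assumed to return true to keep the entry.
function walk(rootPath, filter, done) {
  const emitter = new EventEmitter();

  async function visit(dir) {
    const names = await fs.promises.readdir(dir);
    for (const name of names) {
      const entryPath = path.join(dir, name);
      // lstat before every filter call, so the filter sees the stats.
      const stat = await fs.promises.lstat(entryPath);
      if (!filter(entryPath, stat)) continue; // rejected: no event, no traversal
      if (stat.isDirectory()) {
        emitter.emit('directory', entryPath, stat);
        await visit(entryPath); // finish this subtree before moving on
      } else {
        emitter.emit('file', entryPath, stat);
      }
    }
  }

  // Defer the start so listeners can be attached after walk() returns.
  setImmediate(() => {
    visit(rootPath).then(() => done(null), done);
  });
  return emitter;
}
```

Because each subtree is fully awaited before the next entry is read, the number of open file-system requests stays constant regardless of tree size, which is the property the proposal relies on to avoid graceful-fs.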
Customizing it to chokidar's needs:
- _isIgnored would be quite simple: since the filter is called with stats, you can process directories and files differently. If you filter out a directory, it is not traversed and no event is emitted for it; if you filter out a file, you just do not get an event for it.
- depth would come down to splitting the directory path and comparing the number of segments
- readdirp 'entry' could be generated (or partially generated with only what you need) when emitting the file or directory.
- globbing would be handled in the filter
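As a rough illustration of the points above, the filter could combine an ignore check with a depth check. This is only a sketch: `depthOf`, `makeFilter`, `maxDepth`, and the `isIgnored` option are hypothetical names, and the depth convention here (entries directly under the root are at level 0) is an assumption, not necessarily what chokidar does today. A glob match would slot in at the same place as the ignore check.

```javascript
const path = require('path');

// Hypothetical helper: number of path segments between root and entry.
function depthOf(rootPath, entryPath) {
  const rel = path.relative(rootPath, entryPath);
  return rel === '' ? 0 : rel.split(path.sep).length;
}

// Builds a filter(entryPath, stat) that returns true to keep an entry.
// opts.isIgnored and opts.maxDepth are assumptions made for this sketch.
function makeFilter(rootPath, opts) {
  return function filter(entryPath, stat) {
    // Ignored directories are never traversed; ignored files emit no event.
    if (opts.isIgnored(entryPath, stat)) return false;
    // Entries directly under the root have one segment, i.e. level 0.
    return depthOf(rootPath, entryPath) - 1 <= opts.maxDepth;
  };
}
```

With this convention a directory at the depth limit is still reported and read once, and its children are then all rejected; skipping that final readdir would need the walk itself to know about depth.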
This change would require a small to medium refactor of chokidar, so before I develop the code and some performance tests, I would like to know whether you agree with the philosophy of the proposal (serial and no graceful-fs) and whether you have feedback. I am also not sure about all of the edge cases, so I might be proposing something with a serious flaw.
Interested? Suggestions (like an infinitely scalable and simple parallelism strategy)?