Conversation

jntesteves (Contributor)
normalizeInputSource: always pipe stream through a PassThrough stream

This is needed to guarantee the stream is paused if it is already flowing: the source will only be processed in a later iteration of the event loop, and data will be lost if the stream is still flowing when it is added to the queue.

As far as I can tell, this guarantees maximum compatibility without breaking anything. I tried checking the stream's state instead, but couldn't find a reliable way to tell whether it would lose data to early consumption, hence this solution of always pausing.

Piping through a PassThrough stream provides a buffer and guarantees the source is effectively paused. I first tried simply calling the stream#pause() method, as was done in a very old archiver commit (from before the utils repo existed), but pause() alone doesn't guarantee the stream stays paused, and it kept losing data in my tests.

A couple of ways to reproduce the problem with archiver:

Spawning child processes:

const fs = require('fs')
const { spawn } = require('child_process')
const archiver = require('archiver')

const archive = archiver('zip')
archive.pipe(fs.createWriteStream('spawn-archiver.zip'))

for (let i = 0; i < 10; ++i) {
  const child = spawn('dir', [], {
    shell: true,
    stdio: ['ignore', 'pipe', 'ignore']
  })

  archive.append(child.stdout, { name: 'test' + i + '.txt' })
}

archive.finalize()

Processing the response of a request:

const archiver = require('archiver')
const fs = require('fs')
const r = require('request')
const zlib = require('zlib')

const output = fs.createWriteStream('./http.IncomingMessage-archiver.zip')

const archive = archiver('zip', {
  store: true
})

const request = r({
  method: 'GET',
  uri: 'https://4.bp.blogspot.com/-jT50buLTo8M/U6Ray3egZEI/AAAAAAAABoY/VWKmka_LTZA/s1600/1.jpg'
})

// pipe archive data to the file
archive.pipe(output)

request.on('response', response => {
  if (response.headers['content-encoding'] === 'gzip') {
    response = response.pipe(zlib.createGunzip())
  }

  console.log(`response.readableFlowing ${response.readableFlowing}`)
  archive.append(response, { name: 'db.jpg' })

  // finalize the archive (ie we are done appending files but streams have to finish yet)
  // 'close', 'end' or 'finish' may be fired right after calling this method so register to them beforehand
  archive.finalize()
})

This fixes node-archiver issue #364.
