Skip to content

Support decompressing stdin. #692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 26, 2024
Merged

Conversation

rcorre
Copy link

@rcorre rcorre commented Jul 17, 2024

Fixes #687.

If "-" is passed as a filename, decompress data from stdin.

Currently --format must be passed as well, but as a next step,
we could try to infer the format from magic numbers.

As stdin is not connected to the terminal, we cannot prompt for Y/N
when warning about decompression in memory, for e.g. zip. Just default
to No, and require passing "-y" in these cases.

For zip, we have to buffer the whole stream in memory to seek into it,
just as we do with a chained decoder like .zip.bz.

The rar format requires an actual file (not an impl Read), so
we write a temp file that it can decode.

When decoding a single-file archive (e.g. file.bz), the output filename
is just -, since we don't know the original filename. I had to add
a bit of a hack to the tests to work around this. Another option
would be to interpret "-d" as a destination filename in this case.

When decoding a multi-file archive, I decided to unpack directly into
the destination directory, as this seemed like a better experience than
adding a top-level "-" folder inside the destination.

Fixes ouch-org#687.

If "-" is passed as a filename, decompress data from stdin.

Currently `--format` must be passed as well, but as a next step,
we could try to infer the format from magic numbers.

As stdin is not connected to the terminal, we cannot prompt for Y/N
when warning about decompression in memory, for e.g. zip. Just default
to No, and require passing "-y" in these cases.

For zip, we have to buffer the whole stream in memory to seek into it,
just as we do with a chained decoder like `.zip.bz`.

The rar format requires an actual file (not an `impl Read`), so
we write a temp file that it can decode.

When decoding a single-file archive (e.g. file.bz), the output filename
is just `-`, since we don't know the original filename. I had to add
a bit of a hack to the tests to work around this. Another option
would be to interpret "-d" as a destination filename in this case.

When decoding a multi-file archive, I decided to unpack directly into
the destination directory, as this seemed like a better experience than
adding a top-level "-" folder inside the destination.
@marcospb19
Copy link
Member

Thanks for such a detailed description of every choice you made, that helps a lot.

This is a very good PR.

When decoding a single-file archive (e.g. file.bz), the output filename
is just -, since we don't know the original filename. I had to add
a bit of a hack to the tests to work around this. Another option
would be to interpret "-d" as a destination filename in this case.

When decoding a multi-file archive, I decided to unpack directly into
the destination directory, as this seemed like a better experience than
adding a top-level "-" folder inside the destination.

Those are indeed weird cases to deal with.

Having an output file named - is weird so what about something like "stdin-output.ext1.ext2", "output.ext1.ext2" or "ouch-output.ext1.ext2"? (any suggestions for names of that?)

About unpacking in the current folder instead of using the smart_unpack functionality, due to lack of a better name for the directory, I think this falls in the same suggestion as the above, if we can have an intuitive name like "stdin-output", then suddenly it makes more sense to unpack it in a folder like usual.

What do you think?

@rcorre
Copy link
Author

rcorre commented Jul 27, 2024

Thanks for taking a look!

Having an output file named - is weird so what about something like "stdin-output.ext1.ext2", "output.ext1.ext2" or "ouch-output.ext1.ext2"? (any suggestions for names of that?)

I'm fine with either of those, but I'm not sure where the extensions would come from.

About unpacking in the current folder instead of using the smart_unpack functionality, due to lack of a better name for the directory, I think this falls in the same suggestion as the above, if we can have an intuitive name like "stdin-output", then suddenly it makes more sense to unpack it in a folder like usual.

As a consumer, I feel like the most convenient behavior is:

  1. ouch d --format tgz -d out < example.tgz will unpack example directly to the directory out
  2. ouch d --format gz -d out < example.gz will decompress example the file out

That being said, I get that it's weird for stdin to change the behavior of -d, so I don't feel strongly about this. I can live with an extra layer of directory -- it's still way easier than remembering how to use tar 😆

@marcospb19
Copy link
Member

I'm fine with either of those, but I'm not sure where the extensions would come from.

It would come from --format, but actually that could be done later, I can create an issue with the suggestion after we merge this.

That being said, I get that it's weird for stdin to change the behavior of -d, so I don't feel strongly about this.

Hahaha yeah that's what I was thinking, if we add special cases for the behavior of -d, it makes it harder to document and/or predict what Ouch is going to do.

It's also hard to remember why after some years.

So can you change it to write to a "stdin-output" (or other name?) file and folder, respectively, for the single-file and multi-file archive? Then we think about extensions later.

(The problem with inferring the extension from --format is that the input stdin might already have an extension, in that case, adding more messes it up.)

rcorre added 2 commits July 31, 2024 06:37
Fixes:

```
error: doc list item missing indentation
   --> src/commands/decompress.rs:237:5
    |
237 | /// Note: This functions assumes that `output_dir` exists
    |     ^
    |
    = help: if this is supposed to be its own paragraph, add a blank line
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#doc_lazy_continuation
    = note: `-D clippy::doc-lazy-continuation` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::doc_lazy_continuation)]`
help: indent this line
    |
237 | ///   Note: This functions assumes that `output_dir` exists
    |     ++
```
@rcorre
Copy link
Author

rcorre commented Aug 1, 2024

aarch/musl build failures seem unrelated:

 /build/musl-cross-make/build/local/aarch64-linux-musl/obj_musl/../src_musl/src/math/frexpl.c:16: undefined reference to `__multf3'

@marcospb19
Copy link
Member

Can't believe 3 weeks went by, sorry, time flies sometimes.

I'll try to get back to this this weekend.

@rcorre
Copy link
Author

rcorre commented Aug 22, 2024

No rush, I know how it goes. Thanks for checking in!

@marcospb19
Copy link
Member

Alright so I need to dedicate a bit of time to figure out why CI failed randomly, but I don't think this PR needs to wait for it.

Thank you so much for the contribution! :) This is a great one.

@marcospb19 marcospb19 merged commit 61f96ca into ouch-org:main Aug 26, 2024
74 of 82 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support decompressing stdin
2 participants