gzip files can contain multiple concatenated gzips #794

@TomiBelan

What version are you using (fq -v)?

$ fq -v
0.8.0 (linux amd64)

How was fq installed?

go run

Can you reproduce the problem using the latest release or master branch?

Yes

What did you do?

$ printf aaaaaaaaaa | gzip > test.gz
$ printf bbbbbbbbbb | gzip >> test.gz
$ zcat test.gz; echo .
aaaaaaaaaabbbbbbbbbb.
$ go run github.com/wader/fq@master dd test.gz
go: downloading github.com/wader/fq v0.8.1-0.20231020164445-1a3823f1877b
     |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|.{}: test.gz (gzip)
0x000|1f 8b                                          |..              |  identification: raw bits (valid)
0x000|      08                                       |  .             |  compression_method: "deflate" (8)
     |                                               |                |  flags{}:
0x000|         00                                    |   .            |    text: false
0x000|         00                                    |   .            |    header_crc: false
0x000|         00                                    |   .            |    extra: false
0x000|         00                                    |   .            |    name: false
0x000|         00                                    |   .            |    comment: false
0x000|         00                                    |   .            |    reserved: 0
0x000|            00 00 00 00                        |    ....        |  mtime: 0 (1970-01-01T00:00:00Z)
0x000|                        00                     |        .       |  extra_flags: 0
0x000|                           03                  |         .      |  os: "unix" (3)
     |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|
  0x0|61 61 61 61 61 61 61 61 61 61|                 |aaaaaaaaaa|     |  uncompressed: raw bits
0x000|                              4b 4c 84 01 00   |          KL... |  compressed: raw bits
0x000|                                             f0|               .|  crc32: 0x4c11cdf0 (valid)
0x010|cd 11 4c                                       |..L             |
0x010|         0a 00 00 00                           |   ....         |  isize: 10
0x010|                     1f 8b 08 00 00 00 00 00 00|       .........|  gap0: raw bits
0x020|03 4b 4a 82 01 00 f8 4c 2f 42 0a 00 00 00|     |.KJ....L/B....| |

What result did you expect?

The top-level value should not be a single object with "identification", "compression_method", etc., but an array of such objects, one per gzip member.

From RFC 1952: "A gzip file consists of a series of “members” (compressed data sets). The format of each member is specified in the following section. The members simply appear one after another in the file, with no additional information before, between, or after them."
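
To illustrate the member layout the RFC describes, here is a minimal Python sketch that decompresses each member separately. It is not how fq works and only uses the standard library; the function name `split_gzip_members` is made up for this example. It relies on `zlib.decompressobj` stopping at the end of one member and leaving the remaining bytes in `unused_data`.

```python
import gzip
import zlib


def split_gzip_members(data: bytes) -> list[bytes]:
    """Return the uncompressed payload of each gzip member in data."""
    members = []
    while data:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=31)
        members.append(d.decompress(data) + d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member belong to the next one.
        data = d.unused_data
    return members


# Same construction as the shell commands above: two members, concatenated.
blob = gzip.compress(b"aaaaaaaaaa") + gzip.compress(b"bbbbbbbbbb")
print(split_gzip_members(blob))
```

A decoder that produces an array of members would presumably follow the same loop: parse one member, then continue from the first byte after its trailer until the input is exhausted.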

What did you see instead?

The "bbbbbbbbbb" member is shown as gap0 and not parsed.
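
As a sanity check that gap0 really is a complete second member, the sketch below takes the 23 gap0 bytes copied from the dump above and reads them with Python's standard library. The variable names are only for illustration, and the field layout (10-byte fixed header, 8-byte trailer) assumes no optional header fields, which matches the all-zero flags byte.

```python
import struct
import zlib

# The 23 bytes that the dump above labels gap0 (offsets 0x17 through 0x2d).
gap0 = bytes.fromhex(
    "1f8b08000000000000"            # from the 0x010 row
    "034b4a820100f84c2f420a000000"  # from the 0x020 row
)

# Fixed 10-byte gzip header: magic, method, flags, mtime, extra flags, OS.
magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", gap0[:10])
# 8-byte trailer: CRC-32 and size of the uncompressed data, little-endian.
crc32, isize = struct.unpack("<II", gap0[-8:])

# wbits=31 makes zlib parse the gzip wrapper and verify the trailer.
payload = zlib.decompress(gap0, wbits=31)

print(magic.hex(), method, flags, mtime, xfl, os_id)
print(hex(crc32), isize, payload)
```

The header fields line up with the first member's (deflate, no flags, mtime 0, unix), and the payload is the ten "b" bytes, so the only thing missing is for the decoder to keep going after the first trailer.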
