content-range response header cannot be parsed correctly #391

@avhou

Hi,
I'm using xh to download parts of Common Crawl data segments with the -dco flags, like this:

xh get -dco 4607e003-835a-47a8-af2a-0ddda862b101.gz https://data.commoncrawl.org/crawl-data/CC-MAIN-2022-21/segments/1652662509990.19/robotstxt/CC-MAIN-20220516041337-20220516071337-00631.warc.gz Range:bytes=2150280-2151022

If the file does not exist yet, this works fine:

age: 6036
content-length: 743
content-range: bytes 2150280-2151022/2379541
content-type: application/octet-stream
date: Tue, 17 Dec 2024 14:38:29 GMT
etag: "4b8c3e82521a956c0828d015ed6bbe33"
last-modified: Mon, 16 May 2022 09:09:00 GMT
server: AmazonS3
via: 1.1 b23911d471c22383c023eec862afc500.cloudfront.net (CloudFront)
x-amz-cf-id: 5PRFJohIDU3Bdpj742IoMPg1tbqQXRvuGjUw9VI6Z9kzraobpdmB6A==
x-amz-cf-pop: BRU50-P1
x-amz-storage-class: INTELLIGENT_TIERING
x-amz-version-id: null
x-cache: Hit from cloudfront

Downloading 743 B to "4607e003-835a-47a8-af2a-0ddda862b101.gz"
Done. 743 B in 0.00113s (642.85 KiB/s)

However, if the file already exists (i.e. when running the same command a second time), it no longer works:

HTTP/2.0 206 Partial Content
age: 6091
content-length: 743
content-range: bytes 2150280-2151022/2379541
content-type: application/octet-stream
date: Tue, 17 Dec 2024 14:38:29 GMT
etag: "4b8c3e82521a956c0828d015ed6bbe33"
last-modified: Mon, 16 May 2022 09:09:00 GMT
server: AmazonS3
via: 1.1 961d53799e25f07a5cd3c15086a9948c.cloudfront.net (CloudFront)
x-amz-cf-id: L466R4kzNFkFWYSzbrQ8UDEGtY0i2dQjQ2z4Bl8U0kUKwZu3VSqwSA==
x-amz-cf-pop: BRU50-P1
x-amz-storage-class: INTELLIGENT_TIERING
x-amz-version-id: null
x-cache: Hit from cloudfront

xh: error: Content-Range has wrong end: "bytes 2150280-2151022/2379541"

Note that the server returns the same content-range header in both runs, indicating that it sent bytes 2150280-2151022 of a file that totals 2379541 bytes. xh fails on this header only when the output file already exists.
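
To show that the header itself is well-formed, here is a minimal Python sketch that parses the `bytes first-last/total` form and checks it against the reported content-length. The function names `parse_content_range` and `runs_to_end_of_file` are made up for this illustration and are not part of xh. The last check is only a guess at what kind of condition could produce a "wrong end" error when resuming; it is not taken from xh's source.

```python
import re

def parse_content_range(value):
    """Parse a 'bytes first-last/total' Content-Range value.

    Returns (first, last, total); total is None when the server sent '*'.
    """
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unparseable Content-Range: {value!r}")
    first, last = int(m.group(1)), int(m.group(2))
    total = None if m.group(3) == "*" else int(m.group(3))
    # Byte positions are inclusive and zero-based, so last must be < total.
    if first > last or (total is not None and last >= total):
        raise ValueError(f"inconsistent Content-Range: {value!r}")
    return first, last, total

def runs_to_end_of_file(last, total):
    # Hypothetical check (an assumption, not xh's actual logic): a resume
    # might expect the returned range to extend to the final byte.
    return total is not None and last == total - 1

first, last, total = parse_content_range("bytes 2150280-2151022/2379541")
length = last - first + 1  # inclusive range, matches content-length: 743
ends_at_eof = runs_to_end_of_file(last, total)
```

The header parses cleanly and its length agrees with the content-length the server reported, so the response looks valid for the range I requested. The range does not reach the end of the file, which is expected here because I asked for an explicit sub-range.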
