Closed
Hi,
I'm using xh to download parts of Common Crawl data segments with the -dco
flags, like this:
xh get -dco 4607e003-835a-47a8-af2a-0ddda862b101.gz https://data.commoncrawl.org/crawl-data/CC-MAIN-2022-21/segments/1652662509990.19/robotstxt/CC-MAIN-20220516041337-20220516071337-00631.warc.gz Range:bytes=2150280-2151022
If the file does not exist yet, this works fine:
age: 6036
content-length: 743
content-range: bytes 2150280-2151022/2379541
content-type: application/octet-stream
date: Tue, 17 Dec 2024 14:38:29 GMT
etag: "4b8c3e82521a956c0828d015ed6bbe33"
last-modified: Mon, 16 May 2022 09:09:00 GMT
server: AmazonS3
via: 1.1 b23911d471c22383c023eec862afc500.cloudfront.net (CloudFront)
x-amz-cf-id: 5PRFJohIDU3Bdpj742IoMPg1tbqQXRvuGjUw9VI6Z9kzraobpdmB6A==
x-amz-cf-pop: BRU50-P1
x-amz-storage-class: INTELLIGENT_TIERING
x-amz-version-id: null
x-cache: Hit from cloudfront
Downloading 743 B to "4607e003-835a-47a8-af2a-0ddda862b101.gz"
Done. 743 B in 0.00113s (642.85 KiB/s)
However, if the file already exists (i.e. when running the same command a second time), this no longer works:
HTTP/2.0 206 Partial Content
age: 6091
content-length: 743
content-range: bytes 2150280-2151022/2379541
content-type: application/octet-stream
date: Tue, 17 Dec 2024 14:38:29 GMT
etag: "4b8c3e82521a956c0828d015ed6bbe33"
last-modified: Mon, 16 May 2022 09:09:00 GMT
server: AmazonS3
via: 1.1 961d53799e25f07a5cd3c15086a9948c.cloudfront.net (CloudFront)
x-amz-cf-id: L466R4kzNFkFWYSzbrQ8UDEGtY0i2dQjQ2z4Bl8U0kUKwZu3VSqwSA==
x-amz-cf-pop: BRU50-P1
x-amz-storage-class: INTELLIGENT_TIERING
x-amz-version-id: null
x-cache: Hit from cloudfront
xh: error: Content-Range has wrong end: "bytes 2150280-2151022/2379541"
Note that the server returns the same Content-Range header in both cases, indicating that it sent bytes 2150280-2151022 of a file that totals 2379541 bytes. xh only fails on this header when the output file already exists.
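To double-check that the header itself is well-formed, here is a minimal Python sketch that parses the Content-Range value and compares it with the reported content-length. The function names `parse_content_range` and `is_consistent` are made up for illustration and this is not xh's code; it only handles the `bytes START-END/TOTAL` form seen in the output above.

```python
import re


def parse_content_range(value):
    """Parse a 'bytes START-END/TOTAL' Content-Range value into three ints."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        # Other forms (e.g. an unknown total written as '*') are not handled here.
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, total = (int(group) for group in match.groups())
    return start, end, total


def is_consistent(content_range, content_length):
    """Check that the range lies inside the file and matches the body length."""
    start, end, total = parse_content_range(content_range)
    # Byte ranges are inclusive on both ends, hence the +1.
    return start <= end < total and end - start + 1 == content_length


header = "bytes 2150280-2151022/2379541"
print(parse_content_range(header))  # (2150280, 2151022, 2379541)
print(is_consistent(header, 743))   # True
```

With the values from both responses, the range covers exactly the 743 bytes reported in content-length, so the server's answer looks consistent in both runs.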