Closed
Description
The `s5cmd cat` subcommand does not use concurrent connections the way `s5cmd cp` does.
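For context, concurrent downloaders such as `s5cmd cp` generally split an object into byte ranges and issue several ranged GET requests in parallel. The minimal sketch below only computes those ranges. The helper name `part_ranges` is hypothetical, and this is not s5cmd's actual code. A concurrent `cat` would additionally have to fetch the ranges in parallel and write them to stdout strictly in order, buffering any part that arrives early.

```shell
#!/bin/sh
# Hypothetical helper: print the HTTP Range header values needed to split
# an object of SIZE bytes into parts of PART bytes (the last part may be shorter).
part_ranges() {
  size=$1
  part=$2
  start=0
  while [ "$start" -lt "$size" ]; do
    end=$((start + part - 1))
    # Clamp the final range to the last byte of the object.
    if [ "$end" -ge "$size" ]; then
      end=$((size - 1))
    fi
    echo "bytes=$start-$end"
    start=$((start + part))
  done
}

# Example: a 10-byte object split into 4-byte parts.
part_ranges 10 4
# bytes=0-3
# bytes=4-7
# bytes=8-9
```

Assuming 427 GB here means 427 × 1024 MiB, a part size of 600 MiB (as with `-p 600` below) would split this tarball into 729 ranges. The concurrency setting then limits how many of those ranges are in flight at once.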
My use case is downloading a 427 GB tarball from S3 and extracting it on the fly:

```
time s5cmd cat s3://bucket/file.tar.zst - | pzstd -d | tar -xv -C /
```
Test instance: EC2 c5d.9xlarge with 36 vCPUs, 72 GB RAM and a 900 GB local SSD.
Comparing only the download step, first with the AWS CLI:

```
# time aws s3 cp s3://bucket/file.tar.zst - | cat >/dev/null
real 37m56.415s
user 22m50.195s
sys 19m8.677s
```

(around 192 MB/s)
With `s5cmd cat`:

```
# time s5cmd cat s3://bucket/file.tar.zst >/dev/null
```

This is still running. According to `netstat`, it uses only a single S3 connection and reaches about 85 MB/s.
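A rough estimate of how long the single-connection `cat` would take at that rate follows. It assumes, as the other MB/s figures in this issue appear to, that 427 GB = 427 × 1024 MB, and that the rate stays constant; the `eta` helper is hypothetical.

```shell
#!/bin/sh
# Rough completion-time estimate for a download running at a fixed rate.
# Assumption: 427 GB = 427 * 1024 MB, the convention the reported MB/s
# figures in this issue appear to use.
SIZE_MB=$((427 * 1024))

# eta RATE_MB_PER_S -> prints total seconds and minutes, rounded.
eta() {
  awk -v size="$SIZE_MB" -v rate="$1" \
    'BEGIN { s = size / rate; printf "%.0f s (~%.0f min)\n", s, s / 60 }'
}

eta 85   # single s5cmd cat connection, as observed via netstat
```

This prints `5144 s (~86 min)`, i.e. more than twice the AWS CLI's roughly 38 minutes for the same object.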
With `s5cmd cp`, writing to local disk (no decompression):

```
time s5cmd cp s3://bucket/file.tar.zst /file.tar.zst
real 23m58.230s
user 7m56.734s
sys 22m40.482s
```

(around 304 MB/s)
With higher concurrency and larger parts:

```
# time s5cmd cp -c 36 -p 600 s3://bucket/file.tar.zst /file.tar.zst
real 10m3.064s
user 6m53.378s
sys 41m30.392s
```

(around 729 MB/s)
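To make the comparison explicit, the `real` (wall-clock) times can be converted into average throughput. This is a sketch that assumes 427 GB = 427 × 1024 MB, which reproduces the reported figures; the object size is rounded, so the results are approximate. The `throughput` helper is hypothetical.

```shell
#!/bin/sh
# Convert the `real` (wall-clock) times above into average throughput.
# Assumption: 427 GB = 427 * 1024 MB; the object size is rounded, so
# these are approximations of the reported figures.
SIZE_MB=$((427 * 1024))

# throughput MINUTES SECONDS -> average MB/s, rounded to a whole number
throughput() {
  awk -v size="$SIZE_MB" -v m="$1" -v s="$2" \
    'BEGIN { printf "%.0f\n", size / (m * 60 + s) }'
}

echo "aws s3 cp:             $(throughput 37 56.415) MB/s"
echo "s5cmd cp (defaults):   $(throughput 23 58.230) MB/s"
echo "s5cmd cp -c 36 -p 600: $(throughput 10 3.064) MB/s"
```

This gives 192, 304 and 725 MB/s. The last value is slightly below the 729 MB/s quoted above, possibly because of rounding in the object size. Either way, the tuned `s5cmd cp` is roughly 3.8× faster than the AWS CLI and about 8.5× faster than the single-connection `s5cmd cat`.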