
's5cmd cat' sub command is not using concurrent connections #245

@kaazoo

The 's5cmd cat' subcommand does not use concurrent connections the way 's5cmd cp' does.

My use case is downloading a 427 GB tarball from S3 and extracting it on the fly:
time s5cmd cat s3://bucket/file.tar.zst - | pzstd -d | tar -xv -C /

Example EC2 instance type: c5d.9xlarge (36 vCPUs, 72 GB RAM, 900 GB local SSD)

Comparing only the download step, first with the AWS CLI:

# time aws s3 cp s3://bucket/file.tar.zst - | cat >/dev/null
real    37m56.415s
user    22m50.195s
sys     19m8.677s
(around 192 MB/s)
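The MB/s figures in this issue are the object size divided by the "real" time reported by 'time'. A minimal sketch of that arithmetic follows. The function name throughput_mibs is hypothetical, and it assumes "427 GB" and "MB/s" mean GiB and MiB/s, which is the reading under which the 192 and 304 MB/s figures come out.

```shell
# Hypothetical helper: turn a `time` "real" value such as 37m56.415s into
# seconds and divide the object size by it.
# Assumption: the size is given in GiB and the result is reported in MiB/s.
throughput_mibs() {
  size_gib=$1   # object size in GiB
  real=$2       # wall-clock time as printed by `time`, e.g. 37m56.415s
  printf '%s\n' "$real" | awk -v gib="$size_gib" '{
    # Drop the trailing "s", then split "37m56.415" into minutes and seconds.
    sub(/s$/, "", $0)
    n = split($0, t, "m")
    secs = (n == 2) ? t[1] * 60 + t[2] : t[1]
    # Convert GiB to MiB and round MiB/s to the nearest integer.
    printf "%d\n", (gib * 1024) / secs + 0.5
  }'
}
```

For example, 'throughput_mibs 427 37m56.415s' prints 192, and 'throughput_mibs 427 23m58.230s' prints 304.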

With 's5cmd cat':

# time s5cmd cat s3://bucket/file.tar.zst >/dev/null
(Still running; only around 85 MB/s over a single S3 connection, according to netstat.)

With 's5cmd cp', writing to disk (no decompression):

# time s5cmd cp s3://bucket/file.tar.zst /file.tar.zst
real    23m58.230s
user    7m56.734s
sys     22m40.482s
(around 304 MB/s)

With higher concurrency and larger parts:

# time s5cmd cp -c 36 -p 600 s3://bucket/file.tar.zst /file.tar.zst
real    10m3.064s
user    6m53.378s
sys     41m30.392s
(around 729 MB/s)
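Concurrent downloaders like 's5cmd cp' typically split the object into byte ranges and fetch several of them at once, with -c capping how many run in parallel and -p setting the part size. A single streaming GET, which is what netstat shows for 's5cmd cat', gets none of that parallelism. The sketch below shows only the range-splitting step; byte_ranges is a hypothetical helper, not an s5cmd function, and it performs no download.

```shell
# Hypothetical helper (not part of s5cmd): print the HTTP Range header
# values needed to fetch an object of SIZE bytes in parts of PART bytes.
# A concurrent downloader would issue these GETs in parallel; a streaming
# `cat` would also have to write the parts to stdout in order.
byte_ranges() {
  size=$1   # total object size in bytes
  part=$2   # part size in bytes
  start=0
  while [ "$start" -lt "$size" ]; do
    end=$((start + part - 1))
    # The last part is shorter when SIZE is not a multiple of PART.
    if [ "$end" -ge "$size" ]; then
      end=$((size - 1))
    fi
    echo "bytes=$start-$end"
    start=$((start + part))
  done
}
```

For example, 'byte_ranges 10 4' prints bytes=0-3, bytes=4-7 and bytes=8-9. Reading -p 600 as 600 MiB parts (the unit s5cmd's help text gives for the part size) and the object as 427 GiB, the last run above would cover 729 ranges.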

Metadata

Assignees: none
Type: No type
Project status: Done
Relationships: None yet
Development: No branches or pull requests
