Closed
Labels: fs: s3 (Related to the S3 filesystem)
Bug Report
Description
Here is a GitHub Action that ran:
dvc import-url s3://tripdata/202505-citibike-tripdata.zip s3/tripdata/202505-citibike-tripdata.zip
# Importing 's3://tripdata/202505-citibike-tripdata.zip' -> 's3/tripdata/202505-citibike-tripdata.zip'
dvc import-url s3://tripdata/JC-202505-citibike-tripdata.csv.zip s3/tripdata/JC-202505-citibike-tripdata.csv.zip
# Importing 's3://tripdata/JC-202505-citibike-tripdata.csv.zip' -> 's3/tripdata/JC-202505-citibike-tripdata.csv.zip'
dvc push
# 2 files pushed
However, the first imported file (s3/tripdata/202505-citibike-tripdata.zip) ended up truncated in my S3 remote cache.
I backed up the truncated blob with a .bad suffix, and then manually fixed the blob in the cache (with aws s3 cp, dvc add, dvc push):
aws s3 ls s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e
# 2025-06-06 19:44:58 844607858 880ca091cc946d563ea4b115ec443e
# 2025-06-06 19:39:50 838860800 880ca091cc946d563ea4b115ec443e.bad
Verifying that 9e/880ca091cc946d563ea4b115ec443e.bad is a prefix of the full blob:
aws s3 cp s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e.bad - | md5sum
# ef7b7328a690dfdc9858c2da4cad9f41 -
bad_size="$(aws s3 ls s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e.bad | awk '{print $3}')"; echo $bad_size
# 838860800
aws s3 cp s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e - 2>/dev/null | head -c "$bad_size" | md5sum
# ef7b7328a690dfdc9858c2da4cad9f41 -
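The same prefix check can be wrapped in a small function that works on local copies of the two blobs. This is only a sketch and not part of the original run: is_prefix and the file names in the usage note are hypothetical, and it assumes both objects were first downloaded with aws s3 cp. As an aside, 838860800 bytes is exactly 800 × 1024 × 1024 (800 MiB), which may hint that the transfer stopped on a chunk boundary, though nothing here confirms the cause.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the first file is a
# byte-for-byte prefix of the second file.
is_prefix() {
  local prefix_file="$1" full_file="$2"
  local prefix_size full_size
  # Arithmetic expansion strips any padding that wc may emit.
  prefix_size=$(( $(wc -c < "$prefix_file") ))
  full_size=$(( $(wc -c < "$full_file") ))
  # A prefix can never be longer than the full file.
  [ "$prefix_size" -le "$full_size" ] || return 1
  # Compare the leading bytes of the full file with the candidate prefix.
  head -c "$prefix_size" "$full_file" | cmp -s - "$prefix_file"
}
```

For example, with the two objects saved locally as blob.bad and blob.full (hypothetical names), is_prefix blob.bad blob.full exits zero when the .bad blob is a clean truncation of the full one.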
Reproduce
I'm guessing it was a transient issue in my GHA run. I haven't tried to reproduce it.
I'm not sure which one failed here:
- It could be that import-url failed, and dvc push happily pushed the truncated blob
- Or import-url may have been fine, but push silently failed to complete.
Expected
If import-url or push fails to import or push a full file, the command should exit non-zero, and some errors should be logged.
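Until the commands report such a failure themselves, a CI job could check a blob against its own content-addressed path. The following is a minimal sketch, assuming (as the paths above suggest) that a blob stored under files/md5/<first two hex digits>/<remaining digits> is named after the MD5 of its content. verify_blob is a hypothetical helper, not a DVC command, and it operates on a local copy laid out the same way as the cache.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute a blob's MD5 and compare it with the
# hash encoded in its path (<2 hex digits>/<remaining hex digits>).
verify_blob() {
  local blob_path="$1"
  local dir_part file_part expected actual
  dir_part="$(basename "$(dirname "$blob_path")")"
  file_part="$(basename "$blob_path")"
  expected="${dir_part}${file_part}"
  actual="$(md5sum < "$blob_path" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    # Log the mismatch and return non-zero so the CI step fails.
    echo "md5 mismatch for $blob_path: expected $expected, got $actual" >&2
    return 1
  fi
}
```

A truncated blob fails this check because its recomputed MD5 no longer matches the name it was stored under.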
Environment information
You can see everything in the GHA:
- ubuntu-latest
- pip install output shows dvc-3.59.2, etc.