Bug Report
import: fetch downloads data to default cache and ignores cache config
Description
When fetching data that was imported with import-url, the data is first downloaded to the default cache directory before it is moved to the cache directory set in the configuration. This is problematic when a large dataset is imported.
The code that defines the path for the fs cache uses the default cache directory instead of the one set in the configuration.
# dvc/cachemgr.py
@property
def fs_cache(self):
    """Filesystem-based cache.

    Currently used as a temporary location to download files that we don't
    yet have a regular oid (e.g. md5) for.
    """
    from dvc_data.index import FileStorage

    return FileStorage(
        key=(),
        fs=self.local.fs,
        path=self.local.fs.join(self.default_local_cache_dir, self.FS_DIR),
    )
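To make the difference concrete, here is a minimal, self-contained sketch of the two path computations. It does not import DVC: `ToyCacheManager`, `local_cache_dir`, and the example directories are hypothetical stand-ins used only for illustration, and the "expected" property is one possible way to derive the path, not necessarily how DVC should fix it.

```python
import posixpath

FS_DIR = "fs"  # subdirectory name, taken from the CACHE_DIR/fs path in this report


class ToyCacheManager:
    """Hypothetical stand-in for the cache manager; not DVC's real class."""

    def __init__(self, default_cache_dir, configured_cache_dir=None):
        self.default_local_cache_dir = default_cache_dir
        # Assumption: fall back to the default when no cache dir is configured.
        self.local_cache_dir = configured_cache_dir or default_cache_dir

    @property
    def fs_cache_path_current(self):
        # Mirrors the reported behavior: always rooted at the default directory.
        return posixpath.join(self.default_local_cache_dir, FS_DIR)

    @property
    def fs_cache_path_expected(self):
        # Rooted at the configured directory, as described under "Expected".
        return posixpath.join(self.local_cache_dir, FS_DIR)


mgr = ToyCacheManager("/repo/.dvc/cache", "/mnt/big/dvc-cache")
print(mgr.fs_cache_path_current)   # /repo/.dvc/cache/fs
print(mgr.fs_cache_path_expected)  # /mnt/big/dvc-cache/fs
```

When no cache directory is configured, both properties return the same path, which matches the fact that the problem only shows up with a custom cache directory.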
Reproduce
dvc import-url --no-download https://data.dvc.org/get-started/data.xml
dvc fetch data.xml.dvc
Expected
The data is downloaded to CACHE_DIR/fs, where CACHE_DIR is the one set in the configuration.
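One way to observe which location is actually used is to compare the size of the fs subdirectory under both cache directories while the fetch is still running (the data is moved afterwards, so a check after completion would not show it). The helper below is a rough sketch using only the standard library; `dir_size` and `fs_cache_usage` are hypothetical names, and the directories passed in are whatever your default and configured cache directories happen to be.

```python
import os


def dir_size(path):
    """Total size in bytes of regular files under path (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def fs_cache_usage(default_cache_dir, configured_cache_dir):
    """Report bytes stored under <dir>/fs for both cache directories."""
    return {
        "default": dir_size(os.path.join(default_cache_dir, "fs")),
        "configured": dir_size(os.path.join(configured_cache_dir, "fs")),
    }
```

With the behavior described in this report, the "default" value would grow during the download while "configured" stays at 0.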
Environment information
Output of dvc doctor:
DVC version: 3.60.1 (pip)
-------------------------
Platform: Python 3.13.5 on Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
Subprojects:
    dvc_data = 3.16.10
    dvc_objects = 5.1.1
    dvc_render = 1.0.2
    dvc_task = 0.40.2
    scmrepo = 3.3.11
Supports:
    http (aiohttp = 3.12.13, aiohttp-retry = 2.9.1),
    https (aiohttp = 3.12.13, aiohttp-retry = 2.9.1),
    s3 (s3fs = 2025.5.1, boto3 = 1.38.27)
Config:
    Global: /home/estilon/.config/dvc
    System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdd
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sdd
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/29d288460d09394776878f493af9fadf