
import: fetch downloads data to default cache dir #10794

@italosestilon

Bug Report

import: fetch downloads data to default cache and ignores cache config

Description

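When fetching data that was imported with dvc import-url, the data is first downloaded to the default cache directory and only then moved to the cache directory set in the configuration. This is a problem when importing a large dataset, because the default cache location has to hold the full download even though a different cache directory was configured.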
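The staging flow described above can be modelled in a few lines. This is a simplified, hypothetical sketch: fetch_with_staging is not a DVC function, and a local file copy stands in for the HTTP download. It only illustrates that the staging location has to hold the complete file before it is moved to the configured cache.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_with_staging(src: Path, staging_cache: Path, final_cache: Path) -> tuple[Path, int]:
    """Hypothetical model of the reported flow: stage under <staging_cache>/fs, then move.

    Returns the final path and the number of bytes that had to fit in the
    staging location (not DVC's API, illustration only).
    """
    staging_dir = staging_cache / "fs"
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / src.name
    shutil.copyfile(src, staged)  # stands in for the HTTP download
    staged_bytes = staged.stat().st_size

    final_cache.mkdir(parents=True, exist_ok=True)
    final = final_cache / src.name
    # shutil.move renames when possible and falls back to copy + delete across filesystems
    shutil.move(str(staged), str(final))
    return final, staged_bytes


if __name__ == "__main__":
    # Throwaway demo: "default-cache" plays the role of the default cache dir.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "data.xml"
        src.write_bytes(b"x" * 1024)
        final, staged = fetch_with_staging(src, root / "default-cache", root / "configured-cache")
        print(final.name, staged)  # data.xml 1024
```

If the default and configured directories sit on different disks, the final move is a full copy as well, so the cost of the detour grows with the dataset size.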

The fs_cache property in dvc/cachemgr.py, which defines the path of the filesystem-based cache, joins self.default_local_cache_dir with FS_DIR instead of using the cache directory set in the configuration:

# dvc/cachemgr.py
    @property
    def fs_cache(self):
        """Filesystem-based cache.

        Currently used as a temporary location to download files that we don't
        yet have a regular oid (e.g. md5) for.
        """
        from dvc_data.index import FileStorage

        return FileStorage(
            key=(),
            fs=self.local.fs,
            path=self.local.fs.join(self.default_local_cache_dir, self.FS_DIR),
        )
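A minimal, self-contained sketch of the path logic, assuming a hypothetical CacheManagerSketch class that stands in for the real cache manager. default_local_cache_dir mirrors the attribute used in the excerpt, while configured_cache_dir is a hypothetical name for the directory set via cache.dir. It only contrasts the two path computations and is not the actual DVC code or fix.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional

FS_DIR = "fs"  # hypothetical constant mirroring self.FS_DIR in the excerpt


@dataclass
class CacheManagerSketch:
    """Hypothetical stand-in for the cache manager; not DVC's API."""

    default_local_cache_dir: str  # e.g. <repo>/.dvc/cache
    configured_cache_dir: Optional[str] = None  # value of cache.dir, if set

    def fs_cache_path_current(self) -> str:
        # Mirrors the reported behavior: always uses the default directory.
        return posixpath.join(self.default_local_cache_dir, FS_DIR)

    def fs_cache_path_expected(self) -> str:
        # Expected behavior: prefer the configured directory, fall back to the default.
        base = self.configured_cache_dir or self.default_local_cache_dir
        return posixpath.join(base, FS_DIR)


mgr = CacheManagerSketch("/repo/.dvc/cache", "/mnt/big-disk/dvc-cache")
print(mgr.fs_cache_path_current())   # /repo/.dvc/cache/fs
print(mgr.fs_cache_path_expected())  # /mnt/big-disk/dvc-cache/fs
```

Falling back to the default directory keeps the current behavior for projects that never set cache.dir.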

Reproduce

With a non-default cache directory configured (for example with dvc cache dir /path/to/cache):

  1. dvc import-url --no-download https://data.dvc.org/get-started/data.xml
  2. dvc fetch data.xml.dvc

Expected

The data is downloaded directly to CACHE_DIR/fs, where CACHE_DIR is the cache directory set in the configuration.
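Under the expected behavior, only the configured cache's fs sub-directory should grow while dvc fetch data.xml.dvc runs. The helpers below are a hypothetical diagnostic sketch for polling both locations during the fetch. The directory arguments (for example .dvc/cache as the default and the value of cache.dir as the configured one) are assumptions about the local setup, and because staged files are moved afterwards, the numbers are only informative while the download is in progress.

```python
import os
import tempfile
from pathlib import Path


def fs_staging_bytes(cache_dir: str) -> int:
    """Total size in bytes of regular files under <cache_dir>/fs (0 if absent)."""
    root = Path(cache_dir) / "fs"
    if not root.is_dir():
        return 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Count only regular files; skip symlinks so linked cache entries are not double-counted.
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def report(default_dir: str, configured_dir: str) -> dict[str, int]:
    """Compare staging usage under the default and the configured cache dirs."""
    return {
        "default": fs_staging_bytes(default_dir),
        "configured": fs_staging_bytes(configured_dir),
    }


if __name__ == "__main__":
    # Self-check on a throwaway directory layout.
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp, "configured", "fs", "ab")
        staged.mkdir(parents=True)
        (staged / "chunk").write_bytes(b"\0" * 10)
        print(report(str(Path(tmp, "default")), str(Path(tmp, "configured"))))
        # {'default': 0, 'configured': 10}
```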

Environment information

Output of dvc doctor:

DVC version: 3.60.1 (pip)
-------------------------
Platform: Python 3.13.5 on Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
Subprojects:
        dvc_data = 3.16.10
        dvc_objects = 5.1.1
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.11
Supports:
        http (aiohttp = 3.12.13, aiohttp-retry = 2.9.1),
        https (aiohttp = 3.12.13, aiohttp-retry = 2.9.1),
        s3 (s3fs = 2025.5.1, boto3 = 1.38.27)
Config:
        Global: /home/estilon/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdd
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sdd
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/29d288460d09394776878f493af9fadf

Metadata

Assignees: No one assigned
Labels: No labels
Status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests