DVC integration broken #7421

@maxstrobel

Describe the bug

The DVC integration appears to be broken: loading a file through a dvc:// path with load_dataset fails with a TypeError.
I followed this guide: https://dvc.org/doc/user-guide/integrations/huggingface

Steps to reproduce the bug

Script to reproduce

from datasets import load_dataset

dataset = load_dataset(
    "csv",
    data_files="dvc://workshop/satellite-data/jan_train.csv",
    storage_options={"url": "https://github.com/iterative/dataset-registry.git"},
)

print(dataset)

Error log

Traceback (most recent call last):
  File "C:\tmp\test\load.py", line 3, in <module>
    dataset = load_dataset(
              ^^^^^^^^^^^^^
  File "C:\tmp\test\.venv\Lib\site-packages\datasets\load.py", line 2151, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\tmp\test\.venv\Lib\site-packages\datasets\builder.py", line 808, in download_and_prepare
    fs, output_dir = url_to_fs(output_dir, **(storage_options or {}))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: url_to_fs() got multiple values for argument 'url'
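
The TypeError appears to come from Python's argument binding rather than from DVC itself. fsspec's url_to_fs takes the target path as its first parameter, named url, and forwards the remaining keyword arguments to the filesystem. The DVC guide puts the repository address under the key "url" in storage_options, so unpacking those options into the call at builder.py line 808 supplies url twice. The sketch below reproduces only that mechanism. It uses a hypothetical stand-in function with the same parameter shape, does not import fsspec or datasets, and the cache directory path is made up for illustration.

```python
# Hypothetical stand-in: same parameter shape as fsspec.core.url_to_fs(url, **kwargs),
# but it only echoes its inputs instead of building a filesystem.
def url_to_fs(url, **kwargs):
    return url, kwargs


def reproduce_collision():
    """Mimic the call at datasets/builder.py line 808 and return the TypeError message."""
    # storage_options exactly as in the reproduction script.
    storage_options = {"url": "https://github.com/iterative/dataset-registry.git"}
    # Hypothetical local cache directory standing in for the builder's output_dir.
    output_dir = "/tmp/hf_cache/csv"
    try:
        # output_dir binds positionally to `url`, then **storage_options
        # tries to bind the key "url" to the same parameter a second time.
        url_to_fs(output_dir, **storage_options)
    except TypeError as exc:
        return str(exc)
    return None


print(reproduce_collision())
```

Running it prints url_to_fs() got multiple values for argument 'url', the same message as the last line of the traceback.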

Expected behavior

The integration works: the referenced file is downloaded and loaded into a dataset.

Environment info

Python version

python --version
Python 3.11.10

Virtual environment (created with pip install datasets dvc):

Package                Version
---------------------- -----------
aiohappyeyeballs       2.4.6
aiohttp                3.11.13
aiohttp-retry          2.9.1
aiosignal              1.3.2
amqp                   5.3.1
annotated-types        0.7.0
antlr4-python3-runtime 4.9.3
appdirs                1.4.4
asyncssh               2.20.0
atpublic               5.1
attrs                  25.1.0
billiard               4.2.1
celery                 5.4.0
certifi                2025.1.31
cffi                   1.17.1
charset-normalizer     3.4.1
click                  8.1.8
click-didyoumean       0.3.1
click-plugins          1.1.1
click-repl             0.3.0
colorama               0.4.6
configobj              5.0.9
cryptography           44.0.1
datasets               3.3.2
dictdiffer             0.9.0
dill                   0.3.8
diskcache              5.6.3
distro                 1.9.0
dpath                  2.2.0
dulwich                0.22.7
dvc                    3.59.1
dvc-data               3.16.9
dvc-http               2.32.0
dvc-objects            5.1.0
dvc-render             1.0.2
dvc-studio-client      0.21.0
dvc-task               0.40.2
entrypoints            0.4
filelock               3.17.0
flatten-dict           0.4.2
flufl-lock             8.1.0
frozenlist             1.5.0
fsspec                 2024.12.0
funcy                  2.0
gitdb                  4.0.12
gitpython              3.1.44
grandalf               0.8
gto                    1.7.2
huggingface-hub        0.29.1
hydra-core             1.3.2
idna                   3.10
iterative-telemetry    0.0.10
kombu                  5.4.2
markdown-it-py         3.0.0
mdurl                  0.1.2
multidict              6.1.0
multiprocess           0.70.16
networkx               3.4.2
numpy                  2.2.3
omegaconf              2.3.0
orjson                 3.10.15
packaging              24.2
pandas                 2.2.3
pathspec               0.12.1
platformdirs           4.3.6
prompt-toolkit         3.0.50
propcache              0.3.0
psutil                 7.0.0
pyarrow                19.0.1
pycparser              2.22
pydantic               2.10.6
pydantic-core          2.27.2
pydot                  3.0.4
pygit2                 1.17.0
pygments               2.19.1
pygtrie                2.5.0
pyparsing              3.2.1
python-dateutil        2.9.0.post0
pytz                   2025.1
pywin32                308
pyyaml                 6.0.2
requests               2.32.3
rich                   13.9.4
ruamel-yaml            0.18.10
ruamel-yaml-clib       0.2.12
scmrepo                3.3.10
semver                 3.0.4
setuptools             75.8.0
shellingham            1.5.4
shortuuid              1.0.13
shtab                  1.7.1
six                    1.17.0
smmap                  5.0.2
sqltrie                0.11.2
tabulate               0.9.0
tomlkit                0.13.2
tqdm                   4.67.1
typer                  0.15.1
typing-extensions      4.12.2
tzdata                 2025.1
urllib3                2.3.0
vine                   5.1.0
voluptuous             0.15.2
wcwidth                0.2.13
xxhash                 3.5.0
yarl                   1.18.3
zc-lockfile            3.0.post1
