Description
I'm having an issue with queued jobs failing with no explanation. The jobs complete as expected when run without the --queue flag. This is somewhat similar to #10673, but there are some differences (in #10673 some of the tasks completed), so I'm starting a new issue. I also tried using absolute paths in dvc.yaml, as mentioned in #8747, but that didn't work.
Running directly with
dvc exp run --set-param decimation_percent=50 --name "dp-502"
works as expected:
root@docker-desktop:/eod3d_pipeline# dvc exp run --set-param decimation_percent=50 --name "dp-502"
Reproducing experiment 'dp-502'
Building workspace index |445 [00:00, 738entry/s]
Comparing indexes |444 [00:00, 24.6kentry/s]
WARNING: No file hash info found for '/eod3d_pipeline/results_50/plots'. It won't be created.
WARNING: No file hash info found for '/eod3d_pipeline/results_50/metrics.json'. It won't be created.
Applying changes |0.00 [00:01, ?file/s]
Stage 'query' didn't change, skipping
Running stage 'eval':
> python eval.py
Importing samples...
100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 432/432 [63.1ms elapsed, 0s remaining, 6.8K samples/s]
100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 432/432 [5.3s elapsed, 0s remaining, 94.7 samples/s]
Updating lock file 'dvc.lock'
Ran experiment(s): dp-502
Experiment results have been applied to your workspace.
root@docker-desktop:/eod3d_pipeline#
However, when I do dvc exp run --queue --set-param decimation_percent=50 --name "dp-502"
and then dvc queue start -v
I get the following output:
root@docker-desktop:/eod3d_pipeline# dvc queue start -v
2025-06-12 13:24:30,686 DEBUG: v3.60.0 (pip), CPython 3.9.23 on Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.36
2025-06-12 13:24:30,686 DEBUG: command: /usr/local/bin/dvc queue start -v
2025-06-12 13:24:31,171 DEBUG: Spawning 1 exp queue workers
2025-06-12 13:24:32,257 DEBUG: Worker status: {}
2025-06-12 13:24:32,259 DEBUG: Spawning exp queue worker
2025-06-12 13:24:32,259 DEBUG: start a new worker: dvc-exp-worker-1, node: dvc-exp-9e32c0-1@localhost
Started '1' new experiments task queue worker.
2025-06-12 13:24:32,273 DEBUG: Analytics is enabled.
2025-06-12 13:24:32,308 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpp1vkuyyv', '-v']
2025-06-12 13:24:32,314 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpp1vkuyyv', '-v'] with pid 24563
root@docker-desktop:/eod3d_pipeline# dvc doctor
dvc queue status
shows:
root@docker-desktop:/eod3d_pipeline# dvc queue status
Task Name Created Status
56cca35 dp-502 01:22 PM Failed
dvc queue logs 56cca35
does not return any logs:
root@docker-desktop:/eod3d_pipeline# dvc queue logs dp-502
ERROR: No output logs found for experiment 'dp-502'
root@docker-desktop:/eod3d_pipeline#
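For convenience, the failing sequence above can be collected into one sketch. Every command is taken from this report; the `repro` function name is only an illustrative wrapper, and the script assumes it is run from the repository root (`/eod3d_pipeline` here).

```shell
# Reproduction sketch: queue the experiment, start a worker, then inspect it.
# `repro` is a hypothetical helper name, not part of DVC.
repro() {
  # Queue the experiment instead of running it in the workspace.
  dvc exp run --queue --set-param decimation_percent=50 --name "dp-502"

  # Start a queue worker with verbose output. The worker runs in the
  # background, so the status may need a moment to update.
  dvc queue start -v

  # The task shows up as "Failed" here.
  dvc queue status

  # This reports "No output logs found for experiment 'dp-502'".
  dvc queue logs dp-502
}
```

Calling `repro` in a shell where `dvc` is on the PATH runs the four steps in order.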
The output of dvc doctor:
root@docker-desktop:/eod3d_pipeline# dvc doctor
DVC version: 3.60.0 (pip)
-------------------------
Platform: Python 3.9.23 on Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.36
Subprojects:
dvc_data = 3.16.10
dvc_objects = 5.1.1
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.11
Supports:
http (aiohttp = 3.12.9, aiohttp-retry = 2.9.1),
https (aiohttp = 3.12.9, aiohttp-retry = 2.9.1)
Config:
Global: /root/.config/dvc
System: /etc/xdg/dvc
Cache types: symlink
Cache directory: 9p on C:\
Caches: local
Remotes: local
Workspace directory: 9p on C:\
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/965097ccbc74643f161ae89ff14d8ce1