exp show: Add `--hide-queued` and `--hide-failed` flags #8318

karajan1001 · 2022-09-19T03:39:08Z

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Add a "status" field that replaces "running" and "queued" in the output of exp show.
Add the flags --hide-queued and --hide-failed to exp show.
Allow exp show to show failed experiments.
Add a unit test for showing failed experiments.
Add an error message to the exp show output.

This needs coordination with the VS Code extension. @mattseddon

Example of the current JSON output.
For a failed experiment:

"data": {
    "timestamp": ANY,
    "params": {"params.yaml": {"data": {"foo": 2}}},
    "deps": {"copy.py": {"hash": None, "size": None, "nfiles": None}},
    "outs": {},
    "status": "Failed",
    "executor": None,
    "error": {
        "msg": "ERROR: failed to reproduce 'failed-copy-file': "
        "failed to run: python -c 'import sys; sys.exit(1)', "
        "exited with 1",
        "type": "",
    },
}

For other experiments:

"data": {
    "deps": {
        "copy.py": {
            "hash": ANY,
            "size": ANY,
            "nfiles": None,
        }
    },
    "metrics": {},
    "outs": {},
    "params": {"params.yaml": {"data": {"foo": 1}}},
    "status": "Success",
    "executor": None,
    "timestamp": timestamp,
    "name": "master",
    "error": {}, 
}

1. ["data"]["error"]["type"] is currently only a placeholder. The different error types and any other info still need to be defined.
2. ["data"]["error"]["msg"] shows the last line of the output log.
3. The old ["data"]["running"] and ["data"]["queued"] fields are now obsolete, because they would show the user the wrong status for a failed experiment.

Current status list in dvc/repo/experiments/show.py:

class ExpStatus(Enum):
    Success = 0
    Queued = 1
    Running = 2
    Failed = 3
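
As a rough illustration of how the two new flags could act on the "status" field, here is a minimal sketch. It is not the code from this PR: `filter_by_status` and the sample `exps` dict are hypothetical names, and the sketch assumes each entry stores its status under `["data"]["status"]` as the enum member name, as in the JSON examples above.

```python
from enum import Enum
from typing import Dict


class ExpStatus(Enum):
    Success = 0
    Queued = 1
    Running = 2
    Failed = 3


def filter_by_status(
    experiments: Dict[str, dict],
    hide_queued: bool = False,
    hide_failed: bool = False,
) -> Dict[str, dict]:
    """Drop entries whose "status" is hidden by the given flags (hypothetical helper)."""
    hidden = set()
    if hide_queued:
        hidden.add(ExpStatus.Queued.name)
    if hide_failed:
        hidden.add(ExpStatus.Failed.name)
    return {
        rev: exp
        for rev, exp in experiments.items()
        if exp["data"].get("status") not in hidden
    }


# Made-up sample data, shaped like the "data" dicts shown above.
exps = {
    "rev-a": {"data": {"status": "Success"}},
    "rev-b": {"data": {"status": "Queued"}},
    "rev-c": {"data": {"status": "Failed"}},
}
visible = filter_by_status(exps, hide_failed=True)
```

With `hide_failed=True` only the failed entry is dropped; queued and successful entries stay visible.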

pmrowla · 2022-09-19T09:15:08Z

dvc/repo/experiments/show.py

+def _get_last_error_message(
+    celery_queue: "LocalCeleryQueue", exp_rev: str
+) -> str:
+    queue_entry: Optional[
+        "QueueEntry"
+    ] = celery_queue.match_queue_entry_by_name(
+        {exp_rev}, celery_queue.iter_failed()
+    ).get(
+        exp_rev
+    )
+    if queue_entry:
+        try:
+            proc_info: "ProcessInfo" = celery_queue.proc[queue_entry.stash_rev]
+            with open(proc_info.stdout, encoding="utf-8") as f_read:
+                return f_read.readlines()[-1].strip()
+        except (KeyError, FileNotFoundError):
+            pass
+    return ""


Not sure this is really a reliable way to handle this case. It only works if the pipeline was run without debugging enabled.

@mattseddon can weigh in on this, but I think it would be better to just use a generic "Experiment run failed." error message, and then add a new field where we can pass the log file path to the vscode extension. And then on the vscode side they could potentially display the entire log to the user.

Actually, passing the log location seems useful in every exp case, not just for failures. This is something that the vscode extension could also get through dvc queue logs but I think it might be easier for them to just get the log file path and read it themselves when needed

My plan is to use the "type" field in "error" later, to record which part of the task (initialization, environment preparation, running, result collection) the error came from.

In that case I would still suggest just using a generic error message for now instead of trying to use the last log line, since the last log line is not actually guaranteed to contain anything useful.
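
The suggestion above (a generic message plus a separate field carrying the log file path) could look roughly like the sketch below. The helper `annotate_exp_data` and the `"log_path"` key are assumptions made for illustration only; the thread does not settle on a field name, and this is not DVC's actual output format.

```python
from typing import Optional

GENERIC_ERROR_MSG = "Experiment run failed."


def annotate_exp_data(
    data: dict, failed: bool, log_path: Optional[str] = None
) -> dict:
    """Return a copy of `data` with a generic error and an optional log path.

    "log_path" is a hypothetical key name, added for every experiment that
    has a log, not only for failed ones.
    """
    out = dict(data)
    if failed:
        out["error"] = {"msg": GENERIC_ERROR_MSG, "type": ""}
    if log_path is not None:
        out["log_path"] = log_path
    return out


failed_exp = annotate_exp_data(
    {"status": "Failed"}, failed=True, log_path="/tmp/exp-logs/abc123.out"
)
ok_exp = annotate_exp_data({"status": "Success"}, failed=False)
```

A consumer such as the VS Code extension could then show the short message in a tooltip and open the log file only when the user asks for details.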

In the long run we should probably also add an error message field to the executor info and save it here:

dvc/dvc/repo/experiments/executor/base.py

Lines 568 to 580 in d2a31eb

try:
    logger.debug("Running repro in '%s'", os.getcwd())
    yield dvc
except CheckpointKilledError:
    raise
except DvcException:
    if log_errors:
        logger.exception("")
    raise
except Exception:
    if log_errors:
        logger.exception("unexpected error")
    raise

So on getting an exception during repro, we would just do something like

info.error = str(exc)

and then that would be included in the dump:

dvc/dvc/repo/experiments/executor/base.py

Line 494 in d2a31eb

if infofile is not None:

This would give you a more reliable error message for the failed exp case
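
A minimal, self-contained sketch of that idea follows. `InfoSketch` and `record_error` are hypothetical stand-ins, not DVC's real executor info class or repro context manager; the sketch only shows the pattern of storing `str(exc)` on the info object before re-raising, so that the message ends up in whatever gets dumped afterwards.

```python
import json
from contextlib import contextmanager
from dataclasses import asdict, dataclass
from typing import Iterator, Optional


@dataclass
class InfoSketch:
    """Hypothetical stand-in for the executor info; not DVC's real class."""

    git_url: str
    error: Optional[str] = None


@contextmanager
def record_error(info: InfoSketch) -> Iterator[None]:
    """Store the exception text on `info`, then re-raise it unchanged."""
    try:
        yield
    except Exception as exc:
        info.error = str(exc)
        raise


info = InfoSketch(git_url="file:///tmp/repo")
try:
    with record_error(info):
        raise RuntimeError("failed to reproduce 'failed-copy-file'")
except RuntimeError:
    pass

# The error text is now part of the serialized info.
dumped = json.dumps(asdict(info))
```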

@mattseddon can weigh in on this, but I think it would be better to just use a generic "Experiment run failed." error message, and then add a new field where we can pass the log file path to the vscode extension. And then on the vscode side they could potentially display the entire log to the user.

Actually, passing the log location seems useful in every exp case, not just for failures. This is something that the vscode extension could also get through dvc queue logs but I think it might be easier for them to just get the log file path and read it themselves when needed

Yes, the log location would be helpful.

The current approach for any error is to display the provided error as a tooltip in the experiments table.

It is unlikely that we would want to put the entire log into a tooltip so we would have to change the approach there. It would be good to get as much information about the error as possible without having to parse the logs.

pmrowla

Other than the error log handling this LGTM

mattseddon · 2022-09-20T06:06:47Z

for failed exp

"data": {
    "timestamp": ANY,
    "params": {"params.yaml": {"data": {"foo": 2}}},
    "deps": {"copy.py": {"hash": None, "size": None, "nfiles": None}},
    "outs": {},
    "status": "Failed",
    "executor": None,
    "error": {
        "msg": "ERROR: failed to reproduce 'failed-copy-file': "
        "failed to run: python -c 'import sys; sys.exit(1)', "
        "exited with 1",
        "type": "",
    },
}

for other

"data": {
    "deps": {
        "copy.py": {
            "hash": ANY,
            "size": ANY,
            "nfiles": None,
        }
    },
    "metrics": {},
    "outs": {},
    "params": {"params.yaml": {"data": {"foo": 1}}},
    "status": "Success",
    "executor": None,
    "timestamp": timestamp,
    "name": "master",
    "error": {}, 
}

It would be helpful if you could leave the error key out of the dict unless there has been an error (i.e. include it only when the error dict is non-empty).
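
One way to honor this request is sketched below. `drop_empty_error` is a hypothetical helper, not part of DVC, and it assumes the "error" value is a dict that is empty when nothing went wrong, as in the examples quoted above.

```python
def drop_empty_error(data: dict) -> dict:
    """Return `data` without the "error" key when the error dict is empty."""
    if data.get("error"):
        return data
    return {key: value for key, value in data.items() if key != "error"}


success = drop_empty_error({"status": "Success", "error": {}})
failure = drop_empty_error(
    {"status": "Failed", "error": {"msg": "Experiment run failed.", "type": ""}}
)
```

Consumers can then test for the presence of the key instead of checking whether the dict is empty.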

karajan1001 · 2022-09-30T10:49:14Z

I now use a more general message in place of the error message. BTW, I found that tasks can fail for reasons such as:

Failed to acquire lock.
Too many open files.
...
We should gather these, since they are helpful messages for users. But that is a separate feature request from this PR.

dberenbaum · 2022-09-30T15:43:13Z

Thanks @karajan1001! Could you rebase?

Do you need either @pmrowla or @mattseddon to review?

karajan1001 · 2022-10-03T08:55:37Z

Thanks @karajan1001! Could you rebase?

Do you need either @pmrowla or @mattseddon to review?

I think both the engineering and the API sides require a check.

fix: iterative#7986
1. Add two new flags `--hide-queued` and `--hide-failed` to `exp show`
2. Allow `exp show` to show failed experiments.
3. Add unit test for the failed experiments shown.
4. Add name support for failed exp
5. Add error msg to the `exp show` output

karajan1001 added the feature label Sep 19, 2022

karajan1001 requested a review from pmrowla September 19, 2022 03:39

karajan1001 self-assigned this Sep 19, 2022

pmrowla reviewed Sep 19, 2022

karajan1001 force-pushed the fix7986 branch from 5a12c6b to 2d9964f Compare September 20, 2022 08:54

karajan1001 mentioned this pull request Sep 27, 2022

Use celery status as the exp show status #8369

Merged

2 tasks

karajan1001 force-pushed the fix7986 branch 2 times, most recently from a6de22a to caf7bb2 Compare September 30, 2022 10:19

mattseddon mentioned this pull request Oct 4, 2022

Bump min DVC version to 2.30.0 (Use status from exp show) iterative/vscode-dvc#2521

Merged

karajan1001 requested a review from pmrowla October 6, 2022 01:40

pmrowla approved these changes Oct 6, 2022

mattseddon mentioned this pull request Oct 6, 2022

Display failed experiments iterative/vscode-dvc#2535

Merged

karajan1001 force-pushed the fix7986 branch from 85826c3 to ebfd7d0 Compare October 6, 2022 04:36

karajan1001 merged commit 884182f into iterative:main Oct 6, 2022

karajan1001 deleted the fix7986 branch October 6, 2022 09:20

aschuh-hf mentioned this pull request Nov 24, 2022

exp: Standalone experiment running in --temp folder should be included in dvc exp show table #8613

Closed

skshetry mentioned this pull request Jul 7, 2025

exp: add --hide-workspace flag to exp show #10798

Merged
