Conversation

@mkoeppe (Contributor) commented Nov 8, 2023

This makes the CI conda pass by marking affected files as "known baseline failures", thus reducing the noise that this workflow makes on PRs (including #36666 (comment))

Less intrusive version of #36372.

In addition to the failure marked there, here we collect a number of further failures observed over the course of a week in https://github.com/sagemath/sage/actions/workflows/ci-conda.yml?query=is%3Acompleted
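
For illustration, a minimal sketch of how such a baseline list could be consumed, assuming a hypothetical JSON file `known-test-failures.json` that maps file names to known-failure entries (the names and format here are illustrative only, not the exact mechanism of the Sage doctest framework):

```python
# Hypothetical sketch: check doctest results against a baseline of known
# failures, so that known-bad files are reported but do not fail the run.
import json

def load_baseline(path="known-test-failures.json"):
    # Example content: {"sage/interfaces/gap.py": {"failed": true}}
    with open(path) as f:
        return json.load(f)

def evaluate(results, baseline):
    """Return an exit status, treating baseline failures as non-fatal.

    ``results`` maps file names to True (passed) or False (failed).
    """
    fatal = 0
    for filename, passed in results.items():
        if passed:
            continue
        if baseline.get(filename, {}).get("failed"):
            # Known baseline failure: visible in the log, green checkmark.
            print(f"{filename}: [failed in baseline]")
        else:
            print(f"{filename}: FAILED")
            fatal += 1
    return 1 if fatal else 0
```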

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

@tobiasdiez (Contributor) left a comment

There are only a couple of test failures in these files. These ignore patterns are way too coarse.

(removing the blocker label for that reason, so as not to hide possible issues, and because it's not working anyway in this context)

@mkoeppe (Contributor, Author) commented Nov 9, 2023

> because it's not working anyway in this context

It's working as designed: green checkmarks, with failures marked as "failed in baseline":
https://github.com/sagemath/sage/actions/runs/6804955526/job/18503541137?pr=36678#step:11:10205

@tobiasdiez (Contributor) commented

So if another new test in the same file fails, the conda ci is red?

@mkoeppe (Contributor, Author) commented Nov 9, 2023

Perhaps you should clarify what you think the purpose of the ci-conda workflows is.
To my understanding, they are being run so that the major breakage that we have seen in the past (from unmonitored package upgrades in conda-forge) does not go unnoticed.
I don't think it's really the job of this workflow to duplicate the test coverage done by build.yml.

Marking these files as baseline failures is done so that on unrelated PRs, the red checkmarks on the conda workflows do not annoy developers.

@mkoeppe (Contributor, Author) commented Nov 9, 2023

And obviously work is needed to actually diagnose, report, and fix these failures. The approach of #36372 ("skip_conda") hides these failures too well.

@tobiasdiez (Contributor) commented

> Perhaps you should clarify what you think the purpose of the ci-conda workflows is. To my understanding, they are being run so that the major breakage that we have seen in the past (from unmonitored package upgrades in conda-forge) does not go unnoticed. I don't think it's really the job of this workflow to duplicate the test coverage done by build.yml.

They have the same purpose as the Build & Test workflow for people using the conda environment. So yes, they should duplicate the test coverage.

(In the long term, once the conda runs are a bit more stable and there are no more conda-specific issues, we can remove the Build & Test workflow).

@kwankyu (Collaborator) commented Nov 10, 2023

> They have the same purpose as the Build & Test workflow for people using the conda environment. So yes, they should duplicate the test coverage.

B&T is mainly for checking sage library changes, to help developers and reviewers. It is unlikely that a code change in the sage library works on one platform and fails on another. That is why B&T runs only on one specific, decent platform, and tests on all other platforms are done only for the weekly release.

I do not use conda (nor Linux). I regard conda as just another platform. So it looks strange to me that currently 6 B&T workflows for conda platforms run for each PR. I think there is already too much duplication.

Perhaps conda is special, and a separate B&T for conda is necessary for PRs (perhaps for upgrading packages). Then why not just two B&T workflows for conda (one for Ubuntu and one for macOS) for PRs, and more tests for the weekly release?

About #36372: I object to introducing conda-specific doctest tags into the sage library. That is another cluttering of the sage library, in addition to all the doctest tags for modularization. We decided to live with modularization. But no platform-specific doctest tags, please.

@mkoeppe (Contributor, Author) commented Nov 10, 2023

> it looks strange to me that currently 6 B&T workflows for conda platforms run for each PR. I think there is already too much duplication.

FWIW, I approved the change to run these jobs on all PRs in #36373. Tobias has already implemented a mitigation there at my request (see #36373 (comment)) by making it "fail-fast" and setting max-parallel to 2.

But I think at least the macOS jobs need reconsideration / additional adjustment. Given that we only have 5 macOS runners, launching 3 jobs (each taking 1 hour) for every PR is clearly too much.

For example, at the time that 10.2.rc1 was released (over an hour ago), at least 7 "ci-conda" workflows were already in the queue (https://github.com/sagemath/sage/actions/workflows/ci-conda.yml?query=is%3Aqueued), with at least 14 macOS jobs waiting for a runner. As a result, the CI-macos run of 10.2.rc1 (https://github.com/sagemath/sage/actions/runs/6827163283) as well as the cibuildwheel job for macOS (https://github.com/sagemath/sage/actions/runs/6827163257/job/18568938031) have not been able to start yet.

@mkoeppe (Contributor, Author) commented Nov 11, 2023

> I think at least the macOS jobs need reconsideration / additional adjustment. Given that we only have 5 macOS runners, launching 3 jobs (each taking 1 hour) for every PR is clearly too much.

Implemented in #36694

@vbraun force-pushed the develop branch 2 times, most recently from 883e05f to e349b00 on November 12, 2023 at 16:25
@mkoeppe (Contributor, Author) commented Nov 14, 2023

> Then why not just two B&T workflows for conda (one for Ubuntu and one for macOS) for PRs, and more tests for the weekly release?

I agree and have updated #36694 to do this.

Documentation preview for this PR (built with commit 3c2edf3; changes) is ready! 🎉

@kwankyu (Collaborator) commented Nov 23, 2023

Currently the conda CI is the main cause of broken and delayed checks.

Let's get this in.

@tobiasdiez (Contributor) commented Nov 25, 2023

>>> I would. We should. "B&T for conda" is specifically for conda. If there is a failure with it, I would exclude the file from being tested for other PRs until the failure is fixed.
>>
>> Really? You would exclude the whole file and not just mark the single failing test with "# random"?
>
> We are talking about some doctest failing only in the conda CI, with no problem with the doctest on the rest of the platforms. Why tag it with # random?

It seems to be the established practice: #36647 (among others). I agree with you that # random is not a good solution for this, but since there are currently no means to tag a doctest "this breaks on system xyz, but not others", it's the next best thing. To have a more conceptual solution, I introduced the conda-specific exception.

Anyway, I'm okay with ignoring the failing tests using an external file. But this mechanism should only exclude the tests that are really failing, in order to a) prevent additional errors from being introduced and b) make it easier to work on these issues (it's essential to know which line actually breaks randomly in a source file of 1000+ lines).

@kwankyu (Collaborator) commented Nov 25, 2023

>>>> I would. We should. "B&T for conda" is specifically for conda. If there is a failure with it, I would exclude the file from being tested for other PRs until the failure is fixed.
>>>
>>> Really? You would exclude the whole file and not just mark the single failing test with "# random"?
>>
>> We are talking about some doctest failing only in the conda CI, with no problem with the doctest on the rest of the platforms. Why tag it with # random?
>
> It seems to be the established practice: #36647 (among others). I agree with you that # random is not a good solution for this, but since there are currently no means to tag a doctest "this breaks on system xyz, but not others", it's the next best thing. To have a more conceptual solution, I introduced the conda-specific exception.
>
> Anyway, I'm okay with ignoring the failing tests using an external file. But this mechanism should only exclude the tests that are really failing, in order to a) prevent additional errors from being introduced and b) make it easier to work on these issues (it's essential to know which line actually breaks randomly in a source file of 1000+ lines).

Just focus on the failing test that got the file excluded. We may add a comment naming the failing test (or tests) when we add the file to the list of ignored files. Trying to exclude only the failing tests would make things complicated. Why make things complicated just for rare cases (which look almost imaginary to me)?

@mkoeppe (Contributor, Author) commented Nov 25, 2023

> b) to make it easier to work on these issues (it's essential to know which line actually breaks randomly in a source file of 1000+ lines).

One sees this in the logs -- because with the approach here, the failures are not hidden but only marked as "[failed in baseline]".

@tobiasdiez (Contributor) commented Nov 25, 2023

>> b) to make it easier to work on these issues (it's essential to know which line actually breaks randomly in a source file of 1000+ lines).
>
> One sees this in the logs -- because with the approach here, the failures are not hidden but only marked as "[failed in baseline]".

That clearly only works for issues that occur often or are reproducible. I was talking about the random failures that happen once every 100 runs or so (which is the order of magnitude of the remaining failures we are currently seeing from time to time).

@kwankyu (Collaborator) commented Nov 25, 2023

The random failures you are talking about seem unrelated to PRs. Why do you want to observe them in PR checks? To investigate such a random failure, I would just devise massive repetitive test runs with just one version of Sage. No?

@mkoeppe (Contributor, Author) commented Nov 25, 2023

I also don't understand the comment in #36678 (comment)

@tobiasdiez (Contributor) commented

> Why do you want to observe them in PR checks?

I don't want to observe them at all. It's just reality that these failing tests occur very rarely, which makes them hard to reproduce locally. Heck, even now there are random test failures of the Build & Test workflow (e.g. https://github.com/sagemath/sage/actions/runs/6970122859/job/18967437726). Do you want to exclude the whole file from the tests because of this?

> To investigate such a random failure, I would just devise massive repetitive test runs with just one version of Sage. No?

I actually did this, but it only revealed a couple of errors. If you have spare computation time, I would appreciate it very much if you could run the conda tests in a loop and let me know about the failures.

> I also don't understand the comment in #36678 (comment)

Take a failure that occurs once every 100 runs. If you only know that the failure happens in a given file, you need to run all the doctests in that file 100 times (I know, I know, statistics...), which could easily take a few hours depending on the file, just to reproduce it locally. And that assumes the error actually happens on the given system (some of the errors occur only on e.g. macOS). If you know the exact test that fails, you copy it to a test file, put a loop around it, and have it reproduced in a couple of minutes.
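
A minimal sketch of such a reproduction loop, assuming the failing doctest has been copied into a hypothetical file `flaky_test.py` (`sage -t` is Sage's doctest runner):

```python
# Rerun a single extracted doctest file until the rare failure reproduces.
# 'flaky_test.py' is a hypothetical file containing only the one doctest
# that was observed to fail in the conda CI.
import subprocess

for run in range(1, 101):
    result = subprocess.run(["sage", "-t", "flaky_test.py"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        print(f"Reproduced the failure on run {run}:")
        print(result.stdout)
        break
else:
    print("No failure in 100 runs")
```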

@mkoeppe (Contributor, Author) commented Nov 26, 2023

> [...] If you know the exact test that fails [...]

... then you open an Issue to report it.

There is no need for adding a record of the exact failing test to the source code.

@mkoeppe (Contributor, Author) commented Nov 26, 2023

> even now there are random test failures of the Build & Test workflow [...]. Do you want to exclude the whole file from the tests because of this?

Asked and answered already.

@tobiasdiez (Contributor) commented

>> [...] If you know the exact test that fails [...]
>
> ... then you open an Issue to report it.
>
> There is no need for adding a record of the exact failing test to the source code.

That example was for the situation where I have 15 minutes of extra time and want to use them to fix a conda bug. With your solution here, I may already need 2 hours just to find out which test is actually failing.

Another example: in #36372 (comment), it was very easy to recognize that another PR fixed a certain failure on conda, because the info was right there in the code. With the coarse ignore rules proposed in this PR, that would not have been as easy. (That also brings up the question of who maintains the exclusion list, and how.)

@mkoeppe (Contributor, Author) commented Dec 3, 2023

I don't think this adds anything to the discussion here. I'll explain again:

  1. Test failure is noticed
  2. You record it in a GitHub Issue. There you include the necessary information so that it does not take 2 hours to reconstruct it.
  3. One adds stuff to the exclusion list to eliminate noise in the tests. It's not for humans to read, and definitely not for humans to use 2 hours of guessing time to reconstruct the failure.
  4. The exclusion list is maintained like everything else in Sage -- by PRs, not by specific people in charge.

@kwankyu (Collaborator) commented Dec 3, 2023

> 1. Test failure is noticed
> 2. You record it in a GitHub Issue. There you include the necessary information so that it does not take 2 hours to reconstruct it.
> 3. One adds stuff to the exclusion list to eliminate noise in the tests.

and items in the exclusion list are commented with the GitHub Issue numbers...
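
For example, an annotated entry could look like this (a sketch only; the format, file names, and issue numbers are placeholders, not the actual list from this PR):

```python
# Hypothetical exclusion list: each ignored file carries a comment pointing
# to the GitHub Issue that tracks the underlying failure.
KNOWN_BASELINE_FAILURES = {
    # https://github.com/sagemath/sage/issues/NNNNN (placeholder issue number)
    "src/sage/example_a.py": "failed",
    # https://github.com/sagemath/sage/issues/MMMMM (placeholder issue number)
    "src/sage/example_b.py": "failed",
}
```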

@kwankyu (Collaborator) left a comment

LGTM.

@tobiasdiez (Contributor) commented Dec 8, 2023

Matthias, can you please just stop adding "positive review" labels on your PRs. Thanks!

Labeling it as "r: invalid" and "s: needs review" as a motion to close this PR. There are clear shortcomings with the approach of this PR, as described above.

@mkoeppe (Contributor, Author) commented Dec 8, 2023

> There are clear shortcomings with the approach of this PR, as described above.

Nope, all your claims have been refuted.

@vbraun vbraun merged commit 0a69c2a into sagemath:develop Dec 19, 2023
tobiasdiez added a commit to tobiasdiez/sage that referenced this pull request Dec 19, 2023
vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 19, 2024
sagemathgh-36694: CI conda: On `pull_request`, only run 1 macOS job and 1 Linux job

As discussed in sagemath#36678 (comment) and sagemath#36616 (comment).

On pushes to a tag, all 3 Linux and 3 macOS jobs are still run, as
demonstrated in https://github.com/mkoeppe/sage/actions/runs/6829619271

After merging, the branch protection rule
https://github.com/sagemath/sage/settings/branch_protection_rules/33965567
will need to be updated. It controls what workflows show as "Required" in
the PR Checks.

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

URL: sagemath#36694
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Tobias Diez
vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 22, 2024
sagemathgh-36694: CI conda: On `pull_request`, only run 1 macOS job and 1 Linux job
Labels: c: scripts, disputed (PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ), p: critical / 2