Skip to content

Conversation

mkoeppe
Copy link
Contributor

@mkoeppe mkoeppe commented Sep 25, 2023

The new script .ci/merge-fixes.sh uses the GitHub CLI to retrieve open PRs with the "blocker" label from sagemath/sage and tries to merge them into the tested branch, ignoring any PRs that give merge failures.

We call this script in all CI workflows. Thus,

will take effect in the CI workflows of all PRs.

This can be seen in the runs of the workflows of the present PR (section "Merge CI fixes from sagemath/sage").

Additionally,

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

# Merge open PRs from sagemath/sage labeled "blocker".
REPO="sagemath/sage"
GH="gh -R $REPO"
PRs="$($GH pr list --label "p: blocker / 1" --json number --jq '.[].number')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach. There is a couple of thoughts I have about this:

  • We would merge any blocker PR. Is there any quality control, e.g. pipeline passed etc. or positively reviewed.
  • This would self-reference blockers.
  • Unless we merge in all blockers first, we might merge in PRs that depend on blockers, but do not declare that dependency correctly.

Would there still be a sanity check that just tests this PR without merging in other things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We would merge any blocker PR. Is there any quality control, e.g. pipeline passed etc. or positively reviewed.

There isn't, and I don't think it's needed. Since the transition to GitHub, using priority labels has become pretty rare, and no one has used the blocker tag without a good reason.

Copy link
Contributor Author

@mkoeppe mkoeppe Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This would self-reference blockers.

Yes, but for these the merge would be a no-op, and I think I'm handling this correctly in the scripts.

EDIT: See the run for the current workflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Unless we merge in all blockers first, we might merge in PRs that depend on blockers, but do not declare that dependency correctly.

Currently, the release manager scripts ignore all dependencies and all priorities anyway...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds plausible.

@tobiasdiez
Copy link
Contributor

tobiasdiez commented Sep 26, 2023

I am not the biggest fan of this auto-apply of blocker PRs in all work flows. I think it would be better to attack the issue at the root and make it impossible to merge PRs with failing workflows. Everything else feels like a weird hack, especially since one still has to create these hotfix PRs in the first place.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Sep 26, 2023

Yes, this PR is a practical solution. It is designed to avoid two approaches, each of which would be more elegant:

  • It does not create hotfix branches explicitly. Hence, there is no hotfix branch to maintain; hence there is not need for a maintainer of the hotfix branch.
  • It does not require communication with the release manager.

This careful design makes this PR an effective solution.

@kliem
Copy link
Contributor

kliem commented Sep 26, 2023

The CI is rather fragile depending on a bunch of system packages (and so is the entire installation process). Package updates will break building on certain environments. So I think this is a reasonable solution. Hotfixes are applied on the CI immediately and eliminate noise for which those authors are not responsible.

One should agree thought that the blocker label is only applied once the author confident that this is a solution.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Sep 26, 2023

Thanks @kliem

One should agree though that the blocker label is only applied once the author confident that this is a solution.

Setting the priority label is limited to the Triage role; I think we can trust that developers who have been elevated to this role will not abuse it.

@kliem
Copy link
Contributor

kliem commented Sep 26, 2023

Thanks @kliem

One should agree though that the blocker label is only applied once the author confident that this is a solution.

Setting the priority label is limited to the Triage role; I think we can trust that developers who have been elevated to this role will not abuse it.

Cool. As you noticed, I'm not very involved at the moment and I wasn't aware that the move to github enabled more granular control of such things.

@tobiasdiez
Copy link
Contributor

The CI is rather fragile depending on a bunch of system packages (and so is the entire installation process). Package updates will break building on certain environments. So I think this is a reasonable solution.

Thats not the problem. The CI doesn't randomly fails. It fails because PRs with broken CI are merged since the release manager takes only a different set of CI checks into account.

A more elegant and effective solution would be to enable and enforce branch protection rules. But yes, that would require input from @vbraun.

@tobiasdiez tobiasdiez requested a review from vbraun September 26, 2023 22:33
Copy link
Member

@dimpase dimpase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. At worst, we break CI if it doesn't quite work.

@github-actions
Copy link

Documentation preview for this PR (built with commit b14ef26; changes) is ready! 🎉

@kwankyu
Copy link
Collaborator

kwankyu commented Sep 27, 2023

All checks passed! 🎉

Merging CI-fixes immediately to CI-workflows is a great idea. Thanks!

# Merge open PRs from sagemath/sage labeled "blocker".
REPO="sagemath/sage"
GH="gh -R $REPO"
PRs="$($GH pr list --label "p: blocker / 1" --json number --jq '.[].number')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to make sure that they also have the "postive review" label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not. Sometimes it takes a while to find a competent reviewer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #36338 (comment) for @mkoeppe's opinion on this.

Without requiring "positive review" label, we can check the PR before we set it positive review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will influence every other PR, so I think there should be some safety measure.

Without requiring "positive review" label, we can check the PR before we set it positive review.

What do you mean? Doesn't the CI of that PR already checks the PR? Why do you need other PRs to check it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I think there should be some safety measure.

We already have that, namely the Triage group + common sense.

Copy link
Contributor Author

@mkoeppe mkoeppe Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not putting off fire. There's no need to act immediately as soon as someone pushes a CI-fix PR and adds blocker label.

Well, it's not fire, but it's a matter of "availability of the CI services", which, if poor, leads to developer frustration (@fchapoton @tornaria) and can dramatically reduce development velocity because developers may disengage or wait until the CI is fixed again.

Let's consider the period from the release of a beta version to the release of the next beta version, currently about a week. For, say, 95% of availability of the CI, we want a CI fix to take effect within a few hours of a broken release.

We all want to fix CI failures as soon as possible and it would not take long before another one would lend second eyes to check the PR and give positive review.

This may be true for trivial fixes such as the linter failures, which can be fixed and reviewed by many people.

But for fixes to CI and build system, it hinges on the availability of a very small number of people, and time-to-positive-review can be several days -- which can easily make the availability of the CI services less than 50%.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tobiasdiez wrote:

Sorry, but I don't understand the argument here: because two people don't like the change, we go along with the solution that one person prefers?

I know that @tobiasdiez knows this already, but I'll point out again: Our project does not hold majority votes In PR discussions.

I actually think this PR here is a perfect example: there is absolutely no reason this is tagged as a blocker, it doesn't resolve any issues with falling CIs or something similar,

This is false. This PR does resolve the issue of the failing CIs. Merging this PR is enough to fix all CI problems in the current cycle: it auto-applies the blocker PRs #36327 and #36276.

it is not tested very well (or did any of the reviewers tried to run the new macos workflow and the other changed workflows on the main repo?)

This is false, and moreover an inappropriate attack on author and reviewers.

Copy link
Collaborator

@kwankyu kwankyu Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Here is my suggestion. As we can observe by looking at the labelling war below, there are conflicting opinions on which way this PR should proceed. If the issue is large and of broad interest in our community, then this situation would be resolved by a voting in sage-devel. But the current issue is a small one, and it would be ridiculous and shameful to spend a long time for endless discussions or a large-scale voting. Hence let me suggest a small voting here, as defined below:

The voting is about (a) "only blocker label" vs (b) "blocker label + positive review". The voting occurs only in this thread. The voting starts by the votes of @tobiasdiez and @mkoeppe, by which they are supposed to have vowed on accepting the voting result. The voting ends automatically when there was no more vote for the last 24 hours. When the voting ends, the author of the PR proceeds in the way of winning candidate. Please only vote with an optional simple comment in the rest of this thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on holding majority votes in PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tobiasdiez wrote:

Sorry, but I don't understand the argument here: because two people don't like the change, we go along with the solution that one person prefers?

I know that @tobiasdiez knows this already, but I'll point out again: Our project does not hold majority votes In PR discussions.

I actually think this PR here is a perfect example: there is absolutely no reason this is tagged as a blocker, it doesn't resolve any issues with falling CIs or something similar,

This is false. This PR does resolve the issue of the failing CIs. Merging this PR is enough to fix all CI problems in the current cycle: it auto-applies the blocker PRs #36327 and #36276.

it is not tested very well (or did any of the reviewers tried to run the new macos workflow and the other changed workflows on the main repo?)

This is false, and moreover an inappropriate attack on author and reviewers.

Sadly not false...

@@ -24,9 +24,30 @@ concurrency:
cancel-in-progress: true

jobs:
get_ci_fixes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds now another row in the checks under each PR, which is already a pretty crowded. Can this not be merged directly in the build workflow as you do say with conda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the container does not have a suitable version of the gh script available, and installing it is too annoying.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkoeppe, No response to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need to switch a different view; I responded above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that one's not portable to i686, and overall I don't think this is a worthwhile change

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the duplicate lines too. But that's a trivial issue, and there's no simple fix. Let's ignore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not using any i686 architecture...that's a very potential if and when argument...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not using any i686 architecture...that's a very potential if and when argument...

Yes, we are. The build.yml workflow accepts the container image on workflow_dispatch, and we do test 32-bit platforms.

@dimpase
Copy link
Member

dimpase commented Sep 27, 2023

are we not reinventing the wheel here, by any chance?
https://github.blog/changelog/2023-07-12-pull-request-merge-queue-is-now-generally-available/

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Sep 27, 2023

are we not reinventing the wheel here, by any chance?
https://github.blog/changelog/2023-07-12-pull-request-merge-queue-is-now-generally-available/

Yes, @dimpase, I have considered many options, including Merge Queues. But moving to using Merge Queues is not actionable for the project; the release manager would have to adopt a changed workflow. As I wrote in #36338 (comment):

Yes, this PR is a practical solution. It is designed to avoid two approaches, each of which would be more elegant:

  • It does not create hotfix branches explicitly. Hence, there is no hotfix branch to maintain; hence there is not need for a maintainer of the hotfix branch.
  • It does not require communication with the release manager.

This careful design makes this PR an effective solution.

@dimpase
Copy link
Member

dimpase commented Sep 27, 2023

I don't see how a merge queue with blockers (or positively reviewed blockers) is different in functionality from this PR.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Sep 27, 2023

I don't see how a merge queue with blockers (or positively reviewed blockers) is different in functionality from this PR.

Well, Merge Queues are attached to a branch protection rule for a branch. A PR can only be merged in one branch, namely its target branch. So doing anything like this would interfere with the release manager's work.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Sep 27, 2023

@dimpase You may want to try out merge queues with https://github.com/dimpase/primecountpy/pulls ...

@dimpase
Copy link
Member

dimpase commented Sep 27, 2023

@dimpase You may want to try out merge queues with https://github.com/dimpase/primecountpy/pulls ...

I don't think it's a repo which needs them.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Sep 27, 2023

But it's harmless to play with them there.

@vbraun vbraun merged commit bb04839 into sagemath:develop Sep 27, 2023
@kwankyu
Copy link
Collaborator

kwankyu commented Sep 27, 2023

Thanks for the benevolent action, @vbraun!

vbraun pushed a commit to vbraun/sage that referenced this pull request Oct 1, 2023
sagemathgh-35373: Fix workflow "Build documentation (PDF)"
    
<!-- Please provide a concise, informative and self-explanatory title.
-->
<!-- Don't put issue numbers in the title. Put it in the Description
below. -->
<!-- For example, instead of "Fixes sagemath#12345", use "Add a new method to
multiply two integers" -->

### 📚 Description

Follow-up on sagemath#35169.

The container `sage-docker-fedora-31-maximal-with-targets` used for the
PDF docbuild turned out to be not reliable.
Here we replace it by `ubuntu-focal-standard-with-targets` and install
texlive in it.

We also copy over the incremental build from doc-build.yml and the
method to get CI fixes from blocker tickets from sagemath#36338.

This workflow is currently disabled in sagemath/sage.
Example run:
https://github.com/mkoeppe/sage/actions/runs/6318741659/job/17158468016

<!-- Describe your changes here in detail. -->
<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. It should be `[x]` not `[x
]`. -->

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36338
- Depends on sagemath#36348

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#35373
Reported by: Matthias Köppe
Reviewer(s): Dima Pasechnik
vbraun pushed a commit to vbraun/sage that referenced this pull request Oct 1, 2023
sagemathgh-36358: CI: Fix the multi-stage build in ci-linux.yml, fix conda/centos/archlinux system packages
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->
Fixes breakage from sagemath#36338 seen in https://github.com/sagemath/sage/acti
ons/runs/6332061118/job/17206153694:

The 2nd stage (`...-with-targets`) does not find the container from the
1st stage (`...-with-targets-pre`) because they are keyed to the commit
sha. Fixed here by keying it to the sha before merging the CI fixes.

Also fixing the following failures:
- `conda-forge-minimal` (as seen in https://github.com/sagemath/sage/act
ions/runs/6207962004/job/16858998561); failed because the python3
depcheck packages were missing. Fixed now - check with `tox -e docker-
conda-forge-minimal -- config.status`
- `centos-stream-9-python3.9-minimal` (https://github.com/sagemath/sage/
actions/runs/6207962004/job/16858997494),
`almalinux-9-python3.11-minimal` (https://github.com/sagemath/sage/actio
ns/runs/6207962004/job/16858997663) and `-standard`; fixes sagemath#34786
- `archlinux-standard` - libgiac was missing in at least one run

Also adding a few more uses of GH Action output groups for convenience.

Tests running at https://github.com/mkoeppe/sage/actions/workflows/ci-
linux.yml

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36358
Reported by: Matthias Köppe
Reviewer(s): Dima Pasechnik
vbraun pushed a commit that referenced this pull request Oct 8, 2023
gh-35373: Fix workflow "Build documentation (PDF)"
    
<!-- Please provide a concise, informative and self-explanatory title.
-->
<!-- Don't put issue numbers in the title. Put it in the Description
below. -->
<!-- For example, instead of "Fixes #12345", use "Add a new method to
multiply two integers" -->

### 📚 Description

Follow-up on #35169.

The container `sage-docker-fedora-31-maximal-with-targets` used for the
PDF docbuild turned out to be not reliable.
Here we replace it by `ubuntu-focal-standard-with-targets` and install
texlive in it.

We also copy over the incremental build from doc-build.yml and the
method to get CI fixes from blocker tickets from #36338.

This workflow is currently disabled in sagemath/sage.
Example run:
https://github.com/mkoeppe/sage/actions/runs/6318741659/job/17158468016

<!-- Describe your changes here in detail. -->
<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes #12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. It should be `[x]` not `[x
]`. -->

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- #12345: short description why this is a dependency
- #34567: ...
-->
- Depends on #36338
- Depends on #36348

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: #35373
Reported by: Matthias Köppe
Reviewer(s): Dima Pasechnik
vbraun pushed a commit that referenced this pull request Oct 8, 2023
gh-36358: CI: Fix the multi-stage build in ci-linux.yml, fix conda/centos/archlinux system packages
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes #1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->
Fixes breakage from #36338 seen in https://github.com/sagemath/sage/acti
ons/runs/6332061118/job/17206153694:

The 2nd stage (`...-with-targets`) does not find the container from the
1st stage (`...-with-targets-pre`) because they are keyed to the commit
sha. Fixed here by keying it to the sha before merging the CI fixes.

Also fixing the following failures:
- `conda-forge-minimal` (as seen in https://github.com/sagemath/sage/act
ions/runs/6207962004/job/16858998561); failed because the python3
depcheck packages were missing. Fixed now - check with `tox -e docker-
conda-forge-minimal -- config.status`
- `centos-stream-9-python3.9-minimal` (https://github.com/sagemath/sage/
actions/runs/6207962004/job/16858997494),
`almalinux-9-python3.11-minimal` (https://github.com/sagemath/sage/actio
ns/runs/6207962004/job/16858997663) and `-standard`; fixes #34786
- `archlinux-standard` - libgiac was missing in at least one run

Also adding a few more uses of GH Action output groups for convenience.

Tests running at https://github.com/mkoeppe/sage/actions/workflows/ci-
linux.yml

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes #12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- #12345: short description why this is a dependency
- #34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: #36358
Reported by: Matthias Köppe
Reviewer(s): Dima Pasechnik
vbraun pushed a commit to vbraun/sage that referenced this pull request Apr 18, 2024
sagemathgh-36498: CI build, doc-build: Run containers explicitly, separate jobs for pyright, build, modularized tests, long tests
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
Running the containers explicitly, instead of using the declarative
`container:` feature of GH Actions gives us more control:
- we can create more space on the host if necessary; we just scraped by
an out of space condition in sagemath#36473 /
sagemath#36469 (comment)
- we can run some operations outside of the container but in the same
job; this will make the separate "Get CI fixes" jobs unnecessary,
addressing the cosmetic concerns from
sagemath#36338 (comment),
sagemath#36349
- it enables caching between the various workflows (as first discussed
in sagemath#36446).

We split out static type checking with pyright into its separate
workflow:
- **pyright.yml**: As static checking does not need  a build of the Sage
library, for PRs that do not make any changes to packages, there's
nothing to build, and the workflow gives a fast turnaround just after 10
minutes. For applying the CI fixes from blocker tickets, this workflow
uses the technique favored in sagemath#36349.

The workflow **build.yml** first launches a job:
- **test-new:** It builds incrementally (using a tox-generated
`Dockerfile` and https://github.com/marketplace/actions/build-and-push-
docker-images) and does the quick incremental test. This completes
within 10 to 20 minutes when there's no change.

After this is completed, more jobs are launched:
- **test-mod:** It again builds incrementally and tests a modularized
distribution. Later (with more from sagemath#35095), more jobs will be added to
this matrix job for other distributions.
- **test-long:** It again builds incrementally and runs the long test.

The workflows **doc-build.yml** and **doc-build-pdf.yml** again build
incrementally and then build the documentation. The diffing code for the
HTML documentation now runs in the host, not the container, which
simplifies things. (To show that diffing still works, we include a small
change to the Sage library.)

Splitting the workflow jobs implements @kwankyu's  suggestion from:
- sagemath#35652 (comment)
(Fixes sagemath#35814)

The steps "again builds incrementally" actually use the GH Actions
cache, https://docs.docker.com/build/ci/github-actions/cache/#cache-
backend-api. When nothing is cached and the 3 workflows are launched in
parallel, they may each build the same thing. But when there's
congestion that leads to the workflows to be run serially, the 2nd and
3rd workflow will be able to use the cache from the 1st workflow. This
elasticity may be helpful in reducing congestion.

There is a rather small per-project limit of 10 GB for this cache, so
we'll have to see how effectively caching works in practice. See
https://github.com/sagemath/sage/actions/caches


<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36938 (merged here)
- Depends on sagemath#36951 (merged here)
- Depends on sagemath#37351 (merged here)
- Depends on sagemath#37750 (merged here)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36498
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe
vbraun pushed a commit to vbraun/sage that referenced this pull request Apr 20, 2024
sagemathgh-36498: CI build, doc-build: Run containers explicitly, separate jobs for pyright, build, modularized tests, long tests
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
Running the containers explicitly, instead of using the declarative
`container:` feature of GH Actions gives us more control:
- we can create more space on the host if necessary; we just scraped by
an out of space condition in sagemath#36473 /
sagemath#36469 (comment)
- we can run some operations outside of the container but in the same
job; this will make the separate "Get CI fixes" jobs unnecessary,
addressing the cosmetic concerns from
sagemath#36338 (comment),
sagemath#36349
- it enables caching between the various workflows (as first discussed
in sagemath#36446).

We split out static type checking with pyright into its separate
workflow:
- **pyright.yml**: As static checking does not need  a build of the Sage
library, for PRs that do not make any changes to packages, there's
nothing to build, and the workflow gives a fast turnaround just after 10
minutes. For applying the CI fixes from blocker tickets, this workflow
uses the technique favored in sagemath#36349.

The workflow **build.yml** first launches a job:
- **test-new:** It builds incrementally (using a tox-generated
`Dockerfile` and https://github.com/marketplace/actions/build-and-push-
docker-images) and does the quick incremental test. This completes
within 10 to 20 minutes when there's no change.

After this is completed, more jobs are launched:
- **test-mod:** It again builds incrementally and tests a modularized
distribution. Later (with more from sagemath#35095), more jobs will be added to
this matrix job for other distributions.
- **test-long:** It again builds incrementally and runs the long test.

The workflows **doc-build.yml** and **doc-build-pdf.yml** again build
incrementally and then build the documentation. The diffing code for the
HTML documentation now runs in the host, not the container, which
simplifies things. (To show that diffing still works, we include a small
change to the Sage library.)

Splitting the workflow jobs implements @kwankyu's  suggestion from:
- sagemath#35652 (comment)
(Fixes sagemath#35814)

The steps "again builds incrementally" actually use the GH Actions
cache, https://docs.docker.com/build/ci/github-actions/cache/#cache-
backend-api. When nothing is cached and the 3 workflows are launched in
parallel, they may each build the same thing. But when there's
congestion that leads to the workflows to be run serially, the 2nd and
3rd workflow will be able to use the cache from the 1st workflow. This
elasticity may be helpful in reducing congestion.

There is a rather small per-project limit of 10 GB for this cache, so
we'll have to see how effectively caching works in practice. See
https://github.com/sagemath/sage/actions/caches


<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36938 (merged here)
- Depends on sagemath#36951 (merged here)
- Depends on sagemath#37351 (merged here)
- Depends on sagemath#37750 (merged here)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36498
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe
vbraun pushed a commit to vbraun/sage that referenced this pull request May 2, 2024
sagemathgh-36498: CI build, doc-build: Run containers explicitly, separate jobs for pyright, build, modularized tests, long tests
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
Running the containers explicitly, instead of using the declarative
`container:` feature of GH Actions gives us more control:
- we can create more space on the host if necessary; we just scraped by
an out of space condition in sagemath#36473 /
sagemath#36469 (comment)
- we can run some operations outside of the container but in the same
job; this will make the separate "Get CI fixes" jobs unnecessary,
addressing the cosmetic concerns from
sagemath#36338 (comment),
sagemath#36349
- it enables caching between the various workflows (as first discussed
in sagemath#36446).

We split out static type checking with pyright into its separate
workflow:
- **pyright.yml**: As static checking does not need  a build of the Sage
library, for PRs that do not make any changes to packages, there's
nothing to build, and the workflow gives a fast turnaround just after 10
minutes. For applying the CI fixes from blocker tickets, this workflow
uses the technique favored in sagemath#36349.

The workflow **build.yml** first launches a job:
- **test-new:** It builds incrementally (using a tox-generated
`Dockerfile` and https://github.com/marketplace/actions/build-and-push-
docker-images) and does the quick incremental test. This completes
within 10 to 20 minutes when there's no change.

After this is completed, more jobs are launched:
- **test-mod:** It again builds incrementally and tests a modularized
distribution. Later (with more from sagemath#35095), more jobs will be added to
this matrix job for other distributions.
- **test-long:** It again builds incrementally and runs the long test.

The workflows **doc-build.yml** and **doc-build-pdf.yml** again build
incrementally and then build the documentation. The diffing code for the
HTML documentation now runs in the host, not the container, which
simplifies things. (To show that diffing still works, we include a small
change to the Sage library.)

Splitting the workflow jobs implements @kwankyu's  suggestion from:
- sagemath#35652 (comment)
(Fixes sagemath#35814)

The steps "again builds incrementally" actually use the GH Actions
cache, https://docs.docker.com/build/ci/github-actions/cache/#cache-
backend-api. When nothing is cached and the 3 workflows are launched in
parallel, they may each build the same thing. But when there's
congestion that leads to the workflows to be run serially, the 2nd and
3rd workflow will be able to use the cache from the 1st workflow. This
elasticity may be helpful in reducing congestion.

There is a rather small per-project limit of 10 GB for this cache, so
we'll have to see how effectively caching works in practice. See
https://github.com/sagemath/sage/actions/caches


<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36938 (merged here)
- Depends on sagemath#36951 (merged here)
- Depends on sagemath#37351 (merged here)
- Depends on sagemath#37750 (merged here)
- Depends on sagemath#37277 (merged here)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36498
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: scripts disputed PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ p: blocker / 1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants