Skip to content

Conversation

cevich
Copy link
Member

@cevich cevich commented Jan 31, 2020

Depends on: #5066

Signed-off-by: Chris Evich cevich@redhat.com

@mheon
Copy link
Member

mheon commented Jan 31, 2020

/approve
LGTM but would like someone more familiar with bash to review

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 31, 2020
@cevich cevich changed the title Fix empty $GOBIN & script exit bug WIP: Fix gate job & scripts Jan 31, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 31, 2020
@cevich cevich force-pushed the fix_gobin_exit_bug branch from ceee4f8 to d23da27 Compare January 31, 2020 20:01
@cevich cevich force-pushed the fix_gobin_exit_bug branch 5 times, most recently from d9aa4c8 to b9e4e34 Compare January 31, 2020 20:18
@cevich
Copy link
Member Author

cevich commented Jan 31, 2020

LGTM but would like someone more familiar with bash to review

Yeah, that and I need to improve the commit message and add some comments...

@cevich
Copy link
Member Author

cevich commented Jan 31, 2020

@mheon @baude FYI- I still have some i-dotting and t-crossing to do here. For the moment, it looks like whatever was causing download failures has passed, so I think there is less urgency to this PR. However, I did uncover and fix a bunch of false-positive problems in our current setup.

  • Confirm the success job works with the new gate container
  • Confirm w/n a manual image build & push is needed or if the current master and PR usage of the new image will work as-is.
  • Notify people to rebase their PRs if required once the new image is built and pushed.

@cevich
Copy link
Member Author

cevich commented Feb 4, 2020

Note: There are a few documentation issues I need to clean up and additional local testing needed (see checklist above)

@cevich cevich force-pushed the fix_gobin_exit_bug branch from e56a4c7 to 45a8ad5 Compare February 4, 2020 15:34
@cevich
Copy link
Member Author

cevich commented Feb 4, 2020

@mheon @baude @edsantiago manual testing has shown, we will get gate-jobs failing for a short duration once this PR merges. However, once quay builds the new image, re-running any failed jobs should fix them.

FWIW: I did experiment with trying the new code using the old container, and visa-versa but couldn't find a sane way to keep all use-cases happy.

@cevich cevich changed the title WIP: Fix gate job & scripts Cirrus: Fix gate image & false-positive exits Feb 4, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2020
@cevich cevich requested a review from edsantiago February 4, 2020 15:35
@cevich
Copy link
Member Author

cevich commented Feb 4, 2020

There are too many containers and contexts for me to figure out the right solution for this one; I'll trust that it's easy for you to do so.

This is simply a reflection of what I confirmed earlier. Due to unification of paths used in the gate-container scripts (so they're common for all CI jobs) the "old" (built from master prior to this PR) image cannot work with this PR. @mheon will have to merge this manually.

Once this code is on master, Quay will build a new gate container that will function properly. However, during the interumn time before it's built, this task will simply fail everywhere.

OTOH, I think having commonly used paths everywhere is a worthwhile change to suffer a few failures over.

Make sense?

@edsantiago
Copy link
Member

Make sense?

I guess. I agree with the elimination of conflicting/duplicate/inconsistent $GOSRC (although there's still one unfortunate duplication that I don't know how to get rid of). I just have a hard time understanding all the container setup and wasn't sure if this error was "something got missed" or "it will get fixed when the magic container gets rebuilt". I will trust your judgment that it's the latter. Thanks.

@edsantiago
Copy link
Member

LGTM except for the mkdir optimization. Nice work.

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cevich, edsantiago, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cevich
Copy link
Member Author

cevich commented Feb 5, 2020

@mheon @baude PTAL, this one needs a manual merge. I've manually tested the new gate container as best as I can, but there's no simple way to get it the "old" gate image to pass with this specific PR.

@mheon
Copy link
Member

mheon commented Feb 5, 2020

@cevich Do we have a way to back this out quickly if it goes bad? Want to make sure we aren't going to end up with a red master and no plan to resolve it

@cevich
Copy link
Member Author

cevich commented Feb 6, 2020

@mheon The backout plan is: Revert this commit. Allow quay.io to notice and build the "old" gate container.

Backout plan B: I will make a copy of the current gate container tagged "reverted_5039"

@mheon
Copy link
Member

mheon commented Feb 6, 2020

@cevich Ack. Let's wait until I cut 1.8.0 and then get this in.

@cevich
Copy link
Member Author

cevich commented Feb 6, 2020

yep, np.

Tag created: quay.io/libpod/gate:reverted_5039

@cevich cevich force-pushed the fix_gobin_exit_bug branch 2 times, most recently from b316bdd to 1a98ab6 Compare February 27, 2020 21:33
A number of scripts relating to tooling used and the gate container
image were not exiting upon errors as intended.  Coupled with
external service unavailability (i.e. downloading golangci-lint)
was observed to cause difficult to debug failures.

This change corrects the scripts inside/out of the gate container as
well as fixes many golang related path consistency problems vs other CI
jobs.  After this change, all jobs use consistent path names reducing
the number of special-case overrides needed.

Lastly, I also made a documentation-pass, updating/correcting as needed,
including documenting a likely local validation-failure mode, related to
`$EPOCH_TEST_COMMIT`.  This is dependent on the developers git
environment, so documentation is the only possible "fix".

Signed-off-by: Chris Evich <cevich@redhat.com>
@cevich cevich force-pushed the fix_gobin_exit_bug branch from 1a98ab6 to d0782e7 Compare March 2, 2020 13:51
@cevich
Copy link
Member Author

cevich commented Mar 5, 2020

@mheon we released 1.8 already no?

@mheon
Copy link
Member

mheon commented Mar 5, 2020

1.8.0 is out, 1.8.1 is in RC

@mheon
Copy link
Member

mheon commented Mar 5, 2020

So yes, we should probably go ahead with this

@edsantiago
Copy link
Member

1.8.1 is in RC

Um, how certain are you? RC1 had rootless regressions, and there haven't been any builds since then...

@mheon
Copy link
Member

mheon commented Mar 5, 2020

We cut an RC2 tag, didn't we?

@mheon
Copy link
Member

mheon commented Mar 5, 2020

https://github.com/containers/libpod/releases/tag/v1.8.1-rc2

Must not have been built?

@edsantiago
Copy link
Member

fc31 build failed in koji, I saw no other attempted builds

@cevich
Copy link
Member Author

cevich commented Mar 5, 2020

All well and good, but THIS pr is needed to improve the gate job accuracy and documentation 😄 It's also got some important fixes to avoid running container images produced from other branches.

@edsantiago
Copy link
Member

Way over my head, but LGTM in principle.

@rhatdan
Copy link
Member

rhatdan commented Mar 5, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2020
@openshift-merge-robot openshift-merge-robot merged commit 60e9e7c into containers:master Mar 5, 2020
@cevich cevich deleted the fix_gobin_exit_bug branch June 30, 2021 18:12
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants