Skip to content

Conversation

jedevc
Copy link
Collaborator

@jedevc jedevc commented Feb 10, 2023

A few commits with various improvements for the remote controller code.

  • Don't kill the remote controller immediately after build. We should keep the server alive, since the client can't know exactly what's running on the controller - also we want the behavior for debugging or not to remain relatively similar. At some point in the future, we could introduce logic to the server where it can shut itself down, if no builds are currently being run, however, the server should dictate this, instead of each individual client.
  • Add more informative server exit messages, so that the reasons for exit are always available in a log.
  • Add signal handling for SIGTERM (in addition to the existing SIGINT). SIGKILL can be caused by pkill and similar programs, which is useful for debugging, etc.
  • Switch to using context.Context for timeout handling with GRPC instead of a custom loop - this simplifies the code, makes it more configurable, and gives us better performance.
  • Use unique files for each buildx version. Instead of only having a single controller server, we allow for having multiple, one for each version of buildx, so we should never have any version clashes, meaning user interaction is less necessary. This means we can dramatically change the protocol between buildx versions, without needing to worry about any form of compatibility.
    • We also should keep separate log files for each server - but I'm not quite sure how that should interact with a log_file specified in config.toml. Ideas for this welcome 😄

We don't know if other builds might be running, etc, so we should allow
the server to decide when to exit.

Signed-off-by: Justin Chadwell <me@jedevc.com>
When exiting, we should ideally always print a message, and give details
as to exactly what error we received.

Signed-off-by: Justin Chadwell <me@jedevc.com>
This signal may be sent using an external tool such as pkill, and since
we can handle it neatly, we should.

Signed-off-by: Justin Chadwell <me@jedevc.com>
This patch improves timeout logic for testing and creating a buildx
server. Instead of needing to have a custom loop, we can just use the
primitives provided by go and grpc for easier handling.

Additionally, without the explicit time.Sleeps, we defer to GRPCs
exponential backoff algorithm which should provide substantially better
results.

Signed-off-by: Justin Chadwell <me@jedevc.com>
This ensures that we should never accidentally connect to a server with
a mismatched version, while also allowing us to run multiple buildx
servers at a time.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Copy link
Collaborator

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, overall LGTM

Comment on lines +39 to +41
defaultLogFilename = fmt.Sprintf("buildx.%s.log", version.Revision)
defaultSocketFilename = fmt.Sprintf("buildx.%s.sock", version.Revision)
defaultPIDFilename = fmt.Sprintf("buildx.%s.pid", version.Revision)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should be encoded not to include "/" (e.g. sha256?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these revisions are already sha256-encoded, they're derived from the git revision.

@crazy-max does this make sense to you? Is there a different version identifier that might make more sense here / do we need to add a round of sha256 to remove special characters?

Copy link
Member

@crazy-max crazy-max Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an edge case where version.Revision can be empty if not built from the Dockerfile. I think we could use the same pattern as the kubernetes driver with contextPathHash if that makes sense:

// key string used for kubernetes "sticky" mode
contextPathHash, err := filepath.Abs(in.ContextPath)
if err != nil {
contextPathHash = in.ContextPath
}
. Then sha1 enc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed I got confused as this is for buildx server and not build commands. We could use https://pkg.go.dev/runtime/debug#ReadBuildInfo and read vcs.revision.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a follow-up in #1637.

@jedevc
Copy link
Collaborator Author

jedevc commented Feb 14, 2023

Added an updated commit to ensure that we take the absolute path of the buildx binary, since on exec we change the working directory to /, which is likely different to the working directory (so relative paths will resolve to the wrong location).

Signed-off-by: Justin Chadwell <me@jedevc.com>
Copy link
Collaborator

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. LGTM.

@jedevc jedevc merged commit ed4fd96 into docker:master Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants