Skip to content

Conversation

AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented May 15, 2025

Split from:


e.g.,

https://github.com/moby/example.git?ref=v1.0.0&commit=deadbeef&subdir=/subdir

# alias of ref=refs/tags/v1.0.0
https://github.com/moby/example.git?tag=v1.0.0

# alias of ref=refs/heads/v1.0
https://github.com/moby/example.git?branch=v1.0

Fix #4905, but the syntax differs from the original proposal.

The document will be added to
https://github.com/docker/docs/blob/main/content/manuals/build/concepts/context.md#url-fragments

@tonistiigi
Copy link
Member

@AkihiroSuda Needs rebase after #5975

@AkihiroSuda

This comment was marked as resolved.

@AkihiroSuda AkihiroSuda marked this pull request as ready for review May 20, 2025 10:19
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is weird atm is the URL parsing in LLB (client lib and identifier) vs Dockerfile. The query form should really be only for Dockerfile/dockerui. In LLB the parameters are sent with attribute values instead. For historical reasons, the current LLB is kind of a mix where #ref is still part of the URL from initial implementation, but for example protocol of the URL is always git:// and the real protocol is sent as an attribute.

Currently, it looks like in LLB client we parse a full URL (with querystring etc.) but then we drop everything but Host/Path and send the original URL as attr (that I think will container querystring). Also #ref is not even part of the URL the user passes in the client function, but is glued to the identifier internally.

Then on LLB receiver side, this now handles querystring in this PR in the URL in NewGitIdentifier (although it looks that it would require client side to error for the querystring to be passed at all there), but actually the checksum gets set later from pb.AttrGitChecksum.

In order to fix this, I think we should separate these concepts. Assuming that we do need some level of parser in the LLB side, I think having two separate packages might be too much so maybe just two parser functions where the Dockerfile one would parse the querystring and other Dockerfile specific things and then internally call the more basic parser. Looking at the current return type, there is already some weird things there, eg. IndistinguishableFromLocal seems to be only used in Dockerfile, and UnencryptedTCP doesn't seem to be used anywhere at all.

I'm also ok with changing the signature for llb.Git, eg. adding GitSubdir() (maybe even GitRef()) so it is the same as other git properties instead of requiring microformat and URL parsing as input that gets converted anyway. On the identifier parser side, we do need to keep backwards compatibility though (but could deprecate the old behavior).

repo.Subdir = u.Fragment.Subdir
if u.Opts != nil {
repo.Ref = u.Opts.Ref
repo.Checksum = u.Opts.Checksum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the place where it is getting checksum from querystring in LLB while it really comes from pb.AttrGitChecksum

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is still a wrong type in here after parsing as the checksum should only come from props (I don't think this is some backwards compatibility?).

@tonistiigi
Copy link
Member

@AkihiroSuda Any update?

@AkihiroSuda AkihiroSuda modified the milestones: v0.23.0, v0.24.0 Jun 10, 2025
@cokybit cokybit linked an issue Jun 22, 2025 that may be closed by this pull request
@crazy-max
Copy link
Member

Rebased, but refactoring the parser may take some time. Unlikely in this week.

@AkihiroSuda Do you have time to look at it? We plan to do first v0.24 RC next week.

@AkihiroSuda
Copy link
Member Author

Looking at the current return type, there is already some weird things there, eg. IndistinguishableFromLocal seems to be only used in Dockerfile, and UnencryptedTCP doesn't seem to be used anywhere at all.

Moving these weirdo to frontend/dockerfile/dfgitutil:

@AkihiroSuda
Copy link
Member Author

Added refactoring commits

  • util/gitutil: add ParseURLBasic
  • client/llb: add Git2

@AkihiroSuda

This comment was marked as resolved.

@AkihiroSuda AkihiroSuda marked this pull request as draft August 20, 2025 10:38
Fix issue 4905, but the syntax differs from the original proposal.

The document will be added to
https://github.com/docker/docs/blob/main/content/manuals/build/concepts/context.md#url-fragments

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Prior to this commit, `ADD https://github.com/moby/example.git?invalid=42`
was fetching a single HTML file without causing an error.

Now it returns an error `unexpected query "invalid"`.

Hint: use `git show --ignore-all-space` to review this commit.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
ParseURLBasic is a simplified variant of ParseURL.

Basically, the full ParseURL should be used only in the LLB frontends
such as `dockerfile`.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda AkihiroSuda force-pushed the giturl-query branch 3 times, most recently from f06fdae to 3b48201 Compare August 22, 2025 08:15
Similar to `Git`, but does not parse git url fragments and queries.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda AkihiroSuda marked this pull request as ready for review August 22, 2025 12:33
Comment on lines +237 to +238
// TODO: consider allowing this, when the tag actually exists on the branch
url: "https://github.com/moby/buildkit?tag=v1.0.0&branch=v1.0",
Copy link
Member

@crazy-max crazy-max Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is similar to Checksum but could potentially be a validation of both branch and commit?

Copy link
Member

@crazy-max crazy-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was testing these changes and so far so good (with also vendoring in buildx client docker/buildx#3380) but I found a use case that would be needed related to checksum validation.

This relates to how the Docker GitHub Actions currently handle Git context. Currently we always checkout by commit sha if ref is not a pull request: https://github.com/docker/actions-toolkit/blob/633bcf1936ffff0e05a4bbe2b23c71abf6f4111e/src/context.ts#L45-L60

That can be be problematic because then git tag is not pulled similar to https://github.com/crazy-max/buildkit/actions/runs/16967780276/job/48096258733#step:7:451. (see also docker/actions-toolkit#677)

But with query form we could use ref=refs/tags/v0.23.0&commit=28c90eadc4c12cc78155ad59ca5f486220241d2a so it would fetch the tag and validate commit sha. That's good!

But what should we do for a another ref like default branch? I don't think having ref=refs/heads/master&commit=28c90eadc4c12cc78155ad59ca5f486220241d2a with strict checksum validation makes a lot of sense as there might be race condition when build runs on CI.

WDYT if we have another var to opt-out for checksum validation while keeping ref and commit like:

ref=refs/heads/master&commit=28c90eadc4c12cc78155ad59ca5f486220241d2a&checksum=false

@tonistiigi
Copy link
Member

WDYT if we have another var to opt-out for checksum validation while keeping ref and commit like:

We could have another option like fetch-by-commit that could ignore ref (just set it blindly) if the intention is to support refs that used to be a specific commit. But if commit is set then buildkit should always guarantee that this is the correct commit that was checked out. Could also have something like fetch-by-commit=fast-forward for the heads/master case as master branch is always expected to be either commit or based on the commit. For the PR CI case where PR gets forced pushed, I think we should just fail the build in our actions, as no point in building a ref of the old PR if it has already been forced pushed to something else (new CI would start anyway).

}

// Git2 is similar to Git but takes [gitutil.GitURLBase] as the argument.
func Git2(remote gitutil.GitURLBase, opts ...GitOption) State {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we want to go with numeric functions in here. GitFromRemote ?

})
}

func GitIDSuffix(v string) GitOption {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to make this public if only used internally. Same for prop.

if gu.Remote != "" {
fullURL = gu.Remote
}
gitOptions := []llb.GitOption{llb.WithCustomName(pgName), llb.GitChecksum(gitRef.Checksum),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get why do we need to call ParseGitRef and then still pass a bunch of the options manually. Why can't we convert to a structure that already can be connected to llb. There are so many duplicates in here: user needs to pass parsed GitURLBase structure, but then still needs to pass GitFullURL as well. Why is the commit passed to both GitChecksum and GitIDSuffix. These look like internal behaviors what the user should not know about. If we parse the input URL we should have remote(full), ref, commit, subdir and we should be able to pass them without duplicates.

repo.Subdir = u.Fragment.Subdir
if u.Opts != nil {
repo.Ref = u.Opts.Ref
repo.Checksum = u.Opts.Checksum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is still a wrong type in here after parsing as the checksum should only come from props (I don't think this is some backwards compatibility?).

func parseOpts(fragment string) *GitURLOpts {
if fragment == "" {
return nil
func parseOpts(fragment string, query url.Values) (*GitURLOpts, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querystring is only part of the new dockerfile-style URLs so I don't understand why this parsing is in this pacakge.

@AkihiroSuda
Copy link
Member Author

Carried and merged as:

Thanks @tonistiigi 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.github/dependabot.yml Proposal: csv syntax for git repos
3 participants