Skip to content

Conversation

chooglen
Copy link
Contributor

@chooglen chooglen commented May 5, 2022

Thanks all! This version takes Junio's wording suggestions on the
previous round, renames the config variable and, because of conflicts
with 2.37.1, is newly rebased onto master.

The config variable is now named "safe.bareRepository"; I discarded this
name earlier in this series, but the intent behind the series has come
around to the point where "safe.bareRepository" makes sense again. And
thanks to Dscho's suggestion to explicitly create an option for "no
discovered bare repos" [1], the UI is self-documenting enough that we can
write terser docs.

As mentioned upthread, I would love to see "safe.*" namespace renamed (it's
an adjective, not a Git command/entity), but I'm doubling down on it anyway.
This config option does such a similar thing to "safe.directory" that they
really should be siblings, and I don't foresee a world where
"safe.directory" gets renamed.

I've triple-checked to make sure I've scrubbed the commits of the old name,
but I'd appreciate the extra eyes :)

= Description

There is a known social engineering attack that takes advantage of the fact
that a working tree can include an entire bare repository, including a
config file. A user could run a Git command inside the bare repository
thinking that the config file of the 'outer' repository would be used, but
in reality, the bare repository's config file (which is attacker-controlled)
is used, which may result in arbitrary code execution. See [2] for a fuller
description and deeper discussion.

This series implements a simple way of preventing such attacks: create a
config option, safe.bareRepository, that tells Git whether or not to die
when it finds a bare repository. safe.bareRepository has two values:

  • "all": allow all bare repositories (default), identical to current
    behavior
  • "explicit": only allow bare repositories specified via --git-dir
    or GIT_DIR

and users/system administrators who never expect to work with bare
repositories can secure their environments using "explicit". We
still trust explicit bare repositories because we are confident that the
user is not confused about which repository is being used.

This series does not change the default behavior, but in the long-run, a
"no-embedded" option might be a safe and usable default [3]. "never" is too
restrictive and unlikely to be the default.

For security reasons, safe.bareRepository cannot be read from
repository-level config (because we would end up trusting the embedded bare
repository that we aren't supposed to trust to begin with). Since this would
introduce a 3rd variable that is only read from 'protected/trusted
configuration' (the others are safe.directory and
uploadpack.packObjectsHook) this series also defines and creates a shared
implementation for 'protected configuration'

= Patch organization

  • Patch 1 add a section on configuration scopes to our docs
  • Patches 2-3 define 'protected configuration' and create a shared
    implementation.
  • Patch 4 refactors safe.directory to use protected configuration
  • Patch 5 adds safe.bareRepository

= Series history

Changes in v8:

  • Rename discovery.bare -> safe.bareRepository, change values from
    "always|never" -> "all|explicit"
  • Numerous docs improvements
  • Rebase onto post-2.37.1 master

Changes in v7:

  • Numerous docs improvements and code cleanup.
  • In 3/5's commit message, drop "as fast as possible" and allude to lookup
    functions coming in a later series.
  • Remove a comment in 3/5 about repository.protected_config. That was stale
    since v4, but slipped under the radar until now.
  • Fix some s/protected config/protected configuration (leftover from v5).

Changes in v6:

  • Add TEST_PASSES_SANITIZE_LEAK=true
  • Replace all sub-shells with -C and use test_config_global
  • Change the expect_rejected helper to use "grep -F" with a more specific
    message.
    • This reveals that the "-c discovery.bare=" assertion in the last test
      was passing for the wrong reason (because '' is an invalid value for
      "discovery.bare"). I removed it because it wasn't doing anything useful
      anyway - I was trying to make discovery.bare unset in the command line,
      but the whole point of that test is to assert that we respect the CLI
      arg.

Changes in v5:

  • Standardize the usage of "protected configuration" instead of mixing
    "config" and "configuration". This required some unfortunate rewrapping.
  • Remove mentions of "trustworthiness" when discussing protected
    configuration and focus on what Git does instead.
    • The rationale of protected vs non-protected is still kept.
  • Fix the stale documentation entry for discovery.bare.
  • Include a fuller description of how discovery.bare and "--git-dir"
    interact instead of saying "has no effect".

Changes in v4:

  • 2/5's commit message now justifies what scopes are included in protected
    config
  • The global configset is now a file-scope static inside config.c
    (previously it was a member of the_repository).
  • Rename discovery_bare_config to discovery_bare_allowed
  • Make discovery_bare_allowed function-scoped (instead of global).
  • Add an expect_accepted helper to the discovery.bare tests.
  • Add a helper to "upload-pack" that reads the protected and non-protected
    config

Changes in v3:

  • Rebase onto a more recent 'master'
  • Reframe this feature in only in terms of the 'embedded bare repo' attack.
  • Other docs improvements (thanks Stolee in particular!)
  • Protected config no longer uses read_very_early_config() and is only read
    once
  • Protected config now includes "-c"
  • uploadpack.packObjectsHook now uses protected config instead of ignoring
    repo config using config scopes

Changes in v2:

  • Rename safe.barerepository to discovery.bare and make it die()
  • Move tests into t/t0034-discovery-bare.sh
  • Avoid unnecessary config reading by using a static variable
  • Add discovery.bare=cwd
  • Fix typos

= Future work

  • This series doesn't implement config lookup functions for protected
    config. This will be done in a follow up series.
  • This series does not implement the "no-embedded" option [3] and I won't
    work on it any time soon, but I'd be more than happy to review if someone
    sends patches.
  • With discovery.bare, if a builtin is marked RUN_SETUP_GENTLY, setup.c
    doesn't die() and we don't tell users why their repository was rejected,
    e.g. "git config" gives an opaque "fatal: not in a git directory". This
    isn't a new problem though, since safe.directory has the same issue.

[1] https://lore.kernel.org/git/5ps2q552-1rr3-7161-4181-31556pp2ns12@tzk.qr

[2] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com

[3] This was first suggested in
https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com

Cc: Taylor Blau me@ttaylorr.com, Derrick Stolee derrickstolee@github.com, Junio C Hamano gitster@pobox.com, Emily Shaffer emilyshaffer@google.com, Jonathan Tan jonathantanmy@google.com, Ævar Arnfjörð Bjarmason avarab@gmail.com, Johannes Schindelin Johannes.Schindelin@gmx.de

@chooglen chooglen force-pushed the setup/disable-bare-repo-config branch 2 times, most recently from f00e91b to 96038cd Compare May 6, 2022 16:56
@chooglen chooglen changed the title setup.c: make bare repo discovery optional [RFC] setup.c: make bare repo discovery optional May 6, 2022
@chooglen
Copy link
Contributor Author

chooglen commented May 6, 2022

/preview

@gitgitgadget-git
Copy link

Preview email sent as pull.1261.git.git.1651858667271.gitgitgadget@gmail.com

@chooglen chooglen changed the title [RFC] setup.c: make bare repo discovery optional RFC setup.c: make bare repo discovery optional May 6, 2022
@chooglen chooglen force-pushed the setup/disable-bare-repo-config branch from 96038cd to 3370258 Compare May 6, 2022 18:23
@chooglen
Copy link
Contributor Author

chooglen commented May 6, 2022

/submit

@gitgitgadget-git
Copy link

Submitted as pull.1261.git.git.1651861810633.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1261/chooglen/setup/disable-bare-repo-config-v1

To fetch this version to local tag pr-git-1261/chooglen/setup/disable-bare-repo-config-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1261/chooglen/setup/disable-bare-repo-config-v1

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `safe.barerepository`, that tells Git whether or
> not to recognize bare repositories when it is trying to discover the
> repository. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

> +safe.barerepository::
> +	This config entry specifies directories that Git can recognize as
> +	a bare repository when looking for the repository (aka repository
> +	discovery). This has no effect if repository discovery is not
> +	performed e.g. the path to the repository is set via `--git-dir`
> +	(see linkgit:git[1]).
> ++
> +It is recommended that you set this value so that Git will only use the bare
> +repositories you intend it to. This prevents certain types of security and
> +non-security problems, such as:
> +
> +* `git clone`-ing a repository containing a maliciously bare repository
> +  inside it.

"maliciously bare"? "malicious bare" probably.

> +* Git recognizing a directory that isn't mean to be a bare repository,

"mean to be" -> "meant to be".

> +  but happens to look like one.

> diff --git a/setup.c b/setup.c
> index a7b36f3ffbf..9b5dd877273 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1133,6 +1133,40 @@ static int ensure_valid_ownership(const char *path)
>  	return data.is_safe;
>  }
>  
> +/*
> + * This is similar to safe_directory_data, but only supports true/false.
> + */
> +struct safe_bare_repository_data {
> +	int is_safe;
> +};
> +
> +static int safe_bare_repository_cb(const char *key, const char *value, void *d)
> +{
> +	struct safe_bare_repository_data *data = d;
> +
> +	if (strcmp(key, "safe.barerepository"))
> +		return 0;
> +
> +	if (!value || !strcmp(value, "*")) {
> +		data->is_safe = 1;
> +		return 0;
> +	}
> +	if (!*value) {
> +		data->is_safe = 0;
> +		return 0;
> +	}
> +	return -1;
> +}
> +
> +static int should_detect_bare(void)
> +{
> +	struct safe_bare_repository_data data;
> +
> +	read_very_early_config(safe_bare_repository_cb, &data);
> +
> +	return data.is_safe;
> +}
> +
>  enum discovery_result {
>  	GIT_DIR_NONE = 0,
>  	GIT_DIR_EXPLICIT,
> @@ -1238,7 +1272,7 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  			return GIT_DIR_DISCOVERED;
>  		}
>  
> -		if (is_git_directory(dir->buf)) {
> +		if (should_detect_bare() && is_git_directory(dir->buf)) {
>  			if (!ensure_valid_ownership(dir->buf))
>  				return GIT_DIR_INVALID_OWNERSHIP;
>  			strbuf_addstr(gitdir, ".");

This is in a loop, which will go up and try the parent directory if
the body of this block is not entered, so it is calling the new
should_detect_bare() helper over and over if it returns false.

Not a very good idea.

Perhaps this would help?  I dunno.

static int should_detect_bare(void)
{
	static int should = -1; /* unknown yet */

	if (should < 0) {
		struct safe_bare_repository_data data = { 0 };
		read_very_early_config(safe_bare_repository_cb, &data);
		should = data.is_safe;
	}
	return should;
}

In any case, I very much appreciate the fact that this touches the
setup_git_directory_gently_1() codepath only minimally, as we have
other plans to update the code further soonish.

Thanks.

@gitgitgadget-git
Copy link

On the Git mailing list, Taylor Blau wrote (reply to this):

Hi Glen,

On Fri, May 06, 2022 at 06:30:10PM +0000, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `safe.barerepository`, that tells Git whether or
> not to recognize bare repositories when it is trying to discover the
> repository. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

Thanks for working on this! I'm excited to see some patches here, though
I'm not totally convinced of this direction. More below.

To summarize, this proposal attempts to work around the problem of
embedding bare repositories in non-bare checkouts by providing a way to
opt-out of bare repository discovery (which is to only discover things
that are listed in the safe.bareRepository configuration).

I agree that this would prevent the problem you're trying to solve, but
I have significant concerns that this patch is going too far (at the
risk of future damage to unrelated workflows) in order to accomplish
that goal.

My concern is that if we ever flipped the default (i.e. that
"safe.bareRepository" might someday be ""), that many legitimate cases
of using bare repositories would be broken. I think there are many such
legitimate use cases that _do_ rely on discovering bare repositories
(i.e., Git invocations that do not have a `--git-dir` in their
command-line). One such example would be forges, but I imagine that
there are many other uses we don't even know about, and I would like to
avoid breaking those if we ended up changing the default.

If it's possible to pursue a more targeted fix that leaves non-embedded
bare repositories alone, I'd like to try and focus these efforts on a
more narrow fix that would address just the case of embedded bare
repositories. I think that the direction I outlined in:

    https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/

could be a good place to start (see the paragraph beginning with "Here's
an alternative approach" and below for the details).

One potential problem with that approach (that this patch doesn't suffer
from) is that any discovery which finds a bare repository would have to
continue up to the root of the volume in order to figure out whether or
not that bare repository is embedded in another non-bare one. That is
probably a non-starter due to performance, but I think you could easily
work around with a top-level setting that controls whether or not you
even _care_ about embedded bare repositories.

For example, if I set safe.bareRepository='*' in my top-level
/etc/gitconfig, then we can avoid having to continue discovery for bare
repositories altogether because we know we'll allow it anyway.

To pursue a change that targets just embedded bare repositories, I think
you fundamentally have to do an exhaustive repository discovery in order
to figure out whether the (bare) repository you're dealing with is
embedded or not. So having an opt-out for users that either (a) don't
care or (b) can't accept the performance degradation that Emily
mentioned as a result of doing unbounded filesystem traversal would be
sensible.

Playing devil's advocate for a moment, though, even if we had something
like the proposal I outlined, flipping the top-level default from '*' to
some value that implies we stop working in embedded bare repositories
will break existing workflows. But that breakage would just be limited
to embedded bare repositories, and not non-embedded ones. So I think on
balance that breakage would affect fewer real-world users, while still
being just as easy to recover from.

>     safe.barerepository is presented to users as an allow-list of
>     directories that Git will recognize as a bare repository during the
>     repository discovery process (much like safe.directory), but this patch
>     only implements (and permits) boolean behavior (i.e. on, off and unset).
>     Hopefully, this gives us some room to discuss and experiment with
>     possible formats.
>
>     Thanks to Taylor for suggesting the allow-list idea :)

I did suggest an allow-list, but not this one ;-).

>     I think the core concept of letting users toggle bare repo discovery is
>     solid, but I'm sending this as RFC for the following reasons:
>
>      * I don't love the name safe.barerepository, because it feels like Git
>        is saying that bare repos are unsafe and consequently, that bare repo
>        users are behaving unsafely. On the other hand, this is quite similar
>        to safe.directory in a few respects, so it might make sense for the
>        naming to reflect that.

Yes, the concerns I outlined above are definitely echoing this
sentiment. Another way to say it is that this feels like too big of a
hammer (i.e., it is targeting _all_ bare repositories, not just embedded
ones) for too small of a nail (embedded bare repositories). As you're
probably sick of hearing me say by now, I would strongly prefer a more
targeted solution (perhaps what I outlined, or perhaps something else,
so long as it doesn't break non-embedded bare repositories if/ever we
decided to change the default value of safe.bareRepository).

>      * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
>        yet.

Interesting. I wouldn't expect this to be the case (since the default is
to allow everything right now).

>      * In the longer-term, we might identify a usable-enough default that we
>        can give opt-out protection that works for the vast majority of
>        users.

Perhaps, and I think if this were the case then I would feel differently
about this patch. But I don't want us to paint ourselves into a corner,
either. It would be unfortunate to, say, find ourselves in a position
where the only protection against some novel embedded bare repository
attack is to change a default that would break many existing workflows
for _non_-embedded bare repositories.

>     = Other questions/Concerns
>
>      * Maybe it's more informative for the user if we die() (or warn()) when
>        we find a bare repo instead of silently ignoring it?

We should definitely provide more feedback to the user. If I set
`safe.bareRepository` to the empty string via a global config, and then
execute a Git command in a non-embedded bare repository, I get:

    $ git.compile config --get --global --default='*' safe.bareRepository

    $ git.compile rev-parse --absolute-git-dir
    fatal: not a git repository (or any of the parent directories): .git

whereas on the last release of Git, I get instead:

    $ git rev-parse --absolute-git-dir
    /home/ttaylorr/repo.git

I'm still not convinced that just reading repository extensions while
ignoring the rest of config and hooks is too confusing, so I'd be more
in favor of something like:

    $ git.compile rev-parse --absolute-git-dir
    warning: ignoring repository config and hooks
    advice: to permit bare repository discovery (which
    advice: will read config and hooks), consider running:
    advice:
    advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
    /home/ttaylorr/repo.git

(though I still feel strongly that we should pursue a more targeted
approach here).

Thanks,
Taylor

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

Taylor Blau <me@ttaylorr.com> writes:

> Thanks for working on this! I'm excited to see some patches here, though
> I'm not totally convinced of this direction. More below.
>
> To summarize, this proposal attempts to work around the problem of
> embedding bare repositories in non-bare checkouts by providing a way to
> opt-out of bare repository discovery (which is to only discover things
> that are listed in the safe.bareRepository configuration).
>
> I agree that this would prevent the problem you're trying to solve, but
> I have significant concerns that this patch is going too far (at the
> risk of future damage to unrelated workflows) in order to accomplish
> that goal.
>
> My concern is that if we ever flipped the default (i.e. that
> "safe.bareRepository" might someday be ""), that many legitimate cases
> of using bare repositories would be broken. I think there are many such
> legitimate use cases that _do_ rely on discovering bare repositories
> (i.e., Git invocations that do not have a `--git-dir` in their
> command-line).

I think 99% of such use is to chdir into the directory with HEAD,
refs/ and objects/ in it and let git recognise the cwd is a git
directory.  Am I mistaken, or are there tools that chdir into
objects/08/ and rely on setup_git_directory_gently_1() to find the
parent directory of that 'objects' directory to be a git directory?

I am wondering if another knob to help that particular use case
easier may be sufficient.  If you are a forge operator, you'd just
set a boolean configuration variable to say "it is sufficient to
chdir into a directory to use it a bare repository without exporting
the environment variable GIT_DIR=."

It is likely that end-user human users would not want to enable such
a variable, of course, but I wonder if a simple single knob would be
sufficient to help other use cases you are worried about?

While I wish "extensions and nothing else", i.e. we use "degraded
access", not "refuse to give access at all", were workable, I am
pessimistic that it would work well in practice.

Saying "nothing else" is easy, but we do "if X exists, use it" for
hook, and to implement "nothing else", you'd need to find such a
code and say "even if X exists, because we are in this strange
embedded bare thing, ignore this part of the logic" for every X.
We've been casually saying "potentially risky config" and then
started mixing "hooks" in the discussion, but who knows what other
things are used from the repository by third-party tools that we
need to yet add to the mix?

> I'm still not convinced that just reading repository extensions while
> ignoring the rest of config and hooks is too confusing, so I'd be more
> in favor of something like:

I do not think it would be confusing.  I am worried about it being
error prone.

>     $ git.compile rev-parse --absolute-git-dir
>     warning: ignoring repository config and hooks
>     advice: to permit bare repository discovery (which
>     advice: will read config and hooks), consider running:
>     advice:
>     advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
>     /home/ttaylorr/repo.git

Is the last line meant to be an output from "rev-parse --absolute-git-dir"?
IOW, the warning says you are ignoring, but we are still recognising
it as a repository?

By the way, do we need safe.bareRepository?  Shouldn't
safe.directory cover the same purpose?  

If a directory is on the latter, you are saying that (1) the
directory is OK to use as a repository, and (2) it is so even if the
directory is owned by somebody else, not you.

Theoretically you can argue that there can be cases where you only
want (1) and not (2), but as long as you control such a directory
(like an embedded repository in your project's checkout) yourself,
you do not have to worry about the "ok even if it is owned by
somebody else" part.




@gitgitgadget-git
Copy link

On the Git mailing list, Taylor Blau wrote (reply to this):

On Mon, May 09, 2022 at 03:54:08PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Thanks for working on this! I'm excited to see some patches here, though
> > I'm not totally convinced of this direction. More below.
> >
> > To summarize, this proposal attempts to work around the problem of
> > embedding bare repositories in non-bare checkouts by providing a way to
> > opt-out of bare repository discovery (which is to only discover things
> > that are listed in the safe.bareRepository configuration).
> >
> > I agree that this would prevent the problem you're trying to solve, but
> > I have significant concerns that this patch is going too far (at the
> > risk of future damage to unrelated workflows) in order to accomplish
> > that goal.
> >
> > My concern is that if we ever flipped the default (i.e. that
> > "safe.bareRepository" might someday be ""), that many legitimate cases
> > of using bare repositories would be broken. I think there are many such
> > legitimate use cases that _do_ rely on discovering bare repositories
> > (i.e., Git invocations that do not have a `--git-dir` in their
> > command-line).
>
> I think 99% of such use is to chdir into the directory with HEAD,
> refs/ and objects/ in it and let git recognise the cwd is a git
> directory.  Am I mistaken, or are there tools that chdir into
> objects/08/ and rely on setup_git_directory_gently_1() to find the
> parent directory of that 'objects' directory to be a git directory?

If you took this change, and then at some point in the future we changed
the default value of safe.bareRepository to "", wouldn't that break that
99% of use cases you are talking about?

When I read your "I think 99% of such use is ...", it makes me think
that this change won't disrupt bare repo discovery when we only traverse
one layer above $CWD. But this change disrupts the case where we don't
need to traverse at all to do discovery (i.e., when $CWD is the root of
a bare repository).

> I am wondering if another knob to help that particular use case
> easier may be sufficient.  If you are a forge operator, you'd just
> set a boolean configuration variable to say "it is sufficient to
> chdir into a directory to use it a bare repository without exporting
> the environment variable GIT_DIR=."

Yes, GitHub would almost certainly set safe.bareRepository to "*"
regardless of what Git's own default would be.

> It is likely that end-user human users would not want to enable such
> a variable, of course, but I wonder if a simple single knob would be
> sufficient to help other use cases you are worried about?

I'm not sure I agree that end-users wouldn't want to touch this knob. If
they have embedded bare repositories that they rely on as test fixtures,
for example, wouldn't safe.bareRepository need to be tweaked?

(On a separate but somewhat-related note, I still think that this
setting should be read from the repository config, too, i.e., it seems
odd that we'd force a user to set safe.bareRepository to some deeply
nested repository (in the embedded case) via their global config.)

> While I wish "extensions and nothing else", i.e. we use "degraded
> access", not "refuse to give access at all", were workable, I am
> pessimistic that it would work well in practice.
>
> Saying "nothing else" is easy, but we do "if X exists, use it" for
> hook, and to implement "nothing else", you'd need to find such a
> code and say "even if X exists, because we are in this strange
> embedded bare thing, ignore this part of the logic" for every X.
> We've been casually saying "potentially risky config" and then
> started mixing "hooks" in the discussion, but who knows what other
> things are used from the repository by third-party tools that we
> need to yet add to the mix?
>
> > I'm still not convinced that just reading repository extensions while
> > ignoring the rest of config and hooks is too confusing, so I'd be more
> > in favor of something like:
>
> I do not think it would be confusing.  I am worried about it being
> error prone.

Yeah, on this and the above quoted hunk, I am fine if our behavior
eventually became "call die()" for when we are in an embedded bare
repository. But I do think this transition should be gradual, i.e., we
should likely emit a warning in those cases that would be broken in the
future to say "this will break, run this `git config` invocation if you
want it to remain working".

> >     $ git.compile rev-parse --absolute-git-dir
> >     warning: ignoring repository config and hooks
> >     advice: to permit bare repository discovery (which
> >     advice: will read config and hooks), consider running:
> >     advice:
> >     advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
> >     /home/ttaylorr/repo.git
>
> Is the last line meant to be an output from "rev-parse --absolute-git-dir"?
> IOW, the warning says you are ignoring, but we are still recognising
> it as a repository?

In this example, yes. But again, I'm not so deeply attached to the idea
that we *have* to run in those cases. So I would equally be OK with the
above s/warning/fatal and minus the last line, too (i.e., that we call
die(), obviously we'd have to emit the advice before calling die()).

> By the way, do we need safe.bareRepository?  Shouldn't
> safe.directory cover the same purpose?
>
> If a directory is on the latter, you are saying that (1) the
> directory is OK to use as a repository, and (2) it is so even if the
> directory is owned by somebody else, not you.
>
> Theoretically you can argue that there can be cases where you only
> want (1) and not (2), but as long as you control such a directory
> (like an embedded repository in your project's checkout) yourself,
> you do not have to worry about the "ok even if it is owned by
> somebody else" part.

I'm not sure yet, but will think more about it.

Thanks,
Taylor

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

Taylor Blau <me@ttaylorr.com> writes:

>> > My concern is that if we ever flipped the default (i.e. that
>> > "safe.bareRepository" might someday be ""), that many legitimate cases
>> > of using bare repositories would be broken. I think there are many such
>> > legitimate use cases that _do_ rely on discovering bare repositories
>> > (i.e., Git invocations that do not have a `--git-dir` in their
>> > command-line).
>>
>> I think 99% of such use is to chdir into the directory with HEAD,
>> refs/ and objects/ in it and let git recognise the cwd is a git
>> directory.  Am I mistaken, or are there tools that chdir into
>> objects/08/ and rely on setup_git_directory_gently_1() to find the
>> parent directory of that 'objects' directory to be a git directory?
>
> If you took this change, and then at some point in the future we changed
> the default value of safe.bareRepository to "", wouldn't that break that
> 99% of use cases you are talking about?

Our spawning (e.g. "fetch" run_command()s "upload-pack" in a local
repository, or "fetch" runs "upload-pack" over ssh connection, or
http gateway runs "upload-pack" after learning which repository the
request is fetching from) of subcommands can and should be fixed by
exporting "GIT_DIR=." when we spawn them in the target directory,
and such a fix should be more or less trivial.  It must happen
before such a switch of default happens (if it is what we plan to
do, that is).  Also, the trivial fix must be conveyed to third-party
tool authors and give them time to adjust their ware.

That's part of the usual migration process, and I am not so worried
about it.

If some third-party tool for whatever reason wants to start from a
random subdirectory in a bare repository, that is a different story.
Fixing such a third-party tool would be more involved than "more or
less trivial".

> When I read your "I think 99% of such use is ...", it makes me think
> that this change won't disrupt bare repo discovery when we only traverse
> one layer above $CWD. But this change disrupts the case where we don't
> need to traverse at all to do discovery (i.e., when $CWD is the root of
> a bare repository).

By "this change" you mean what Glen proposes?  I think it was
designed to break the use case where you go there to signal that you
want to use the directory as a repository.

>> I am wondering if another knob to help that particular use case
>> easier may be sufficient.  If you are a forge operator, you'd just
>> set a boolean configuration variable to say "it is sufficient to
>> chdir into a directory to use it a bare repository without exporting
>> the environment variable GIT_DIR=."

And such a boolean, without safe.bareRepository setting, should be
sufficient to cover that 99% of such use, because it disables that
deliberate refusal of treating CWD as a repository without
explicitly saying that is what you want with "GIT_DIR=.".  One thing
I wasn't sure about was if that 99% number is close to reality,
hence my question.

> Yes, GitHub would almost certainly set safe.bareRepository to "*"
> regardless of what Git's own default would be.

And with such a boolean, I am hoping that GitHub do not have to make
such a wildly open setting.  Only $CWD that is the top of a repository,
without allowing it to be any random subdirectory, would be allowed.

> I'm not sure I agree that end-users wouldn't want to touch this knob. If
> they have embedded bare repositories that they rely on as test fixtures,
> for example, wouldn't safe.bareRepository need to be tweaked?

But not in the "My $CWD is always fine" knob, whose only reason is
to simplify things without opening you up unnecessarily too widely
for hosting sites.

@gitgitgadget-git
Copy link

On the Git mailing list, Glen Choo wrote (reply to this):

Hi Taylor,

Taylor Blau <me@ttaylorr.com> writes:

> Hi Glen,
>
> On Fri, May 06, 2022 at 06:30:10PM +0000, Glen Choo via GitGitGadget wrote:
>> From: Glen Choo <chooglen@google.com>
>>
>> Add a config variable, `safe.barerepository`, that tells Git whether or
>> not to recognize bare repositories when it is trying to discover the
>> repository. This only affects repository discovery, thus it has no
>> effect if discovery was not done (e.g. `--git-dir` was passed).
>
> To summarize, this proposal attempts to work around the problem of
> embedding bare repositories in non-bare checkouts by providing a way to
> opt-out of bare repository discovery (which is to only discover things
> that are listed in the safe.bareRepository configuration).
>
> I agree that this would prevent the problem you're trying to solve, but
> I have significant concerns that this patch is going too far (at the
> risk of future damage to unrelated workflows) in order to accomplish
> that goal.

Thanks again for the careful read. As I understand it, your concern is
that making bare repository discovery configurable and then flipping the
default to e.g. never detecting bare repositories is too disruptive to
fix the embedded bare repository problem. And to avoid disrupting
non-embedded bare repositories, you would prefer to pursue a more
targeted fix.

If the problem statement were limited to embedded bare repositories,
then I agree that this is way more than overkill, and that a targeted
solution would be preferable.

More generally however, the problem of embedded bare repositories seems
to suggest that bare repository discovery doesn't serve all users well,
and in fact, may even be a net negative for a subset of users. I'd be
interested in hearing your thoughts from that perspective, e.g.

- Should bare repository discovery should be configurable?
- What is a good default for bare repository discovery? (regardless of
  how feasible changing the default is)

This is a somewhat different direction from how the conversation started
(I hope it doesn't look like I'm shifting the goal posts), but I think
it's a good opportunity to step back and simplify something that we
wished we got right in the beginning.

And even if we don't flip the default, shipping the config value still
seems useful e.g. there's a good amount of interest in disabling bare
repository discovery at $DAYJOB (and I think we'll get a lot of
interesting results once we do).

>>     safe.barerepository is presented to users as an allow-list of
>>     directories that Git will recognize as a bare repository during the
>>     repository discovery process (much like safe.directory), but this patch
>>     only implements (and permits) boolean behavior (i.e. on, off and unset).
>>     Hopefully, this gives us some room to discuss and experiment with
>>     possible formats.
>>
>>     Thanks to Taylor for suggesting the allow-list idea :)
>
> I did suggest an allow-list, but not this one ;-).

Ah, yes. Oops. Sorry if it looked like I was putting words in your
mouth.

What I really meant was that an allow-list (untethered from any specific
purpose) seems like a useful 'UI primitive', so thanks for bringing up
the option.

>>     I think the core concept of letting users toggle bare repo discovery is
>>     solid, but I'm sending this as RFC for the following reasons:
>>
>>      * I don't love the name safe.barerepository, because it feels like Git
>>        is saying that bare repos are unsafe and consequently, that bare repo
>>        users are behaving unsafely. On the other hand, this is quite similar
>>        to safe.directory in a few respects, so it might make sense for the
>>        naming to reflect that.
>
> Yes, the concerns I outlined above are definitely echoing this
> sentiment. Another way to say it is that this feels like too big of a
> hammer (i.e., it is targeting _all_ bare repositories, not just embedded
> ones) for too small of a nail (embedded bare repositories). As you're
> probably sick of hearing me say by now, I would strongly prefer a more
> targeted solution (perhaps what I outlined, or perhaps something else,
> so long as it doesn't break non-embedded bare repositories if/ever we
> decided to change the default value of safe.bareRepository).

Ok, yeah I think safe.barerepository is a terrible way to achieve my
purported goal of 'making bare repository discovery
configurable/simpler/' - using the "safe." namespace makes it impossible
to see this as anything other than protection against dangerous, unknown
bare repositories. I'll drop the idea of safe-listing known bare
repositories for now, that seems unproductive.

'Optionally disable bare repository discovery' still sounds like it's on
the table though, but probably with a different kind of UX e.g.
"discovery.barerepository" with the options:

- always: always discover bare repos
- never: never discover bare repos
- cwd-only: only discover bare repos if they are the cwd
- dotgit-only: only discover bare repos if they are a descendant of
  .git/

>>      * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
>>        yet.
>
> Interesting. I wouldn't expect this to be the case (since the default is
> to allow everything right now).

This might be a false alarm - I saw similar failures on an unrelated
patch. I think my "master" is just out of date :(

@chooglen chooglen force-pushed the setup/disable-bare-repo-config branch from 3370258 to b1427c0 Compare May 13, 2022 23:03
@chooglen chooglen changed the title RFC setup.c: make bare repo discovery optional setup.c: make bare repo discovery optional May 13, 2022
@chooglen chooglen force-pushed the setup/disable-bare-repo-config branch from b1427c0 to 62070aa Compare May 13, 2022 23:16
@chooglen
Copy link
Contributor Author

/submit

@gitgitgadget-git
Copy link

Submitted as pull.1261.v2.git.git.1652485058.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1261/chooglen/setup/disable-bare-repo-config-v2

To fetch this version to local tag pr-git-1261/chooglen/setup/disable-bare-repo-config-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1261/chooglen/setup/disable-bare-repo-config-v2

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * die()-ing is necessary if we're trying to flip the default value of
>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>    more helpful to fail loudly than to silently ignore the bare repo.
>
>    But in the long term, long after we've flipped the default and users know
>    that they need to opt into bare repo discovery, would it be a better UX
>    to just silently ignore the bare repo?

Would a middle-ground of giving a warning() message help?  Can it be
loud and annoying enough to knudge the users to adjust without
breaking the functionality?

The longer-term default should be "cwd is allowed, but we do not
bother going up from object/04 subdirectory of a bare repository",
not "bare repositories should not be usable at all without GIT_DIR".

>      +    Add a config variable, `discovery.bare`, that tells Git whether or not
>      +    it should work with the bare repository it has discovered i.e. Git will
>      +    die() if it discovers a bare repository, but it is not allowed by

Missing comma before "i.e."

>      ++discovery.bare::
>      ++	Specifies what kinds of directories Git can recognize as a bare
>      ++	repository when looking for the repository (aka repository
>       +	discovery). This has no effect if repository discovery is not
>       +	performed e.g. the path to the repository is set via `--git-dir`
>       +	(see linkgit:git[1]).
>       ++
>       +This config setting is only respected when specified in a system or global
>       +config, not when it is specified in a repository config or via the command
>      ++line option `-c discovery.bare=<value>`.

;-)

>      +++
>      ++The currently supported values are `always` (Git always recognizes bare
>      ++repositories) and `never` (Git never recognizes bare repositories).
>      ++This defaults to `always`, but this default is likely to change.
>      +++
>      ++If your workflow does not rely on bare repositories, it is recommended that
>      ++you set this value to `never`. This makes repository discovery easier to
>      ++reason about and prevents certain types of security and non-security
>      ++problems, such as:

Hopefully "git fetch" over ssh:// and file:/// would run the other
side with GIT_DIR explicitly set?  As long as this recommendation
does not break these use cases, I think we are OK, but I do not yet
find these "problems, such as..." so convincing.

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  t/t0034-discovery-bare.sh          | 69 +++++++++++++++++++++++

This number is already in use by an in-flight topic, if I am not
mistaken.  Please make it a habit to always check your topic works
well when merged to 'next' and to 'seen'.

Thanks.

@@ -0,0 +1,24 @@
discovery.bare::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `discovery.bare`, that tells Git whether or not
> it should work with the bare repository it has discovered i.e. Git will
> die() if it discovers a bare repository, but it is not allowed by
> `discovery.bare`. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).
>
> This is motivated by the fact that some workflows don't use bare
> repositories at all, and users may prefer to opt out of bare repository
> discovery altogether:
>
> - An easy assumption for a user to make is that Git commands run
>   anywhere inside a repository's working tree will use the same
>   repository. However, if the working tree contains a bare repository
>   below the root-level (".git" is preferred at the root-level), any
>   operations inside that bare repository use the bare repository
>   instead.
>
>   In the worst case, attackers can use this confusion to trick users
>   into running arbitrary code (see [1] for a deeper discussion). But
>   even in benign situations (e.g. a user renames ".git/" to ".git.old/"
>   and commits it for archival purposes), disabling bare repository
>   discovery can be a simpler mode of operation (e.g. because the user
>   doesn't actually want to use ".git.old/") [2].
>
> - Git won't "accidentally" recognize a directory that wasn't meant to be
>   a bare repository, but happens to resemble one. While such accidents
>   are probably very rare in practice, this lets users reduce the chance
>   to zero.
>
> This config is an enum of:
>
> - ["always"|(unset)]: always recognize bare repositories (like Git does
>   today)
> - "never": never recognize bare repositories
>
> More values are expected to be added later, and the default is expected
> to change (i.e. to something other than "always").
>
> [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
> [2]: I don't personally know anyone who does this as part of their
> normal workflow, but a cursory search on GitHub suggests that there is a
> not insubstantial number of people who munge ".git" in order to store
> its contents.
>
> https://github.com/search?l=&o=desc&p=1&q=ref+size%3A%3C1000+filename%3AHEAD&s=indexed&type=Code
> (aka search for the text "ref", size:<1000, filename:HEAD)
>
> Signed-off-by: Glen Choo <chooglen@google.com>

The intended commit message ends here...

> WIP setup.c: make discovery.bare die on failure
>
> Signed-off-by: Glen Choo <chooglen@google.com>

Ugh, dumb mistake (bad squash). Fortunately this was one of my more
professional-sounding WIP commit messages.

@gitgitgadget-git
Copy link

On the Git mailing list, Glen Choo wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  * die()-ing is necessary if we're trying to flip the default value of
>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>    more helpful to fail loudly than to silently ignore the bare repo.
>>
>>    But in the long term, long after we've flipped the default and users know
>>    that they need to opt into bare repo discovery, would it be a better UX
>>    to just silently ignore the bare repo?
>
> Would a middle-ground of giving a warning() message help?  Can it be
> loud and annoying enough to knudge the users to adjust without
> breaking the functionality?

Personally, when my tool changes its behavior, I would strongly prefer
it to die than to "change behavior + warn". I'd feel more comfortable
knowing that the tool did nothing as opposed to doing the wrong thing
and only being informed after the fact. Also, I sometimes ignore
warnings ;)

When we _do_ transition away from die(), ignore + warning() sounds like
a good first step.

But if any of this flies in the face of the project's conventions, let
me know as such.

>>      +    Add a config variable, `discovery.bare`, that tells Git whether or not
>>      +    it should work with the bare repository it has discovered i.e. Git will
>>      +    die() if it discovers a bare repository, but it is not allowed by
>
> Missing comma before "i.e."

Thanks.

>>      +++
>>      ++The currently supported values are `always` (Git always recognizes bare
>>      ++repositories) and `never` (Git never recognizes bare repositories).
>>      ++This defaults to `always`, but this default is likely to change.
>>      +++
>>      ++If your workflow does not rely on bare repositories, it is recommended that
>>      ++you set this value to `never`. This makes repository discovery easier to
>>      ++reason about and prevents certain types of security and non-security
>>      ++problems, such as:
>
> Hopefully "git fetch" over ssh:// and file:/// would run the other
> side with GIT_DIR explicitly set?

Ah, I'll check this and get back to you.

>                                                        I do not yet
> find these "problems, such as..." so convincing.

What would be a convincing rationale to you? I'll capture that here.

I'm assuming that you already have such an rationale in mind when you
say that the longer-term default is that "we respect bare repositories
only if they are the cwd.". I'm also assuming that this rationale is
something other than embedded bare repos, because "cwd-only" does not
protect against that.

Perhaps "never" sounds better to folks who don't ever expect bare
repositories and want to lock down the environment. Randall (cc-ed)
suggests one such use case in [1].

(To Randall: Oops, I actually meant to cc you earlier, since you were
the first to suggest a practical use case for never allowing bare repos.
It must've slipped my mind).

[1] https://lore.kernel.org/git/005d01d84ad0$782e8fc0$688baf40$@nexbridge.com.

@@ -0,0 +1,24 @@
discovery.bare::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
> 
> Add a config variable, `discovery.bare`, that tells Git whether or not
> it should work with the bare repository it has discovered i.e. Git will
> die() if it discovers a bare repository, but it is not allowed by
> `discovery.bare`. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

> This config is an enum of:
> 
> - ["always"|(unset)]: always recognize bare repositories (like Git does
>   today)
> - "never": never recognize bare repositories
> 
> More values are expected to be added later, and the default is expected
> to change (i.e. to something other than "always").

I think it is fine to include the "never" option for users to opt-in to
this super-protected state, but I want to make it very clear that we
should never move to it as a new default. This phrasing of 'something
other than "always"' is key, but it might be good to point out that
"never" is very unlikely to be that default.

> WIP setup.c: make discovery.bare die on failure
> 
> Signed-off-by: Glen Choo <chooglen@google.com>

Accidental concatenation of squashed commit?

> diff --git a/Documentation/config/discovery.txt b/Documentation/config/discovery.txt
> new file mode 100644
> index 00000000000..761cabe6e70
> --- /dev/null
> +++ b/Documentation/config/discovery.txt
> @@ -0,0 +1,24 @@
> +discovery.bare::
> +	Specifies what kinds of directories Git can recognize as a bare
> +	repository when looking for the repository (aka repository
> +	discovery). This has no effect if repository discovery is not
> +	performed e.g. the path to the repository is set via `--git-dir`
> +	(see linkgit:git[1]).

Avoid "e.g." here.

	This has no effect if the repository is specified directly via
	the --git-dir command-line option or the GIT_DIR environment
	variable.

> +This config setting is only respected when specified in a system or global
> +config, not when it is specified in a repository config or via the command
> +line option `-c discovery.bare=<value>`.

We are sprinkling config options that have these same restrictions throughout
the config documentation. It might be time to define a term like "protected
config" at the top of git-config.txt and then refer to that from these other
locations.

> +The currently supported values are `always` (Git always recognizes bare
> +repositories) and `never` (Git never recognizes bare repositories).

This sentence structure is likely to change in the future, and as it stands
will become complicated. A bulleted list will have easier edits in the future.

> +This defaults to `always`, but this default is likely to change.

For now, I would say "but this default may change in the future." instead.

> +If your workflow does not rely on bare repositories, it is recommended that
> +you set this value to `never`. This makes repository discovery easier to
> +reason about and prevents certain types of security and non-security
> +problems, such as:
> +

(You might need a "+" here.)

> +* `git clone`-ing a repository containing a malicious bare repository
> +  inside it.
> +* Git recognizing a directory that isn't meant to be a bare repository,
> +  but happens to look like one.

I think these last bits recommending the 'never' option are a bit
distracting. It doesn't make repository discovery "easier to reason
about" because we still discover the bare repo and die() instead of
skipping it and looking higher for a non-bare repository in the
parent directories. The case of an "accidentally-recognized bare
repo" is so unlikely it is probably not worth mention in these docs.

Instead, I think something like this might be better:

  If you do not use bare repositories in your workflow, then it may
  be beneficial to set `discovery.bare` to `never` in your global
  config. This will protect you from attacks that involve cloning a
  repository that contains a bare repository and running a Git
  command within that directory.

> +static int check_bare_repo_allowed(void)
> +{
> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
> +		read_very_early_config(discovery_bare_cb, NULL);

This will add the third place where we use read_very_early_config(),
adding to the existing calls in tr2_sysenv_load() and
ensure_valid_ownership(). If I understand it correctly, that means
that every Git execution in a bare repository will now parse the
system and global config three times.

This doesn't count the check for uploadpack.packobjectshook in
upload-pack.c that uses current_config_scope() to restrict its
value to the system and global config.

We are probably at the point where we need to instead create a
configset that stores this "protected config" and allow us to
lookup config keys directly from that configset instead of
iterating through these config files repeatedly.

> +		/* We didn't find a value; use the default. */
> +		if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN)
> +			discovery_bare_config = DISCOVERY_BARE_ALWAYS;

This could also be done in advance of the config parsing
by setting discovery_bare_config = DISCOVERY_BARE_ALWAYS before
calling read_very_early_config(). Avoids an if and a comment
here, which might be nice.

> +	}
> +	switch (discovery_bare_config) {
> +	case DISCOVERY_BARE_NEVER:
> +		return 0;
> +	case DISCOVERY_BARE_ALWAYS:
> +		return 1;
> +	default:
> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);
> +	}

You return -1 in discovery_bare_cb when the key matches, but
the value is not understood. Should we check the return value
of read_very_early_config(), too?
> +static const char *discovery_bare_config_to_string(void)
> +{
> +	switch (discovery_bare_config) {
> +	case DISCOVERY_BARE_NEVER:
> +		return "never";
> +	case DISCOVERY_BARE_ALWAYS:
> +		return "always";
> +	default:
> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);

In general, I'm not sure these BUG() statements are helpful,
but they aren't hurting anything. I wonder if it would be
better to use DISCOVERY_BARE_UNKNOWN instead of default,
because then the compiler should notice that the switch needs
updating when a new enum mode is added.

> @@ -1142,7 +1195,8 @@ enum discovery_result {
>  	GIT_DIR_HIT_CEILING = -1,
>  	GIT_DIR_HIT_MOUNT_POINT = -2,
>  	GIT_DIR_INVALID_GITFILE = -3,
> -	GIT_DIR_INVALID_OWNERSHIP = -4
> +	GIT_DIR_INVALID_OWNERSHIP = -4,
> +	GIT_DIR_DISALLOWED_BARE = -5

I think that you can add a comma at the end of this enum to avoid the
changed line the next time the enum needs to be expanded.

>  };
>  
>  /*
> @@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  		}
>  
>  		if (is_git_directory(dir->buf)) {
> +			if (!check_bare_repo_allowed())
> +				return GIT_DIR_DISALLOWED_BARE;

Won't this fail if someone runs a Git command inside of a .git/
directory for a non-bare repository? I just want to be sure that
we hit this error instead:

	fatal: this operation must be run in a work tree

I see that this error is tested in t0008-ignores.sh, but that's
with the default "always" value. It would be good to explicitly
check that this is the right error when using the "never" config.

Thanks,
-Stolee

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Mon, May 16, 2022 at 02:46:55PM -0400, Derrick Stolee wrote:
> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> > From: Glen Choo <chooglen@google.com>
> >
> > Add a config variable, `discovery.bare`, that tells Git whether or not
> > it should work with the bare repository it has discovered i.e. Git will
> > die() if it discovers a bare repository, but it is not allowed by
> > `discovery.bare`. This only affects repository discovery, thus it has no
> > effect if discovery was not done (e.g. `--git-dir` was passed).
>
> > This config is an enum of:
> >
> > - ["always"|(unset)]: always recognize bare repositories (like Git does
> >   today)
> > - "never": never recognize bare repositories
> >
> > More values are expected to be added later, and the default is expected
> > to change (i.e. to something other than "always").
>
> I think it is fine to include the "never" option for users to opt-in to
> this super-protected state, but I want to make it very clear that we
> should never move to it as a new default. This phrasing of 'something
> other than "always"' is key, but it might be good to point out that
> "never" is very unlikely to be that default.

I am confused, then.

What does a user who has some legitimate (non-embedded) bare
repositories do if they are skeptical of other bare repositories? I
suspect the best answer we would be able to provide with these patches
is "use `--git-dir`".

What happens to a user who has a combination of legitimate bare
repositories, embedded bare repositories that they trust, and other
embedded bare repositories that they don't?

As far as I can tell, our recommendation with these tools would be to:

  - run `git config --global discovery.bare never`, and
  - include `--git-dir=$(pwd)` in any git invocations in bare
    repositories that they do trust

This gets at my concerns from [1] and [2] (mostly [2], in this case)
that we're trying to close the embedded bare repos problem with an
overly broad solution, at the expense of usability.

I can't shake the feeling that something like I described towards the
bottom of [2] would give you all of the security guarantees you're after
without compromising on usability for non-embedded bare repositories.

I'm happy to explore this direction more myself if you don't want to. I
would just much rather see us adopt an approach that doesn't break more
use-cases than it has to if such a thing can be avoided.

I cannot endorse these patches as-is.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/
[2]: https://lore.kernel.org/git/YnmKwLoQCorBnMe2@nand.local/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

Thanks for being thorough, I find it really helpful.

For brevity, I won't reply to comments that I think are obviously good,
so you can assume I'll incorproate anything that isn't commented on.

Derrick Stolee <derrickstolee@github.com> writes:

> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
>> From: Glen Choo <chooglen@google.com>
>> 
>> +This config setting is only respected when specified in a system or global
>> +config, not when it is specified in a repository config or via the command
>> +line option `-c discovery.bare=<value>`.
>
> We are sprinkling config options that have these same restrictions throughout
> the config documentation. It might be time to define a term like "protected
> config" at the top of git-config.txt and then refer to that from these other
> locations.

Agree, and I think defining the term will be useful in future on-list
discussions.

>> +static int check_bare_repo_allowed(void)
>> +{
>> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
>> +		read_very_early_config(discovery_bare_cb, NULL);
>
> This will add the third place where we use read_very_early_config(),
> adding to the existing calls in tr2_sysenv_load() and
> ensure_valid_ownership(). If I understand it correctly, that means
> that every Git execution in a bare repository will now parse the
> system and global config three times.
>
> This doesn't count the check for uploadpack.packobjectshook in
> upload-pack.c that uses current_config_scope() to restrict its
> value to the system and global config.
>
> We are probably at the point where we need to instead create a
> configset that stores this "protected config" and allow us to
> lookup config keys directly from that configset instead of
> iterating through these config files repeatedly.

Looking at all of the read_very_early_config() calls,

- check_bare_repo_allowed() can use git_configset_get_string()
- ensure_valid_ownership() can use git_configset_get_value_multi()
- tr2_sysenv_load() reads every value with the "trace2." prefix. AFAICT
  configsets only support exact key lookups and I don't see an easy way
  teach configsets to support prefix lookups.

(I didn't look too closely at uploadpack.packobjectshook because I don't
know enough about config scopes to comment.)

So using a configset, we'll still need to read the config files at least
twice. That's better than thrice, but it doesn't cover the
tr2_sysenv_load() use case, and we'll run into this yet again if add
function that reads all config values with a given prefix.

An hacky alternative that covers all of these use cases would be to read
all protected config in a single pass, e.g.

  static struct protected_config {
         struct safe_directory_data safe_directory_data;
         const char *discovery_bare;
         struct string_list tr2_sysenv;
  };

  static int protected_config_cb()
  {
    /* Parse EVERYTHING that belongs in protected_config. */
  }

but protected_config_cb() would have to parse too many unrelated things
for my liking.

So I'll use the configset for the cases where the key is known, and
perhaps we'll punt on tr2_sysenv_load().

>> +	}
>> +	switch (discovery_bare_config) {
>> +	case DISCOVERY_BARE_NEVER:
>> +		return 0;
>> +	case DISCOVERY_BARE_ALWAYS:
>> +		return 1;
>> +	default:
>> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);
>> +	}
>
> You return -1 in discovery_bare_cb when the key matches, but
> the value is not understood. Should we check the return value
> of read_very_early_config(), too?

This comment doesn't apply because unlike most other config reading
functions, read_very_early_config() and read_early_config() die when the
callback returns -1.

I'm not sure why this is the case though, and maybe you think there is
value in having a non-die()-ing variant, e.g.
read_very_early_config_gently()?

>>  };
>>  
>>  /*
>> @@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>>  		}
>>  
>>  		if (is_git_directory(dir->buf)) {
>> +			if (!check_bare_repo_allowed())
>> +				return GIT_DIR_DISALLOWED_BARE;
>
> Won't this fail if someone runs a Git command inside of a .git/
> directory for a non-bare repository? I just want to be sure that
> we hit this error instead:
>
> 	fatal: this operation must be run in a work tree
>
> I see that this error is tested in t0008-ignores.sh, but that's
> with the default "always" value. It would be good to explicitly
> check that this is the right error when using the "never" config.

Yes, it will fail if run inside of a .git/ directory. "never" prevents
you from working from inside .git/ unless you set GIT_DIR.

IIRC, we don't show "fatal: this operation must be run in a work
tree" for every Git command, e.g. "git log" works just fine. It makes
sense to show this warning when the CWD supports 'some, but not all' Git
commands, but I don't think this is valuable if we forbid *all* Git
commands.

Instead of trying to make "never" accomodate this use case, perhaps what
we want is a "dotgit-only" option that allows a bare repository if it is
below a .git/ directory. Since we forbid .git in the index, this seems
somewhat safe, but I hadn't proposed this sooner because I don't know if
we need it yet, and I'm certain that there are less secure edge cases
that need to be thought through.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

Glen Choo <chooglen@google.com> writes:

>>> +static int check_bare_repo_allowed(void)
>>> +{
>>> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
>>> +		read_very_early_config(discovery_bare_cb, NULL);
>>
>> This will add the third place where we use read_very_early_config(),
>> adding to the existing calls in tr2_sysenv_load() and
>> ensure_valid_ownership(). If I understand it correctly, that means
>> that every Git execution in a bare repository will now parse the
>> system and global config three times.
>>
>> This doesn't count the check for uploadpack.packobjectshook in
>> upload-pack.c that uses current_config_scope() to restrict its
>> value to the system and global config.
>>
>> We are probably at the point where we need to instead create a
>> configset that stores this "protected config" and allow us to
>> lookup config keys directly from that configset instead of
>> iterating through these config files repeatedly.
>
> Looking at all of the read_very_early_config() calls,
>
> - check_bare_repo_allowed() can use git_configset_get_string()
> - ensure_valid_ownership() can use git_configset_get_value_multi()
> - tr2_sysenv_load() reads every value with the "trace2." prefix. AFAICT
>   configsets only support exact key lookups and I don't see an easy way
>   teach configsets to support prefix lookups.
>
> (I didn't look too closely at uploadpack.packobjectshook because I don't
> know enough about config scopes to comment.)
>
> So using a configset, we'll still need to read the config files at least
> twice. That's better than thrice, but it doesn't cover the
> tr2_sysenv_load() use case, and we'll run into this yet again if add
> function that reads all config values with a given prefix.
>
> An hacky alternative that covers all of these use cases would be to read
> all protected config in a single pass, e.g.
>
>   static struct protected_config {
>          struct safe_directory_data safe_directory_data;
>          const char *discovery_bare;
>          struct string_list tr2_sysenv;
>   };
>
>   static int protected_config_cb()
>   {
>     /* Parse EVERYTHING that belongs in protected_config. */
>   }
>
> but protected_config_cb() would have to parse too many unrelated things
> for my liking.
>
> So I'll use the configset for the cases where the key is known, and
> perhaps we'll punt on tr2_sysenv_load().

Since I'm trying to replace read_very_early_config() anyway, is this a
good time to teach git to respect "-c safe.directory"?

My understanding of [1] is that we only ignore "-c safe.directory"
because read_very_early_config() doesn't support it, but we would prefer
to support it if we could.

[1] https://lore.kernel.org/git/xmqqlevabcsu.fsf@gitster.g/

@@ -0,0 +1,26 @@
discovery.bare::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
> 
> Add a 'cwd' option to discovery.bareRepository, which allows a bare
> repository to be used if and only if the cwd is the root of a bare
> repository. This covers the common case where a user works with a bare
> repository by cd-ing into the repository's root.

I don't consider this case valuable. In addition to allowing
the most-common use case, it also allows the most-common route
that an attacker would use to try to get a user to run a Git
command in a malicious embedded bare repo. I think we are
better off without it.

Thanks,
-Stolee

@gitgitgadget-git
Copy link

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> Thanks all for the comments on v1, I've expanded this series somewhat to
> address them,...

Please include a full cover letter with each version, so reviewers
can respond to the full series goals.

Your series here intends to start protecting against malicious
embedded bare repositories by allowing users to opt-in to a more
protected state. When the 'discovery.bare' option is set, then
Git may die() on a bare repository that is discovered based on
the current working directory (these protections are ignored if
the user specifies the directory directly through --git-dir or
$GIT_DIR).

The 'discovery.bare' option has these values at the end of your
series:

* 'always' (default) allows all bare repos, matching the current
  behavior of Git.

* 'never' avoids operating in bare repositories altogether.

* 'cwd' operates in a bare repository only if the current directory
  is exactly the root of the bare repository.

It is important that we keep 'always' as the default at first,
because we do not want to introduce a breaking change without
warning (at least for an issue like this that has been around
for a long time).

The 'never' option is a good one for very security-conscious
users who really want to avoid problems. I don't anticipate that
users who know about this option and set it themselves are the
type that would fall for the social engineering required to
attack using this vector, but I can imagine an IT department
installing the value in system config across a fleet of machines.

I find the 'cwd' option to not be valuable. It unblocks most
existing users, but also almost completely removes the protection
that the option was supposed to provide.

I find neither the 'never' or 'cwd' options an acceptable choice
for a future default.

I also think that this protection is too rigid: it restricts
_all_ bare repositories, not just embedded ones. There is no check
to see if the parent directory of the bare repository is inside a
non-bare repository.

This leads to what I think would be a valuable replacement for
the 'cwd' option:

* 'no-embedded' allows non-embedded bare repositories. An
  _embedded bare repository_ is a bare repository whose parent
  directory is contained in the worktree of a non-bare Git
  repository. When in this mode, embedded bare repositories are
  not allowed unless the parent non-bare Git repository has a
  'safe.embedded' config value storing the path to the current
  embedded bare repository.

That was certainly difficult to write, but here it is as
pseudo-code to hopefully remove some doubt as to how this might
work:

  if repo is bare:
    if value == "always":
       return ALLOWED
    if value == "never":
       return FORBIDDEN;

    path = get_parent_repo()

    if !path:
       return ALLOWED
    
    if config_file_has_value("{path}/.git/config", "safe.embedded", repo):
       return ALLOWED

    return FORBIDDEN

With this kind of option, we can protect users from these
social engineering attacks while providing an opt-in protection
for scenarios where embedded bare repos are currently being used
(while also not breaking anyone using non-embedded bare repos).

I think Taylor was mentioning something like this in his previous
replies, perhaps even to the previous thread on this topic.

This 'no-embedded' option is something that I could see as a
potential new default, after it has proven itself in a released
version of Git.

There are performance drawbacks to checking the parent path for
a Git repo, which is why it is only done when in "no-embedded"
mode.

I mentioned some other concerns in your PATCH 1 about how we
are now adding the third use of read_very_early_config() and that
we should probably refactor that before adding the third option,
in order to avoid additional performance costs as well as it
being difficult to audit which config options are only checked
from these "protected" config files.

Thanks,
-Stolee

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>>  * die()-ing is necessary if we're trying to flip the default value of
>>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>>    more helpful to fail loudly than to silently ignore the bare repo.
>>>
>>>    But in the long term, long after we've flipped the default and users know
>>>    that they need to opt into bare repo discovery, would it be a better UX
>>>    to just silently ignore the bare repo?
>>
>> Would a middle-ground of giving a warning() message help?  Can it be
>> loud and annoying enough to knudge the users to adjust without
>> breaking the functionality?
>
> Personally, when my tool changes its behavior, I would strongly prefer
> it to die than to "change behavior + warn". I'd feel more comfortable
> knowing that the tool did nothing as opposed to doing the wrong thing
> and only being informed after the fact. Also, I sometimes ignore
> warnings ;)

Heh, personally I would try very hard not to change the behaviour
without explicitly asked by the users with configuration or command
line option.  Flipping the default has traditionally been done in
two or three phases.

 (1) We start by giving a loud and annoying warning to those who
     haven't configured and tell them the default *will* change, how
     to keep the current behaviour forever, and how to live in the
     future by adopting the future default early.

 (2) After a while, we flip the default.  Those who haven't
     configured are given a notice that the default has changed, how
     to keep the old behaviour forever, and how to explicitly choose
     the same value as the default to squelch the notice.

 (3) After yet another while, we stop giving the notice.  If we
     omitted (2), here is where we flip the default.

Strictly speaking, we can have (1) in one release and then could
directly jump to (3), but some distros may skip the releases that
has (1), and (2) is an attempt to help users of such distros.

>> Hopefully "git fetch" over ssh:// and file:/// would run the other
>> side with GIT_DIR explicitly set?
>
> Ah, I'll check this and get back to you.
>
>>                                                        I do not yet
>> find these "problems, such as..." so convincing.
>
> What would be a convincing rationale to you? I'll capture that here.

That is a wrong question.  You are the one pushing for castrating
the bare repositories.

> I'm assuming that you already have such an rationale in mind when you
> say that the longer-term default is that "we respect bare repositories
> only if they are the cwd.". I'm also assuming that this rationale is
> something other than embedded bare repos, because "cwd-only" does not
> protect against that.

No, I do not have such a "different" rationale to justify the change
proposed in this patch.  I was saying that the claim "embedded bare
repos are risky", backed by your two examples, did not sound all
that serious a problem.  Presented with a more serious brekage
scenario, it may make the description more convincing.

Thanks.

@gitgitgadget-git
Copy link

On the Git mailing list, Glen Choo wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>>  * die()-ing is necessary if we're trying to flip the default value of
>>>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>>>    more helpful to fail loudly than to silently ignore the bare repo.
>>>>
>>>>    But in the long term, long after we've flipped the default and users know
>>>>    that they need to opt into bare repo discovery, would it be a better UX
>>>>    to just silently ignore the bare repo?
>>>
>>> Would a middle-ground of giving a warning() message help?  Can it be
>>> loud and annoying enough to knudge the users to adjust without
>>> breaking the functionality?
>>
>> Personally, when my tool changes its behavior, I would strongly prefer
>> it to die than to "change behavior + warn". I'd feel more comfortable
>> knowing that the tool did nothing as opposed to doing the wrong thing
>> and only being informed after the fact. Also, I sometimes ignore
>> warnings ;)
>
> Heh, personally I would try very hard not to change the behaviour
> without explicitly asked by the users with configuration or command
> line option.  Flipping the default has traditionally been done in
> two or three phases.
>
>  (1) We start by giving a loud and annoying warning to those who
>      haven't configured and tell them the default *will* change, how
>      to keep the current behaviour forever, and how to live in the
>      future by adopting the future default early.
>
>  (2) After a while, we flip the default.  Those who haven't
>      configured are given a notice that the default has changed, how
>      to keep the old behaviour forever, and how to explicitly choose
>      the same value as the default to squelch the notice.
>
>  (3) After yet another while, we stop giving the notice.  If we
>      omitted (2), here is where we flip the default.
>
> Strictly speaking, we can have (1) in one release and then could
> directly jump to (3), but some distros may skip the releases that
> has (1), and (2) is an attempt to help users of such distros.

Ah, that is very helpful. Thanks. It's pretty clear that I misunderstood
what you meant by "giving a warning() message" - the warning() is there
to prepare users in advance of the change; we don't actually want the
warning() in the long term.

For something as disruptive as discovering bare repos, having all of
(1), (2) and (3) sounds appropriate.

>>> Hopefully "git fetch" over ssh:// and file:/// would run the other
>>> side with GIT_DIR explicitly set?
>>
>> Ah, I'll check this and get back to you.
>>
>>>                                                        I do not yet
>>> find these "problems, such as..." so convincing.
>>
>> What would be a convincing rationale to you? I'll capture that here.
>
> That is a wrong question.  You are the one pushing for castrating
> the bare repositories.

Let me clarify in case this wasn't received the way I intended. Earlier
in the thread, you mentioned:

  The longer-term default should be "cwd is allowed, but we do not
  bother going up from object/04 subdirectory of a bare repository",
  [...]

which I took to mean "Junio thinks that, by default, Git should stop
walking up to find a bare repo, and thinks this is better because of
rationale X.", and not, "Junio does not think that the default needs to
change, but is just suggesting a better default than Glen's".

If it is the former, then there is obviously some thought process here
that is worth sharing.

If it the latter, then I'm in favor of taking Stolee's suggestion to
drop "cwd", since nobody else finds it useful enough. (I like the
'simplification' story, but not enough to push "cwd" through, especially
since it does quite little security-wise.)

>> I'm assuming that you already have such an rationale in mind when you
>> say that the longer-term default is that "we respect bare repositories
>> only if they are the cwd.". I'm also assuming that this rationale is
>> something other than embedded bare repos, because "cwd-only" does not
>> protect against that.
>
> No, I do not have such a "different" rationale to justify the change
> proposed in this patch.  I was saying that the claim "embedded bare
> repos are risky", backed by your two examples, did not sound all
> that serious a problem.  Presented with a more serious brekage
> scenario, it may make the description more convincing.

Fair. I'll mull over this.

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

Glen Choo <chooglen@google.com> writes:

> which I took to mean "Junio thinks that, by default, Git should stop
> walking up to find a bare repo, and thinks this is better because of
> rationale X."

The X is "it would not break existing use case too badly, just to
address a 'security' story whose severity is not so clearly
expressed".

> If it the latter, then I'm in favor of taking Stolee's suggestion to
> drop "cwd", since nobody else finds it useful enough. (I like the
> 'simplification' story, but not enough to push "cwd" through, especially
> since it does quite little security-wise.)

As long as you'll be there to answer the angry mob that complain
loudly (and irritatingly enough, the only do so after a release is
made to flip the default), I do not care too much either way ;-).

Thanks.

@chooglen chooglen changed the title config: introduce discovery.bare and protected config config: introduce safe.bareRepository and protected config Jul 14, 2022
chooglen added 2 commits July 14, 2022 11:54
`uploadpack.packObjectsHook` is the only 'protected configuration only'
variable today, but we've noted that `safe.directory` and the upcoming
`safe.bareRepository` should also be 'protected configuration only'. So,
for consistency, we'd like to have a single implementation for protected
configuration.

The primary constraints are:

1. Reading from protected configuration should be fast. Nearly all "git"
   commands inside a bare repository will read both `safe.directory` and
   `safe.bareRepository`, so we cannot afford to be slow.

2. Protected configuration must be readable when the gitdir is not
   known. `safe.directory` and `safe.bareRepository` both affect
   repository discovery and the gitdir is not known at that point [1].

The chosen implementation in this commit is to read protected
configuration and cache the values in a global configset. This is
similar to the caching behavior we get with the_repository->config.

Introduce git_protected_config(), which reads protected configuration
and caches them in the global configset protected_config. Then, refactor
`uploadpack.packObjectsHook` to use git_protected_config().

The protected configuration functions are named similarly to their
non-protected counterparts, e.g. git_protected_config_check_init() vs
git_config_check_init().

In light of constraint 1, this implementation can still be improved.
git_protected_config() iterates through every variable in
protected_config, which is wasteful, but it makes the conversion simple
because it matches existing patterns. We will likely implement constant
time lookup functions for protected configuration in a future series
(such functions already exist for non-protected configuration, i.e.
repo_config_get_*()).

An alternative that avoids introducing another configset is to continue
to read all config using git_config(), but only accept values that have
the correct config scope [2]. This technically fulfills constraint 2,
because git_config() simply ignores the local and worktree config when
the gitdir is not known. However, this would read incomplete config into
the_repository->config, which would need to be reset when the gitdir is
known and git_config() needs to read the local and worktree config.
Resetting the_repository->config might be reasonable while we only have
these 'protected configuration only' variables, but it's not clear
whether this extends well to future variables.

[1] In this case, we do have a candidate gitdir though, so with a little
refactoring, it might be possible to provide a gitdir.
[2] This is how `uploadpack.packObjectsHook` was implemented prior to
this commit.

Signed-off-by: Glen Choo <chooglen@google.com>
Use git_protected_config() to read `safe.directory` instead of
read_very_early_config(), making it 'protected configuration only'.

As a result, `safe.directory` now respects "-c", so update the tests and
docs accordingly. It used to ignore "-c" due to how it was implemented,
not because of security or correctness concerns [1].

[1] https://lore.kernel.org/git/xmqqlevabcsu.fsf@gitster.g/

Signed-off-by: Glen Choo <chooglen@google.com>
@chooglen chooglen force-pushed the setup/disable-bare-repo-config branch 3 times, most recently from 9bfe369 to e6f654e Compare July 14, 2022 20:29
There is a known social engineering attack that takes advantage of the
fact that a working tree can include an entire bare repository,
including a config file. A user could run a Git command inside the bare
repository thinking that the config file of the 'outer' repository would
be used, but in reality, the bare repository's config file (which is
attacker-controlled) is used, which may result in arbitrary code
execution. See [1] for a fuller description and deeper discussion.

A simple mitigation is to forbid bare repositories unless specified via
`--git-dir` or `GIT_DIR`. In environments that don't use bare
repositories, this would be minimally disruptive.

Create a config variable, `safe.bareRepository`, that tells Git whether
or not to die() when working with a bare repository. This config is an
enum of:

- "all": allow all bare repositories (this is the default)
- "explicit": only allow bare repositories specified via --git-dir
  or GIT_DIR.

If we want to protect users from such attacks by default, neither value
will suffice - "all" provides no protection, but "explicit" is
impractical for bare repository users. A more usable default would be to
allow only non-embedded bare repositories ([2] contains one such
proposal), but detecting if a repository is embedded is potentially
non-trivial, so this work is not implemented in this series.

[1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
[2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com

Signed-off-by: Glen Choo <chooglen@google.com>
@chooglen chooglen force-pushed the setup/disable-bare-repo-config branch from e6f654e to 50069bb Compare July 14, 2022 20:36
@SirJosh1987
Copy link

So clearly me having no computer is an issue

@chooglen
Copy link
Contributor Author

/submit

@gitgitgadget-git
Copy link

Submitted as pull.1261.v8.git.git.1657834081.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1261/chooglen/setup/disable-bare-repo-config-v8

To fetch this version to local tag pr-git-1261/chooglen/setup/disable-bare-repo-config-v8:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1261/chooglen/setup/disable-bare-repo-config-v8

@gitgitgadget-git
Copy link

This patch series was integrated into seen via ff7938a.

@gitgitgadget-git
Copy link

This patch series was integrated into seen via 7802e43.

@gitgitgadget-git
Copy link

This patch series was integrated into next via 5206577.

@gitgitgadget-git
Copy link

This patch series was integrated into seen via b247963.

@gitgitgadget-git
Copy link

This patch series was integrated into seen via c5f32de.

@gitgitgadget-git
Copy link

This patch series was integrated into seen via 5a25a74.

@gitgitgadget-git
Copy link

There was a status update in the "Cooking" section about the branch gc/bare-repo-discovery on the Git mailing list:

Introduce a discovery.barerepository configuration variable that
allows users to forbid discovery of bare repositories.

Will merge to 'master'.
source: <pull.1261.v8.git.git.1657834081.gitgitgadget@gmail.com>

@gitgitgadget-git
Copy link

This patch series was integrated into seen via 6efbdba.

@gitgitgadget-git
Copy link

This patch series was integrated into seen via ae5c21c.

@gitgitgadget-git
Copy link

This patch series was integrated into seen via df367c2.

@gitgitgadget-git
Copy link

This patch series was integrated into seen via 18bbc79.

@gitgitgadget-git
Copy link

This patch series was integrated into master via 18bbc79.

@gitgitgadget-git
Copy link

This patch series was integrated into next via 18bbc79.

@gitgitgadget-git
Copy link

Closed via 18bbc79.

@gitgitgadget-git gitgitgadget-git bot closed this Jul 22, 2022
@@ -81,6 +81,17 @@ static enum config_scope current_parsing_scope;
static int pack_compression_seen;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Jul 14 2022, Glen Choo via GitGitGadget wrote:

> From: Glen Choo <chooglen@google.com>
>
> `uploadpack.packObjectsHook` is the only 'protected configuration only'
> variable today, but we've noted that `safe.directory` and the upcoming
> `safe.bareRepository` should also be 'protected configuration only'. So,
> for consistency, we'd like to have a single implementation for protected
> configuration.
>
> The primary constraints are:
>
> 1. Reading from protected configuration should be fast. Nearly all "git"
>    commands inside a bare repository will read both `safe.directory` and
>    `safe.bareRepository`, so we cannot afford to be slow.
>
> 2. Protected configuration must be readable when the gitdir is not
>    known. `safe.directory` and `safe.bareRepository` both affect
>    repository discovery and the gitdir is not known at that point [1].
>
> The chosen implementation in this commit is to read protected
> configuration and cache the values in a global configset. This is
> similar to the caching behavior we get with the_repository->config.
>
> Introduce git_protected_config(), which reads protected configuration
> and caches them in the global configset protected_config. Then, refactor
> `uploadpack.packObjectsHook` to use git_protected_config().
>
> The protected configuration functions are named similarly to their
> non-protected counterparts, e.g. git_protected_config_check_init() vs
> git_config_check_init().
>
> In light of constraint 1, this implementation can still be improved.
> git_protected_config() iterates through every variable in
> protected_config, which is wasteful, but it makes the conversion simple
> because it matches existing patterns. We will likely implement constant
> time lookup functions for protected configuration in a future series
> (such functions already exist for non-protected configuration, i.e.
> repo_config_get_*()).
>
> An alternative that avoids introducing another configset is to continue
> to read all config using git_config(), but only accept values that have
> the correct config scope [2]. This technically fulfills constraint 2,
> because git_config() simply ignores the local and worktree config when
> the gitdir is not known. However, this would read incomplete config into
> the_repository->config, which would need to be reset when the gitdir is
> known and git_config() needs to read the local and worktree config.
> Resetting the_repository->config might be reasonable while we only have
> these 'protected configuration only' variables, but it's not clear
> whether this extends well to future variables.
>
> [1] In this case, we do have a candidate gitdir though, so with a little
> refactoring, it might be possible to provide a gitdir.
> [2] This is how `uploadpack.packObjectsHook` was implemented prior to
> this commit.
>
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
>  config.c                     | 43 ++++++++++++++++++++++++++++++++++++
>  config.h                     | 16 ++++++++++++++
>  t/t5544-pack-objects-hook.sh |  7 +++++-
>  upload-pack.c                | 27 +++++++++++++---------
>  4 files changed, 82 insertions(+), 11 deletions(-)
>
> diff --git a/config.c b/config.c
> index 9b0e9c93285..015bec360f5 100644
> --- a/config.c
> +++ b/config.c
> @@ -81,6 +81,17 @@ static enum config_scope current_parsing_scope;
>  static int pack_compression_seen;
>  static int zlib_compression_seen;
>  
> +/*
> + * Config that comes from trusted scopes, namely:
> + * - CONFIG_SCOPE_SYSTEM (e.g. /etc/gitconfig)
> + * - CONFIG_SCOPE_GLOBAL (e.g. $HOME/.gitconfig, $XDG_CONFIG_HOME/git)
> + * - CONFIG_SCOPE_COMMAND (e.g. "-c" option, environment variables)
> + *
> + * This is declared here for code cleanliness, but unlike the other
> + * static variables, this does not hold config parser state.
> + */
> +static struct config_set protected_config;
> +
>  static int config_file_fgetc(struct config_source *conf)
>  {
>  	return getc_unlocked(conf->u.file);
> @@ -2378,6 +2389,11 @@ int git_configset_add_file(struct config_set *cs, const char *filename)
>  	return git_config_from_file(config_set_callback, filename, cs);
>  }
>  
> +int git_configset_add_parameters(struct config_set *cs)
> +{
> +	return git_config_from_parameters(config_set_callback, cs);
> +}
> +
>  int git_configset_get_value(struct config_set *cs, const char *key, const char **value)
>  {
>  	const struct string_list *values = NULL;
> @@ -2619,6 +2635,33 @@ int repo_config_get_pathname(struct repository *repo,
>  	return ret;
>  }
>  
> +/* Read values into protected_config. */
> +static void read_protected_config(void)
> +{
> +	char *xdg_config = NULL, *user_config = NULL, *system_config = NULL;
> +
> +	git_configset_init(&protected_config);
> +
> +	system_config = git_system_config();
> +	git_global_config(&user_config, &xdg_config);
> +
> +	git_configset_add_file(&protected_config, system_config);
> +	git_configset_add_file(&protected_config, xdg_config);
> +	git_configset_add_file(&protected_config, user_config);
> +	git_configset_add_parameters(&protected_config);
> +
> +	free(system_config);
> +	free(xdg_config);
> +	free(user_config);
> +}
> +
> +void git_protected_config(config_fn_t fn, void *data)
> +{
> +	if (!protected_config.hash_initialized)
> +		read_protected_config();
> +	configset_iter(&protected_config, fn, data);
> +}
> +
>  /* Functions used historically to read configuration from 'the_repository' */
>  void git_config(config_fn_t fn, void *data)
>  {
> diff --git a/config.h b/config.h
> index 7654f61c634..ca994d77147 100644
> --- a/config.h
> +++ b/config.h
> @@ -446,6 +446,15 @@ void git_configset_init(struct config_set *cs);
>   */
>  int git_configset_add_file(struct config_set *cs, const char *filename);
>  
> +/**
> + * Parses command line options and environment variables, and adds the
> + * variable-value pairs to the `config_set`. Returns 0 on success, or -1
> + * if there is an error in parsing. The caller decides whether to free
> + * the incomplete configset or continue using it when the function
> + * returns -1.
> + */
> +int git_configset_add_parameters(struct config_set *cs);
> +
>  /**
>   * Finds and returns the value list, sorted in order of increasing priority
>   * for the configuration variable `key` and config set `cs`. When the
> @@ -505,6 +514,13 @@ int repo_config_get_maybe_bool(struct repository *repo,
>  int repo_config_get_pathname(struct repository *repo,
>  			     const char *key, const char **dest);
>  
> +/*
> + * Functions for reading protected config. By definition, protected
> + * config ignores repository config, so these do not take a `struct
> + * repository` parameter.
> + */
> +void git_protected_config(config_fn_t fn, void *data);
> +
>  /**
>   * Querying For Specific Variables
>   * -------------------------------
> diff --git a/t/t5544-pack-objects-hook.sh b/t/t5544-pack-objects-hook.sh
> index dd5f44d986f..54f54f8d2eb 100755
> --- a/t/t5544-pack-objects-hook.sh
> +++ b/t/t5544-pack-objects-hook.sh
> @@ -56,7 +56,12 @@ test_expect_success 'hook does not run from repo config' '
>  	! grep "hook running" stderr &&
>  	test_path_is_missing .git/hook.args &&
>  	test_path_is_missing .git/hook.stdin &&
> -	test_path_is_missing .git/hook.stdout
> +	test_path_is_missing .git/hook.stdout &&
> +
> +	# check that global config is used instead
> +	test_config_global uploadpack.packObjectsHook ./hook &&
> +	git clone --no-local . dst2.git 2>stderr &&
> +	grep "hook running" stderr
>  '
>  
>  test_expect_success 'hook works with partial clone' '
> diff --git a/upload-pack.c b/upload-pack.c
> index 3a851b36066..09f48317b02 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -1321,18 +1321,27 @@ static int upload_pack_config(const char *var, const char *value, void *cb_data)
>  		data->advertise_sid = git_config_bool(var, value);
>  	}
>  
> -	if (current_config_scope() != CONFIG_SCOPE_LOCAL &&
> -	    current_config_scope() != CONFIG_SCOPE_WORKTREE) {
> -		if (!strcmp("uploadpack.packobjectshook", var))
> -			return git_config_string(&data->pack_objects_hook, var, value);
> -	}
> -
>  	if (parse_object_filter_config(var, value, data) < 0)
>  		return -1;
>  
>  	return parse_hide_refs_config(var, value, "uploadpack");
>  }
>  
> +static int upload_pack_protected_config(const char *var, const char *value, void *cb_data)
> +{
> +	struct upload_pack_data *data = cb_data;
> +
> +	if (!strcmp("uploadpack.packobjectshook", var))
> +		return git_config_string(&data->pack_objects_hook, var, value);
> +	return 0;
> +}
> +
> +static void get_upload_pack_config(struct upload_pack_data *data)
> +{
> +	git_config(upload_pack_config, data);
> +	git_protected_config(upload_pack_protected_config, data);
> +}
> +
>  void upload_pack(const int advertise_refs, const int stateless_rpc,
>  		 const int timeout)
>  {
> @@ -1340,8 +1349,7 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
>  	struct upload_pack_data data;
>  
>  	upload_pack_data_init(&data);
> -
> -	git_config(upload_pack_config, &data);
> +	get_upload_pack_config(&data);
>  
>  	data.stateless_rpc = stateless_rpc;
>  	data.timeout = timeout;
> @@ -1695,8 +1703,7 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request)
>  
>  	upload_pack_data_init(&data);
>  	data.use_sideband = LARGE_PACKET_MAX;
> -
> -	git_config(upload_pack_config, &data);
> +	get_upload_pack_config(&data);
>  
>  	while (state != FETCH_DONE) {
>  		switch (state) {

Noticed after it landed on master: This change fails with:

	make SANITIZE=address test T=t0410*.sh

Running that manually shows that we fail like this:
	
	$ cat trash\ directory.t0410-partial-clone/httpd/error.log | grep -o AH0.*
	AH00163: Apache/2.4.54 (Debian) configured -- resuming normal operations
	AH00094: Command line: '/usr/sbin/apache2 -d /home/avar/g/git/t/trash directory.t0410-partial-clone/httpd -f /home/avar/g/git/t/lib-httpd/apache.conf -c Listen 127.0.0.1:10410'
	AH01215: AddressSanitizer:DEADLYSIGNAL: /home/avar/g/git/git-http-backend
	AH01215: =================================================================: /home/avar/g/git/git-http-backend
	AH01215: ==27820==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f7af5dc0d66 bp 0x7fff11964450 sp 0x7fff11963be8 T0): /home/avar/g/git/git-http-backend
	AH01215: ==27820==The signal is caused by a READ memory access.: /home/avar/g/git/git-http-backend
	AH01215: ==27820==Hint: address points to the zero page.: /home/avar/g/git/git-http-backend
	AH01215:     #0 0x7f7af5dc0d66 in __sanitizer::internal_strlen(char const*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167: /home/avar/g/git/git-http-backend
	AH01215:     #1 0x7f7af5d512f2 in __interceptor_fopen64 ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:6220: /home/avar/g/git/git-http-backend
	AH01215:     #2 0x562a65e37cc8 in git_fopen compat/fopen.c:22: /home/avar/g/git/git-http-backend
	AH01215:     #3 0x562a65df3879 in fopen_or_warn wrapper.c:431: /home/avar/g/git/git-http-backend
	AH01215:     #4 0x562a65a12476 in git_config_from_file_with_options config.c:1982: /home/avar/g/git/git-http-backend
	AH01215:     #5 0x562a65a124f4 in git_config_from_file config.c:1993: /home/avar/g/git/git-http-backend
	AH01215:     #6 0x562a65a15288 in git_configset_add_file config.c:2389: /home/avar/g/git/git-http-backend
	AH01215:     #7 0x562a65a16a37 in read_protected_config config.c:2649: /home/avar/g/git/git-http-backend
	AH01215:     #8 0x562a65a16b5c in git_protected_config config.c:2661: /home/avar/g/git/git-http-backend
	AH01215:     #9 0x562a65dd9f9a in get_upload_pack_config upload-pack.c:1342: /home/avar/g/git/git-http-backend
	AH01215:     #10 0x562a65ddc1cb in upload_pack_v2 upload-pack.c:1706: /home/avar/g/git/git-http-backend
	AH01215:     #11 0x562a65d2eb8a in process_request serve.c:308: /home/avar/g/git/git-http-backend
	AH01215:     #12 0x562a65d2ec18 in protocol_v2_serve_loop serve.c:323: /home/avar/g/git/git-http-backend
	AH01215:     #13 0x562a6593c5ae in cmd_upload_pack builtin/upload-pack.c:55: /home/avar/g/git/git-http-backend
	AH01215:     #14 0x562a656cf8ff in run_builtin git.c:466: /home/avar/g/git/git-http-backend
	AH01215:     #15 0x562a656d02ab in handle_builtin git.c:720: /home/avar/g/git/git-http-backend
	AH01215:     #16 0x562a656d09d5 in run_argv git.c:787: /home/avar/g/git/git-http-backend
	AH01215:     #17 0x562a656d174f in cmd_main git.c:920: /home/avar/g/git/git-http-backend
	AH01215:     #18 0x562a6594b0b9 in main common-main.c:56: /home/avar/g/git/git-http-backend
	AH01215:     #19 0x7f7af5a5681c in __libc_start_main ../csu/libc-start.c:332: /home/avar/g/git/git-http-backend
	AH01215:     #20 0x562a656cb209 in _start (git+0x1d1209): /home/avar/g/git/git-http-backend
	AH01215: : /home/avar/g/git/git-http-backend
	AH01215: AddressSanitizer can not provide additional info.: /home/avar/g/git/git-http-backend
	AH01215: SUMMARY: AddressSanitizer: SEGV ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167 in __sanitizer::internal_strlen(char const*): /home/avar/g/git/git-http-backend
	AH01215: ==27820==ABORTING: /home/avar/g/git/git-http-backend
	AH01215: error: upload-pack died of signal 6: /home/avar/g/git/git-http-backend

(We really should have a SANITIZE=address in CI, but it takes a while...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Jul 14 2022, Glen Choo via GitGitGadget wrote:
>
>> +/* Read values into protected_config. */
>> +static void read_protected_config(void)
>> +{
>> +	char *xdg_config = NULL, *user_config = NULL, *system_config = NULL;
>> +
>> +	git_configset_init(&protected_config);
>> +
>> +	system_config = git_system_config();
>> +	git_global_config(&user_config, &xdg_config);
>> +
>> +	git_configset_add_file(&protected_config, system_config);
>> +	git_configset_add_file(&protected_config, xdg_config);
>> +	git_configset_add_file(&protected_config, user_config);
>> +	git_configset_add_parameters(&protected_config);
>> +
>> +	free(system_config);
>> +	free(xdg_config);
>> +	free(user_config);
>> +}
>
> Noticed after it landed on master: This change fails with:
>
> 	make SANITIZE=address test T=t0410*.sh
>
> Running that manually shows that we fail like this:
> 	
> 	$ cat trash\ directory.t0410-partial-clone/httpd/error.log | grep -o AH0.*
> 	AH00163: Apache/2.4.54 (Debian) configured -- resuming normal operations
> 	AH00094: Command line: '/usr/sbin/apache2 -d /home/avar/g/git/t/trash directory.t0410-partial-clone/httpd -f /home/avar/g/git/t/lib-httpd/apache.conf -c Listen 127.0.0.1:10410'
> 	AH01215: AddressSanitizer:DEADLYSIGNAL: /home/avar/g/git/git-http-backend
> 	AH01215: =================================================================: /home/avar/g/git/git-http-backend
> 	AH01215: ==27820==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f7af5dc0d66 bp 0x7fff11964450 sp 0x7fff11963be8 T0): /home/avar/g/git/git-http-backend
> 	AH01215: ==27820==The signal is caused by a READ memory access.: /home/avar/g/git/git-http-backend
> 	AH01215: ==27820==Hint: address points to the zero page.: /home/avar/g/git/git-http-backend
> 	AH01215:     #0 0x7f7af5dc0d66 in __sanitizer::internal_strlen(char const*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167: /home/avar/g/git/git-http-backend
> 	AH01215:     #1 0x7f7af5d512f2 in __interceptor_fopen64 ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:6220: /home/avar/g/git/git-http-backend
> 	AH01215:     #2 0x562a65e37cc8 in git_fopen compat/fopen.c:22: /home/avar/g/git/git-http-backend
> 	AH01215:     #3 0x562a65df3879 in fopen_or_warn wrapper.c:431: /home/avar/g/git/git-http-backend
> 	AH01215:     #4 0x562a65a12476 in git_config_from_file_with_options config.c:1982: /home/avar/g/git/git-http-backend
> 	AH01215:     #5 0x562a65a124f4 in git_config_from_file config.c:1993: /home/avar/g/git/git-http-backend
> 	AH01215:     #6 0x562a65a15288 in git_configset_add_file config.c:2389: /home/avar/g/git/git-http-backend
> 	AH01215:     #7 0x562a65a16a37 in read_protected_config config.c:2649: /home/avar/g/git/git-http-backend
> 	AH01215:     #8 0x562a65a16b5c in git_protected_config config.c:2661: /home/avar/g/git/git-http-backend
> 	AH01215:     #9 0x562a65dd9f9a in get_upload_pack_config upload-pack.c:1342: /home/avar/g/git/git-http-backend
> 	AH01215:     #10 0x562a65ddc1cb in upload_pack_v2 upload-pack.c:1706: /home/avar/g/git/git-http-backend
> 	AH01215:     #11 0x562a65d2eb8a in process_request serve.c:308: /home/avar/g/git/git-http-backend
> 	AH01215:     #12 0x562a65d2ec18 in protocol_v2_serve_loop serve.c:323: /home/avar/g/git/git-http-backend
> 	AH01215:     #13 0x562a6593c5ae in cmd_upload_pack builtin/upload-pack.c:55: /home/avar/g/git/git-http-backend
> 	AH01215:     #14 0x562a656cf8ff in run_builtin git.c:466: /home/avar/g/git/git-http-backend
> 	AH01215:     #15 0x562a656d02ab in handle_builtin git.c:720: /home/avar/g/git/git-http-backend
> 	AH01215:     #16 0x562a656d09d5 in run_argv git.c:787: /home/avar/g/git/git-http-backend
> 	AH01215:     #17 0x562a656d174f in cmd_main git.c:920: /home/avar/g/git/git-http-backend
> 	AH01215:     #18 0x562a6594b0b9 in main common-main.c:56: /home/avar/g/git/git-http-backend
> 	AH01215:     #19 0x7f7af5a5681c in __libc_start_main ../csu/libc-start.c:332: /home/avar/g/git/git-http-backend
> 	AH01215:     #20 0x562a656cb209 in _start (git+0x1d1209): /home/avar/g/git/git-http-backend
> 	AH01215: : /home/avar/g/git/git-http-backend
> 	AH01215: AddressSanitizer can not provide additional info.: /home/avar/g/git/git-http-backend
> 	AH01215: SUMMARY: AddressSanitizer: SEGV ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167 in __sanitizer::internal_strlen(char const*): /home/avar/g/git/git-http-backend
> 	AH01215: ==27820==ABORTING: /home/avar/g/git/git-http-backend
> 	AH01215: error: upload-pack died of signal 6: /home/avar/g/git/git-http-backend
>
> (We really should have a SANITIZE=address in CI, but it takes a while...)

Thanks. I narrowed the failure down to the hunk above, specifically this
line:

  git_configset_add_file(&protected_config, xdg_config);

Since xdg_config can be NULL, this results in the failing call
fopen_or_warn(NULL, "r").

This logic was lifted  from do_git_config_sequence(), which checks that
each of the paths are not NULL. So a fix might be something like:

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

  diff --git a/config.c b/config.c
  index 015bec360f..208a3dd7a7 100644
  --- a/config.c
  +++ b/config.c
  @@ -2645,9 +2645,13 @@ static void read_protected_config(void)
    system_config = git_system_config();
    git_global_config(&user_config, &xdg_config);

  -	git_configset_add_file(&protected_config, system_config);
  -	git_configset_add_file(&protected_config, xdg_config);
  -	git_configset_add_file(&protected_config, user_config);
  +
  +	if (system_config)
  +		git_configset_add_file(&protected_config, system_config);
  +	if (xdg_config)
  +		git_configset_add_file(&protected_config, xdg_config);
  +	if (user_config)
  +		git_configset_add_file(&protected_config, user_config);
    git_configset_add_parameters(&protected_config);

    free(system_config);

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

I'm not sure if system_config can ever be NULL, but (xdg|user)_config is
NULL when $HOME is unset, and xdg_config is also unset if
$GIT_CONFIG_GLOBAL is set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Jul 25 2022, Glen Choo wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Thu, Jul 14 2022, Glen Choo via GitGitGadget wrote:
>>
>>> +/* Read values into protected_config. */
>>> +static void read_protected_config(void)
>>> +{
>>> +	char *xdg_config = NULL, *user_config = NULL, *system_config = NULL;
>>> +
>>> +	git_configset_init(&protected_config);
>>> +
>>> +	system_config = git_system_config();
>>> +	git_global_config(&user_config, &xdg_config);
>>> +
>>> +	git_configset_add_file(&protected_config, system_config);
>>> +	git_configset_add_file(&protected_config, xdg_config);
>>> +	git_configset_add_file(&protected_config, user_config);
>>> +	git_configset_add_parameters(&protected_config);
>>> +
>>> +	free(system_config);
>>> +	free(xdg_config);
>>> +	free(user_config);
>>> +}
>>
>> Noticed after it landed on master: This change fails with:
>>
>> 	make SANITIZE=address test T=t0410*.sh
>>
>> Running that manually shows that we fail like this:
>> 	
>> 	$ cat trash\ directory.t0410-partial-clone/httpd/error.log | grep -o AH0.*
>> 	AH00163: Apache/2.4.54 (Debian) configured -- resuming normal operations
>> 	AH00094: Command line: '/usr/sbin/apache2 -d /home/avar/g/git/t/trash directory.t0410-partial-clone/httpd -f /home/avar/g/git/t/lib-httpd/apache.conf -c Listen 127.0.0.1:10410'
>> 	AH01215: AddressSanitizer:DEADLYSIGNAL: /home/avar/g/git/git-http-backend
>> 	AH01215: =================================================================: /home/avar/g/git/git-http-backend
>> 	AH01215: ==27820==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f7af5dc0d66 bp 0x7fff11964450 sp 0x7fff11963be8 T0): /home/avar/g/git/git-http-backend
>> 	AH01215: ==27820==The signal is caused by a READ memory access.: /home/avar/g/git/git-http-backend
>> 	AH01215: ==27820==Hint: address points to the zero page.: /home/avar/g/git/git-http-backend
>> 	AH01215:     #0 0x7f7af5dc0d66 in __sanitizer::internal_strlen(char const*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167: /home/avar/g/git/git-http-backend
>> 	AH01215:     #1 0x7f7af5d512f2 in __interceptor_fopen64 ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:6220: /home/avar/g/git/git-http-backend
>> 	AH01215:     #2 0x562a65e37cc8 in git_fopen compat/fopen.c:22: /home/avar/g/git/git-http-backend
>> 	AH01215:     #3 0x562a65df3879 in fopen_or_warn wrapper.c:431: /home/avar/g/git/git-http-backend
>> 	AH01215:     #4 0x562a65a12476 in git_config_from_file_with_options config.c:1982: /home/avar/g/git/git-http-backend
>> 	AH01215:     #5 0x562a65a124f4 in git_config_from_file config.c:1993: /home/avar/g/git/git-http-backend
>> 	AH01215:     #6 0x562a65a15288 in git_configset_add_file config.c:2389: /home/avar/g/git/git-http-backend
>> 	AH01215:     #7 0x562a65a16a37 in read_protected_config config.c:2649: /home/avar/g/git/git-http-backend
>> 	AH01215:     #8 0x562a65a16b5c in git_protected_config config.c:2661: /home/avar/g/git/git-http-backend
>> 	AH01215:     #9 0x562a65dd9f9a in get_upload_pack_config upload-pack.c:1342: /home/avar/g/git/git-http-backend
>> 	AH01215:     #10 0x562a65ddc1cb in upload_pack_v2 upload-pack.c:1706: /home/avar/g/git/git-http-backend
>> 	AH01215:     #11 0x562a65d2eb8a in process_request serve.c:308: /home/avar/g/git/git-http-backend
>> 	AH01215:     #12 0x562a65d2ec18 in protocol_v2_serve_loop serve.c:323: /home/avar/g/git/git-http-backend
>> 	AH01215:     #13 0x562a6593c5ae in cmd_upload_pack builtin/upload-pack.c:55: /home/avar/g/git/git-http-backend
>> 	AH01215:     #14 0x562a656cf8ff in run_builtin git.c:466: /home/avar/g/git/git-http-backend
>> 	AH01215:     #15 0x562a656d02ab in handle_builtin git.c:720: /home/avar/g/git/git-http-backend
>> 	AH01215:     #16 0x562a656d09d5 in run_argv git.c:787: /home/avar/g/git/git-http-backend
>> 	AH01215:     #17 0x562a656d174f in cmd_main git.c:920: /home/avar/g/git/git-http-backend
>> 	AH01215:     #18 0x562a6594b0b9 in main common-main.c:56: /home/avar/g/git/git-http-backend
>> 	AH01215:     #19 0x7f7af5a5681c in __libc_start_main ../csu/libc-start.c:332: /home/avar/g/git/git-http-backend
>> 	AH01215:     #20 0x562a656cb209 in _start (git+0x1d1209): /home/avar/g/git/git-http-backend
>> 	AH01215: : /home/avar/g/git/git-http-backend
>> 	AH01215: AddressSanitizer can not provide additional info.: /home/avar/g/git/git-http-backend
>> 	AH01215: SUMMARY: AddressSanitizer: SEGV
>> ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167
>> in __sanitizer::internal_strlen(char const*):
>> /home/avar/g/git/git-http-backend
>> 	AH01215: ==27820==ABORTING: /home/avar/g/git/git-http-backend
>> 	AH01215: error: upload-pack died of signal 6: /home/avar/g/git/git-http-backend
>>
>> (We really should have a SANITIZE=address in CI, but it takes a while...)
>
> Thanks. I narrowed the failure down to the hunk above, specifically this
> line:
>
>   git_configset_add_file(&protected_config, xdg_config);
>
> Since xdg_config can be NULL, this results in the failing call
> fopen_or_warn(NULL, "r").
>
> This logic was lifted  from do_git_config_sequence(), which checks that
> each of the paths are not NULL. So a fix might be something like:
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>
>   diff --git a/config.c b/config.c
>   index 015bec360f..208a3dd7a7 100644
>   --- a/config.c
>   +++ b/config.c
>   @@ -2645,9 +2645,13 @@ static void read_protected_config(void)
>     system_config = git_system_config();
>     git_global_config(&user_config, &xdg_config);
>
>   -	git_configset_add_file(&protected_config, system_config);
>   -	git_configset_add_file(&protected_config, xdg_config);
>   -	git_configset_add_file(&protected_config, user_config);
>   +
>   +	if (system_config)
>   +		git_configset_add_file(&protected_config, system_config);
>   +	if (xdg_config)
>   +		git_configset_add_file(&protected_config, xdg_config);
>   +	if (user_config)
>   +		git_configset_add_file(&protected_config, user_config);
>     git_configset_add_parameters(&protected_config);
>
>     free(system_config);
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>
> I'm not sure if system_config can ever be NULL, but (xdg|user)_config is
> NULL when $HOME is unset, and xdg_config is also unset if
> $GIT_CONFIG_GLOBAL is set.

Not having looked into it much at all: Doesn't this then introduce
another logic error where git_protected_config() is now buggy, i.e. it's
a "lazy load" method where we'll expect to read_protected_config()
first.

The assumption with that seems to have been that it's invariant within a
single process, is that still the case, or can e.g. HOME be set during
our runtime when we rely on these functions?

(I don't know)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Mon, Jul 25 2022, Glen Choo wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> On Thu, Jul 14 2022, Glen Choo via GitGitGadget wrote:
>>>
>>>> +/* Read values into protected_config. */
>>>> +static void read_protected_config(void)
>>>> +{
>>>> +	char *xdg_config = NULL, *user_config = NULL, *system_config = NULL;
>>>> +
>>>> +	git_configset_init(&protected_config);
>>>> +
>>>> +	system_config = git_system_config();
>>>> +	git_global_config(&user_config, &xdg_config);
>>>> +
>>>> +	git_configset_add_file(&protected_config, system_config);
>>>> +	git_configset_add_file(&protected_config, xdg_config);
>>>> +	git_configset_add_file(&protected_config, user_config);
>>>> +	git_configset_add_parameters(&protected_config);
>>>> +
>>>> +	free(system_config);
>>>> +	free(xdg_config);
>>>> +	free(user_config);
>>>> +}
>>>
>>> Noticed after it landed on master: This change fails with:
>>>
>>> 	make SANITIZE=address test T=t0410*.sh
>>>
>>> Running that manually shows that we fail like this:
>>> 	
>>> 	$ cat trash\ directory.t0410-partial-clone/httpd/error.log | grep -o AH0.*
>>> 	AH00163: Apache/2.4.54 (Debian) configured -- resuming normal operations
>>> 	AH00094: Command line: '/usr/sbin/apache2 -d /home/avar/g/git/t/trash directory.t0410-partial-clone/httpd -f /home/avar/g/git/t/lib-httpd/apache.conf -c Listen 127.0.0.1:10410'
>>> 	AH01215: AddressSanitizer:DEADLYSIGNAL: /home/avar/g/git/git-http-backend
>>> 	AH01215: =================================================================: /home/avar/g/git/git-http-backend
>>> 	AH01215: ==27820==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f7af5dc0d66 bp 0x7fff11964450 sp 0x7fff11963be8 T0): /home/avar/g/git/git-http-backend
>>> 	AH01215: ==27820==The signal is caused by a READ memory access.: /home/avar/g/git/git-http-backend
>>> 	AH01215: ==27820==Hint: address points to the zero page.: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #0 0x7f7af5dc0d66 in __sanitizer::internal_strlen(char const*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #1 0x7f7af5d512f2 in __interceptor_fopen64 ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:6220: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #2 0x562a65e37cc8 in git_fopen compat/fopen.c:22: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #3 0x562a65df3879 in fopen_or_warn wrapper.c:431: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #4 0x562a65a12476 in git_config_from_file_with_options config.c:1982: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #5 0x562a65a124f4 in git_config_from_file config.c:1993: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #6 0x562a65a15288 in git_configset_add_file config.c:2389: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #7 0x562a65a16a37 in read_protected_config config.c:2649: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #8 0x562a65a16b5c in git_protected_config config.c:2661: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #9 0x562a65dd9f9a in get_upload_pack_config upload-pack.c:1342: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #10 0x562a65ddc1cb in upload_pack_v2 upload-pack.c:1706: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #11 0x562a65d2eb8a in process_request serve.c:308: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #12 0x562a65d2ec18 in protocol_v2_serve_loop serve.c:323: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #13 0x562a6593c5ae in cmd_upload_pack builtin/upload-pack.c:55: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #14 0x562a656cf8ff in run_builtin git.c:466: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #15 0x562a656d02ab in handle_builtin git.c:720: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #16 0x562a656d09d5 in run_argv git.c:787: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #17 0x562a656d174f in cmd_main git.c:920: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #18 0x562a6594b0b9 in main common-main.c:56: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #19 0x7f7af5a5681c in __libc_start_main ../csu/libc-start.c:332: /home/avar/g/git/git-http-backend
>>> 	AH01215:     #20 0x562a656cb209 in _start (git+0x1d1209): /home/avar/g/git/git-http-backend
>>> 	AH01215: : /home/avar/g/git/git-http-backend
>>> 	AH01215: AddressSanitizer can not provide additional info.: /home/avar/g/git/git-http-backend
>>> 	AH01215: SUMMARY: AddressSanitizer: SEGV
>>> ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167
>>> in __sanitizer::internal_strlen(char const*):
>>> /home/avar/g/git/git-http-backend
>>> 	AH01215: ==27820==ABORTING: /home/avar/g/git/git-http-backend
>>> 	AH01215: error: upload-pack died of signal 6: /home/avar/g/git/git-http-backend
>>>
>>> (We really should have a SANITIZE=address in CI, but it takes a while...)
>>
>> Thanks. I narrowed the failure down to the hunk above, specifically this
>> line:
>>
>>   git_configset_add_file(&protected_config, xdg_config);
>>
>> Since xdg_config can be NULL, this results in the failing call
>> fopen_or_warn(NULL, "r").
>>
>> This logic was lifted  from do_git_config_sequence(), which checks that
>> each of the paths are not NULL. So a fix might be something like:
>>
>> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>>
>>   diff --git a/config.c b/config.c
>>   index 015bec360f..208a3dd7a7 100644
>>   --- a/config.c
>>   +++ b/config.c
>>   @@ -2645,9 +2645,13 @@ static void read_protected_config(void)
>>     system_config = git_system_config();
>>     git_global_config(&user_config, &xdg_config);
>>
>>   -	git_configset_add_file(&protected_config, system_config);
>>   -	git_configset_add_file(&protected_config, xdg_config);
>>   -	git_configset_add_file(&protected_config, user_config);
>>   +
>>   +	if (system_config)
>>   +		git_configset_add_file(&protected_config, system_config);
>>   +	if (xdg_config)
>>   +		git_configset_add_file(&protected_config, xdg_config);
>>   +	if (user_config)
>>   +		git_configset_add_file(&protected_config, user_config);
>>     git_configset_add_parameters(&protected_config);
>>
>>     free(system_config);
>>
>> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>>
>> I'm not sure if system_config can ever be NULL, but (xdg|user)_config is
>> NULL when $HOME is unset, and xdg_config is also unset if
>> $GIT_CONFIG_GLOBAL is set.
>
> Not having looked into it much at all: Doesn't this then introduce
> another logic error where git_protected_config() is now buggy, i.e. it's
> a "lazy load" method where we'll expect to read_protected_config()
> first.
>
> The assumption with that seems to have been that it's invariant within a
> single process, is that still the case, or can e.g. HOME be set during
> our runtime when we rely on these functions?
>
> (I don't know)

I don't think this introduces an error, or at least, not one that we
don't already have. This mimics do_git_config_sequence() (which also
assumes this invariant), which is used under the hood by
(git|repo)_read_config(),

In retrospect, it might have been a good idea to implement
read_protected_config() using do_git_config_sequence() /
config_with_options(); those functions are a bit bloated, but at least
we'd only have one implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants