git.oblomov.eu Git

Merge branch 'jk/t3419-drop-expensive-tests'

Test update.

* jk/t3419-drop-expensive-tests:
t3419: drop EXPENSIVE tests

Merge branch 'ds/doc-clone-filter'

Doc update.

* ds/doc-clone-filter:
clone: document --filter options

Merge branch 'ar/test-style-fixes'

Style fixes.

* ar/test-style-fixes:
t: fix whitespace around &&
t9500: remove spaces after redirect operators

Revert "fetch: default to protocol version 2"

This reverts commit 684ceae32dae726c6a5c693b257b156926aba8b7.

Users fetching from linux-next and other kernel remotes are reporting
that the limited ref advertisement causes negotiation to reach
MAX_IN_VAIN, resulting in too-large fetches.

Reported-by: Lubomir Rintel <lkundrak@v3.sk>
Reported-by: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config.mak.uname: Define FREAD_READS_DIRECTORIES for GNU/Hurd

GNU/Hurd is another platform that behaves like this. Set it to
UnfortunatelyYes so that config directory files are correctly processed.
This fixes the corresponding 'proper error on directory "files"' test in
t1308-config-set.sh.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

receive-pack: compilation fix

We do not use C99 "for loop initial declaration" in our codebase
(yet), but one snuck in.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

docs: fix minor glitch in illustration

In the example by Jon Loeliger the selector 'A^2' was duplicated. This
might confuse readers.

Signed-off-by: Michael F. Schönitzer <michael@schoenitzer.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff-tree.c: load notes machinery when required

Since its introduction in 7249e91 (revision.c: support --notes
command-line option, 2011-03-29), combining '--notes' with any option
that causes us to format notes (e.g., '--pretty', '--format="%N"', etc)
results in a failed assertion at runtime.

  $ git rev-list HEAD | git diff-tree --stdin --pretty=medium --notes
  commit 8f3d9f354286745c751374f5f1fcafee6b3f3136
  git: notes.c:1308: format_display_notes: Assertion `display_notes_trees' failed.
  Aborted

This failure is due to diff-tree not calling 'load_display_notes' to
initialize the notes machinery.

Ordinarily, this failure isn't triggered, because it requires passing
both '--notes' and another of the above mentioned options. In the case
of '--pretty', for example, we set 'opt->verbose_header', causing
'show_log()' to eventually call 'format_display_notes()', which expects
a non-NULL 'display_note_trees'.

Without initializing the notes machinery, 'display_note_trees' remains
NULL, and thus triggers an assertion failure.

Fix this by initializing the notes machinery after parsing our options,
and harden this behavior against regression with a test in t4013. (Note
that the added ref in this test requires updating two unrelated tests
which use 'log --all', and thus need to learn about the new refs).

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9819: don't use test_must_fail with p4

We were using `test_must_fail p4` to test that the p4 command failed as
expected. However, test_must_fail() is used to ensure that commands fail
in an expected way, not due to something like a segv. Since we are not
in the business of verifying the sanity of the external world, replace
`test_must_fail p4` with `! p4` and assume that the `p4` command does
not die unexpectedly.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9164: use test_must_fail only on git commands

The `test_must_fail` function should only be used for git commands;
we are not in the business of catching segmentation fault by external
commands. Shell helper functions test_cmp and svn_cmd used in this
script are wrappers around external commands, so just use `! cmd`
instead of `test_must_fail cmd`

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9160: use test_path_is_missing()

The test_must_fail() function should only be used for git commands since
we assume that external commands work sanely. Since, not only should
this file not exist, but there shouldn't exit _any_ filesystem entity in
these paths, replace `test_must_fail test -f` with
`test_path_is_missing`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9141: use test_path_is_missing()

The test_must_fail() function should only be used for git commands since
we assume that external commands work sanely. Since, not only should
these directories not exist, but there shouldn't exist _any_ filesystem
entity in these paths, replace `test_must_fail test -d` with
`test_path_is_missing`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7508: don't use `test_must_fail test_cmp`

The test_must_fail function should only be used for git commands since
we assume that external commands work sanely. Since test_cmp() just
wraps an external command, replace `test_must_fail test_cmp` with
`! test_cmp`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7408: replace incorrect uses of test_must_fail

According to t/README, test_must_fail() should only be used to test for
failure in Git commands.

Replace the invocation of `test_must_fail test_path_is_file` with
`test_path_is_missing` since, in this test case, the path should not
exist at all.

In all the cases where `test_must_fail test_alternate_is_used` appears,
test_alternate_is_used() fails because test_line_count() cannot open the
non-existent $alternates_file. Replace
`test_must_fail test_alternate_is_used` with `test_path_is_missing` to
test for this directly.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6030: use test_path_is_missing()

The test_must_fail() function should only be used for git commands since
we should assume that external commands work sanely. Replace
`test_must_fail test -e` with `test_path_is_missing`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

grep: follow conventions for printing paths w/ unusual chars

grep does not follow the conventions used by other Git commands when
printing paths that contain unusual characters (as double-quotes or
newlines). Commands such as ls-files, commit, status and diff will:

- Quote and escape unusual pathnames, by default.
- Print names verbatim and unquoted when "-z" is used.

But grep *never* quotes/escapes absolute paths with unusual chars and
*always* quotes/escapes relative ones, even with "-z". Besides being
inconsistent in its own output, the deviation from other Git commands
can be confusing. So let's make it follow the two rules above and add
some tests for this new behavior. Note that, making grep quote/escape
all unusual paths by default, also make it fully compliant with the
core.quotePath configuration, which is currently ignored for absolute
paths.

Reported-by: Greg Hurrell <greg@hurrell.net>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Sync with 2.26.2

Git 2.26.2

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.25.4

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.24.3

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.23.3

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.22.4

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.21.3

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.20.4

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.19.5

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.18.4

This merges up the security fix from v2.17.5.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Git 2.17.5

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck: reject URL with empty host in .gitmodules

Git's URL parser interprets

https:///example.com/repo.git

to have no host and a path of "example.com/repo.git". Curl, on the
other hand, internally redirects it to https://example.com/repo.git. As
a result, until "credential: parse URL without host as empty host, not
unset", tricking a user into fetching from such a URL would cause Git to
send credentials for another host to example.com.

Teach fsck to block and detect .gitmodules files using such a URL to
prevent sharing them with Git versions that are not yet protected.

A relative URL in a .gitmodules file could also be used to trigger this.
The relative URL resolver used for .gitmodules does not normalize
sequences of slashes and can follow ".." components out of the path part
and to the host part of a URL, meaning that such a relative URL can be
used to traverse from a https://foo.example.com/innocent superproject to
a https:///attacker.example.com/exploit submodule. Fortunately,
redundant extra slashes in .gitmodules are rare, so we can catch this by
detecting one after a leading sequence of "./" and "../" components.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>

credential: treat URL with empty scheme as invalid

Until "credential: refuse to operate when missing host or protocol",
Git's credential handling code interpreted URLs with empty scheme to
mean "give me credentials matching this host for any protocol".

Luckily libcurl does not recognize such URLs (it tries to look for a
protocol named "" and fails). Just in case that changes, let's reject
them within Git as well. This way, credential_from_url is guaranteed to
always produce a "struct credential" with protocol and host set.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

credential: treat URL without scheme as invalid

libcurl permits making requests without a URL scheme specified. In
this case, it guesses the URL from the hostname, so I can run

git ls-remote http::ftp.example.com/path/to/repo

and it would make an FTP request.

Any user intentionally using such a URL is likely to have made a typo.
Unfortunately, credential_from_url is not able to determine the host and
protocol in order to determine appropriate credentials to send, and
until "credential: refuse to operate when missing host or protocol",
this resulted in another host's credentials being leaked to the named
host.

Teach credential_from_url_gently to consider such a URL to be invalid
so that fsck can detect and block gitmodules files with such URLs,
allowing server operators to avoid serving them to downstream users
running older versions of Git.

This also means that when such URLs are passed on the command line, Git
will print a clearer error so affected users can switch to the simpler
URL that explicitly specifies the host and protocol they intend.

One subtlety: .gitmodules files can contain relative URLs, representing
a URL relative to the URL they were cloned from. The relative URL
resolver used for .gitmodules can follow ".." components out of the path
part and past the host part of a URL, meaning that such a relative URL
can be used to traverse from a https://foo.example.com/innocent
superproject to a https::attacker.example.com/exploit submodule.
Fortunately a leading ':' in the first path component after a series of
leading './' and '../' components is unlikely to show up in other
contexts, so we can catch this by detecting that pattern.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>

credential: die() when parsing invalid urls

When we try to initialize credential loading by URL and find that the
URL is invalid, we set all fields to NULL in order to avoid acting on
malicious input. Later when we request credentials, we diagonse the
erroneous input:

fatal: refusing to work with credential missing host field

This is problematic in two ways:

- The message doesn't tell the user *why* we are missing the host
  field, so they can't tell from this message alone how to recover.
  There can be intervening messages after the original warning of
  bad input, so the user may not have the context to put two and two
  together.

- The error only occurs when we actually need to get a credential.  If
  the URL permits anonymous access, the only encouragement the user gets
  to correct their bogus URL is a quiet warning.

  This is inconsistent with the check we perform in fsck, where any use
  of such a URL as a submodule is an error.

When we see such a bogus URL, let's not try to be nice and continue
without helpers. Instead, die() immediately. This is simpler and
obviously safe. And there's very little chance of disrupting a normal
workflow.

It's _possible_ that somebody has a legitimate URL with a raw newline in
it. It already wouldn't work with credential helpers, so this patch
steps that up from an inconvenience to "we will refuse to work with it
at all". If such a case does exist, we should figure out a way to work
with it (especially if the newline is only in the path component, which
we normally don't even pass to helpers). But until we see a real report,
we're better off being defensive.

Reported-by: Carlo Arenas <carenas@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck: convert gitmodules url to URL passed to curl

In 07259e74ec1 (fsck: detect gitmodules URLs with embedded newlines,
2020-03-11), git fsck learned to check whether URLs in .gitmodules could
be understood by the credential machinery when they are handled by
git-remote-curl.

However, the check is overbroad: it checks all URLs instead of only
URLs that would be passed to git-remote-curl. In principle a git:// or
file:/// URL does not need to follow the same conventions as an http://
URL; in particular, git:// and file:// protocols are not succeptible to
issues in the credential API because they do not support attaching
credentials.

In the HTTP case, the URL in .gitmodules does not always match the URL
that would be passed to git-remote-curl and the credential machinery:
Git's URL syntax allows specifying a remote helper followed by a "::"
delimiter and a URL to be passed to it, so that

git ls-remote http::https://example.com/repo.git

invokes git-remote-http with https://example.com/repo.git as its URL
argument. With today's checks, that distinction does not make a
difference, but for a check we are about to introduce (for empty URL
schemes) it will matter.

.gitmodules files also support relative URLs. To ensure coverage for the
https based embedded-newline attack, urldecode and check them directly
for embedded newlines.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>

credential: refuse to operate when missing host or protocol

The credential helper protocol was designed to be very flexible: the
fields it takes as input are treated as a pattern, and any missing
fields are taken as wildcards. This allows unusual things like:

  echo protocol=https | git credential reject

to delete all stored https credentials (assuming the helpers themselves
treat the input that way). But when helpers are invoked automatically by
Git, this flexibility works against us. If for whatever reason we don't
have a "host" field, then we'd match _any_ host. When you're filling a
credential to send to a remote server, this is almost certainly not what
you want.

Prevent this at the layer that writes to the credential helper. Add a
check to the credential API that the host and protocol are always passed
in, and add an assertion to the credential_write function that speaks
credential helper protocol to be doubly sure.

There are a few ways this can be triggered in practice:

  - the "git credential" command passes along arbitrary credential
    parameters it reads from stdin.

  - until the previous patch, when the host field of a URL is empty, we
    would leave it unset (rather than setting it to the empty string)

  - a URL like "example.com/foo.git" is treated by curl as if "http://"
    was present, but our parser sees it as a non-URL and leaves all
    fields unset

  - the recent fix for URLs with embedded newlines blanks the URL but
    otherwise continues. Rather than having the desired effect of
    looking up no credential at all, many helpers will return _any_
    credential

Our earlier test for an embedded newline didn't catch this because it
only checked that the credential was cleared, but didn't configure an
actual helper. Configuring the "verbatim" helper in the test would show
that it is invoked (it's obviously a silly helper which doesn't look at
its input, but the point is that it shouldn't be run at all). Since
we're switching this case to die(), we don't need to bother with a
helper. We can see the new behavior just by checking that the operation
fails.

We'll add new tests covering partial input as well (these can be
triggered through various means with url-parsing, but it's simpler to
just check them directly, as we know we are covered even if the url
parser changes behavior in the future).

[jn: changed to die() instead of logging and showing a manual
username/password prompt]

Reported-by: Carlo Arenas <carenas@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

credential: parse URL without host as empty host, not unset

We may feed a URL like "cert:///path/to/cert.pem" into the credential
machinery to get the key for a client-side certificate. That
credential has no hostname field, which is about to be disallowed (to
avoid confusion with protocols where a helper _would_ expect a
hostname).

This means as of the next patch, credential helpers won't work for
unlocking certs. Let's fix that by doing two things:

  - when we parse a url with an empty host, set the host field to the
    empty string (asking only to match stored entries with an empty
    host) rather than NULL (asking to match _any_ host).

  - when we build a cert:// credential by hand, similarly assign an
    empty string

It's the latter that is more likely to impact real users in practice,
since it's what's used for http connections. But we don't have good
infrastructure to test it.

The url-parsing version will help anybody using git-credential in a
script, and is easy to test.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

t0300: use more realistic inputs

Many of the tests in t0300 give partial inputs to git-credential,
omitting a protocol or hostname. We're checking only high-level things
like whether and how helpers are invoked at all, and we don't care about
specific hosts. However, in preparation for tightening up the rules
about when we're willing to run a helper, let's start using input that's
a bit more realistic: pretend as if http://example.com is being
examined.

This shouldn't change the point of any of the tests, but do note we have
to adjust the expected output to accommodate this (filling a credential
will repeat back the protocol/host fields to stdout, and the helper
debug messages and askpass prompt will change on stderr).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

t0300: make "quit" helper more realistic

We test a toy credential helper that writes "quit=1" and confirms that
we stop running other helpers. However, that helper is unrealistic in
that it does not bother to read its stdin at all.

For now we don't send any input to it, because we feed git-credential a
blank credential. But that will change in the next patch, which will
cause this test to racily fail, as git-credential will get SIGPIPE
writing to the helper rather than exiting because it was asked to.

Let's make this one-off helper more like our other sample helpers, and
have it source the "dump" script. That will read stdin, fixing the
SIGPIPE problem. But it will also write what it sees to stderr. We can
make the test more robust by checking that output, which confirms that
we do run the quit helper, don't run any other helpers, and exit for the
reason we expected.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

transport-helper: new method reject_atomic_push()

Add new method in transport-helper to reject all references if any
reference is failed for atomic push.

This method is reused in "send-pack.c" and "transport-helper.c", one for
SSH, git and file protocols, and the other for HTTP protocol.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

transport-helper: mark failure for atomic push

Commit v2.22.0-1-g3bca1e7f9f (transport-helper: enforce atomic in
push_refs_with_push, 2019-07-11) noticed the incomplete report of
failure of an atomic push for HTTP protocol. But the implementation
has a flaw that mark all remote references as failure.

Only mark necessary references as failure in `push_refs_with_push()` of
transport-helper.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-pack: mark failure of atomic push properly

When pushing with SSH or other smart protocol, references are validated
by function `check_to_send_update()` before they are sent in commands
to `send_pack()` of "receve-pack".  For atomic push, if a reference is
rejected after the validation, only references pushed by user should be
marked as failure, instead of report failure on all remote references.

Commit v2.22.0-1-g3bca1e7f9f (transport-helper: enforce atomic in
push_refs_with_push, 2019-07-11) wanted to fix report issue of HTTP
protocol, but marked all remote references failure for atomic push.

In order to fix the issue of status report for SSH or other built-in
smart protocol, revert part of that commit and add additional status
for function `atomic_push_failure()`.  The additional status for it
except the "REF_STATUS_EXPECTING_REPORT" status are:

- REF_STATUS_NONE : Not marked as "REF_STATUS_EXPECTING_REPORT" yet.
- REF_STATUS_OK   : Assume OK for dryrun or status_report is disabled.

This fix won't resolve the issue of status report in transport-helper
for HTTP or other protocols, and breaks test case in t5541.  Will fix
it in additional commit.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5543: never report what we do not push

When we push some references to the git server, we expect git to report
the status of the references we are pushing; no more, no less. But when
pusing with atomic mode, if some references cannot be pushed, Git reports
the reject message on all references in the remote repository.

Add new test cases in t5543, and fix them in latter commit.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-pack: fix inconsistent porcelain output

The porcelain output of a failed `git-push` command is inconsistent for
different protocols.  For example, the following `git-push` command
may fail due to the failure of the `pre-receive` hook.

    git push --porcelain origin HEAD:refs/heads/master

For SSH protocol, the porcelain output does not end with a "Done"
message:

To <URL/of/upstream.git>
!  HEAD:refs/heads/master  [remote rejected] (pre-receive hook declined)

While for HTTP protocol, the porcelain output does end with a "Done"
message:

To <URL/of/upstream.git>
!  HEAD:refs/heads/master  [remote rejected] (pre-receive hook declined)
Done

The following code at the end of function `send_pack()` indicates that
`send_pack()` should not return an error if some references are rejected
in porcelain mode.

    int send_pack(...)
        ... ...

        if (args->porcelain)
            return 0;

        for (ref = remote_refs; ref; ref = ref->next) {
            switch (ref->status) {
            case REF_STATUS_NONE:
            case REF_STATUS_UPTODATE:
            case REF_STATUS_OK:
                break;
            default:
                return -1;
            }
        }
        return 0;
    }

So if atomic push failed, must check the porcelain mode before return
an error.  And `receive_status()` should not return an error for a
failed updated reference, because `send_pack()` will check them instead.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

complete: zsh: add missing sub cmd completion candidates

Add missing 'restore' and 'switch' sub commands to zsh completion
candidate output. E.g.

  $ git re<tab>
  rebase    -- forward-port local commits to the updated upstream head
  reset     -- reset current HEAD to the specified state
  restore   -- restore working tree files

  $ git s<tab>
  show      -- show various types of objects
  status    -- show the working tree status
  switch    -- switch branches

Signed-off-by: Terry Moschou <tmoschou@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

blame: use changed-path Bloom filters

The changed-path Bloom filters help reduce the amount of tree
parsing required during history queries. Before calculating a
diff, we can ask the filter if a path changed between a commit
and its first parent. If the filter says "no" then we can move
on without parsing trees. If the filter says "maybe" then we
parse trees to discover if the answer is actually "yes" or "no".

When computing a blame, there is a section in find_origin() that
computes a diff between a commit and one of its parents. When this
is the first parent, we can check the Bloom filters before calling
diff_tree_oid().

In order to make this work with the blame machinery, we need to
initialize a struct bloom_key with the initial path. But also, we
need to add more keys to a list if a rename is detected. We then
check to see if _any_ of these keys answer "maybe" in the diff.

During development, I purposefully left out this "add a new key
when a rename is detected" to see if the test suite would catch
my error. That is how I discovered the issues with
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS from the previous change.
With that change, we can feel some confidence in the coverage of
this change.

If a user requests copy detection using "git blame -C", then there
are more places where the set of "important" files can expand. I
do not know enough about how this happens in the blame machinery.
Thus, the Bloom filter integration is explicitly disabled in this
mode. A later change could expand the bloom_key data with an
appropriate call (or calls) to add_bloom_key().

If we did not disable this mode, then the following tests would
fail:

t8003-blame-corner-cases.sh
t8011-blame-split-file.sh

Generally, this is a performance enhancement and should not
change the behavior of 'git blame' in any way. If a repo has a
commit-graph file with computed changed-path Bloom filters, then
they should notice improved performance for their 'git blame'
commands.

Here are some example timings that I found by blaming some paths
in the Linux kernel repository:

git blame arch/x86/kernel/topology.c >/dev/null

Before: 0.83s
  After: 0.24s

git blame kernel/time/time.c >/dev/null

Before: 0.72s
  After: 0.24s

git blame tools/perf/ui/stdio/hist.c >/dev/null

Before: 0.27s
  After: 0.11s

I specifically looked for "deep" paths that were also edited many
times. As a counterpoint, the MAINTAINERS file was edited many
times but is located in the root tree. This means that the cost of
computing a diff relative to the pathspec is very small. Here are
the timings for that command:

git blame MAINTAINERS >/dev/null

Before: 20.1s
  After: 18.0s

These timings are the best of five. The worst-case runs were on the
order of 2.5 minutes for both cases. Note that the MAINTAINERS file
has 18,740 lines across 17,000+ commits. This happens to be one of
the cases where this change provides the least improvement.

The lack of improvement for the MAINTAINERS file and the relatively
modest improvement for the other examples can be easily explained.
The blame machinery needs to compute line-level diffs to determine
which lines were changed by each commit. That makes up a large
proportion of the computation time, and this change does not
attempt to improve on that section of the algorithm. The
MAINTAINERS file is large and changed often, so it takes time to
determine which lines were updated by which commit. In contrast,
the code files are much smaller, and it takes longer to comute
the line-by-line diff for a single patch on the Linux mailing
lists.

Outside of the "-C" integration, I believe there is little more to
gain from the changed-path Bloom filters for 'git blame' after this
patch.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tests: write commit-graph with Bloom filters

The GIT_TEST_COMMIT_GRAPH environment variable updates the commit-
graph file whenever "git commit" is run, ensuring that we always
have an updated commit-graph throughout the test suite. The
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was
introduced to write the changed-path Bloom filters whenever "git
commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH
trick doesn't launch a separate process and instead writes it
directly.

To expand the number of tests that have commits in the commit-graph
file, add a helper method that computes the commit-graph and place
that helper inside "git commit" and "git merge".

In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS
to ensure we are writing changed-path Bloom filters whenever
possible.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: complicated pathspecs disable filters

The changed-path Bloom filters work only when we can compute an
explicit Bloom filter key in advance. When a pathspec is given
that allows case-insensitive checks or wildcard matching, we
must disable the Bloom filter performance checks.

By checking the pathspec in prepare_to_use_bloom_filters(), we
avoid setting up the Bloom filter data and thus revert to the
usual logic.

Before this change, the following tests would fail*:

t6004-rev-list-path-optim.sh (Tests 6-7)
t6130-pathspec-noglob.sh (Tests 3-6)
t6131-pathspec-icase.sh (Tests 3-5)

*These tests would fail when using GIT_TEST_COMMIT_GRAPH and
GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS except that the latter
environment variable was not set up correctly to write the changed-
path Bloom filters in the test suite. That will be fixed in the
next change.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

log: add log.excludeDecoration config option

In 'git log', the --decorate-refs-exclude option appends a pattern
to a string_list. This list is used to prevent showing some refs
in the decoration output, or even by --simplify-by-decoration.

Users may want to use their refs space to store utility refs that
should not appear in the decoration output. For example, Scalar [1]
runs a background fetch but places the "new" refs inside the
refs/scalar/hidden/<remote>/* refspace instead of refs/<remote>/*
to avoid updating remote refs when the user is not looking. However,
these "hidden" refs appear during regular 'git log' queries.

A similar idea to use "hidden" refs is under consideration for core
Git [2].

Add the 'log.excludeDecoration' config option so users can exclude
some refs from decorations by default instead of needing to use
--decorate-refs-exclude manually. The config value is multi-valued
much like the command-line option. The documentation is careful to
point out that the config value can be overridden by the
--decorate-refs option, even though --decorate-refs-exclude would
always "win" over --decorate-refs.

Since the 'log.excludeDecoration' takes lower precedence to
--decorate-refs, and --decorate-refs-exclude takes higher
precedence, the struct decoration_filter needed another field.
This led also to new logic in load_ref_decorations() and
ref_filter_match().

There are several tests in t4202-log.sh that test the
--decorate-refs-(include|exclude) options, so these are extended.
Since the expected output is already stored as a file, most tests
could simply replace a "--decorate-refs-exclude" option with an
in-line config setting. Other tests involve the precedence of
the config option compared to command-line options and needed more
modification.

[1] https://github.com/microsoft/scalar
[2] https://lore.kernel.org/git/77b1da5d3063a2404cd750adfe3bb8be9b6c497d.1585946894.git.gitgitgadget@gmail.com/

Helped-by: Junio C Hamano <gister@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

log-tree: make ref_filter_match() a helper method

The ref_filter_match() method is defined in refs.h and implemented
in refs.c, but is only used by add_ref_decoration() in log-tree.c.
Move it into that file as a static helper method. The
match_ref_pattern() comes along for the ride.

While moving the code, also make a slight adjustment to have
ref_filter_match() take a struct decoration_filter pointer instead
of multiple string lists. This is non-functional, but will make a
later change be much cleaner.

The diff is easier to parse when using the --color-moved option.

Reported-by: Junio C Hamano <gister@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: explain "mboxrd" pretty format

The "mboxrd" pretty format was introduced in 9f23e04061 (pretty: support
"mboxrd" output format, 2016-06-05) but wasn't mentioned in the
documentation.

Signed-off-by: Emma Brooks <me@pluvano.com>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

range-diff: avoid negative string precision

If the supplied integer for "precision" is negative in
`"%.*s", len, line` then it is ignored. So the current code is
equivalent to just `"%s", line` because it is executed only if
`len` is negative.

Fix this by saving the value of `len` before overwriting it with the
return value of `parse_git_diff_header()`.

Signed-off-by: Vasil Dimov <vd@FreeBSD.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

range-diff: fix a crash in parsing git-log output

`git range-diff` calls `git log` internally and tries to parse its
output. But `git log` output can be customized by the user in their
git config and for certain configurations either an error will be
returned by `git range-diff` or it will crash.

To fix this explicitly set the output format of the internally
executed `git log` with `--pretty=medium`. Because that cancels
`--notes`, add explicitly `--notes` at the end.

Also, make sure we never crash in the same way - trying to dereference
`util` which was never created and has remained NULL. It would happen
if the first line of `git log` output does not begin with 'commit '.

Alternative considered but discarded - somehow disable all git configs
and behave as if no config is present in the internally executed
`git log`, but that does not seem to be possible. GIT_CONFIG_NOSYSTEM
is the closest to it, but even with that we would still read
`.git/config`.

Signed-off-by: Vasil Dimov <vd@FreeBSD.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

credential: treat "?" and "#" in URLs as end of host

It's unusual to see:

  https://example.com?query-parameters

without an intervening slash, like:

  https://example.com/some-path?query-parameters

or even:

  https://example.com/?query-parameters

but it is a valid end to the hostname (actually "authority component")
according to RFC 3986. Likewise for "#".

And curl will parse the URL according to the standard, meaning it will
contact example.com, but our credential code would ask about a bogus
hostname with a "?" in it. Let's make sure we follow the standard, and
more importantly ask about the same hosts that curl will be talking to.

It would be nice if we could just ask curl to parse the URL for us. But
it didn't grow a URL-parsing API until 7.62, so we'd be stuck with
fallback code either way. Plus we'd need this code in the main Git
binary, where we've tried to avoid having a link dependency on libcurl.

But let's at least fix our parser. Moving to curl's parser would prevent
other potential discrepancies, but this gives us immediate relief for
the known problem, and would help our fallback code if we eventually use
curl.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

freshen_file(): use NULL `times' for implicit current-time

Update freshen_file() to use a NULL `times', semantically equivalent to
the currently setup, with an explicit `actime' and `modtime' set to the
"current time", but with the advantage that it works with other files
not owned by the current user.

Fixes an issue on shared repos with a split index, where eventually a
user's operation creates a shared index, and another user will later do
an operation that will try to update its freshness, but will instead
raise a warning:
$ git status
warning: could not freshen shared index '.git/sharedindex.bd736fa10e0519593fefdb2aec253534470865b2'

Signed-off-by: Luciano Miguel Ferreira Rocha <luciano.rocha@booking.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph.c: introduce '--[no-]check-oids'

When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.

For example, server operators may wish to feed the refnames that were
updated during a push to 'git commit-graph write --input=stdin-commits',
and silently discard refs that don't point at commits. This can be done
by combing the output of 'git for-each-ref' with '--format
%(*objecttype)', but this requires opening up a potentially large number
of objects. Instead, it is more convenient to feed the updated refs to
the commit-graph machinery, and let it throw out refs that don't point
to commits.

Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With 'no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.

No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph.h: replace 'commit_hex' with 'commits'

The 'write_commit_graph()' function takes in either a string list of
pack indices, or a string list of hexadecimal commit OIDs. These
correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from
'git commit-graph write'.

Using a string_list of hexadecimal commit IDs is not the most efficient
use of memory, since we can instead use the 'struct oidset', which is
more well-suited for this case.

This has another benefit which will become apparent in the following
commit. This is that we are about to disambiguate the kinds of errors we
produce with '--stdin-commits' into "non-hex input" and "hex-input, but
referring to a non-commit object". By having 'write_commit_graph' take
in a 'struct oidset *' of commits, we place the burden on the caller (in
this case, the builtin) to handle the first case, and the commit-graph
machinery can handle the second case.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

oidset: introduce 'oidset_size'

Occasionally, it may be useful for callers to know the number of object
IDs in an oidset. Right now, the only way to compute this is to call
'kh_size' on the internal 'kh_set_oid_t'.

Similar to how we wrap other 'kh_*' functions over the 'oidset' type,
let's allow callers to compute this value by introducing 'oidset_size'.

We will add its first caller in the subsequent commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/commit-graph.c: introduce split strategy 'replace'

When using split commit-graphs, it is sometimes useful to completely
replace the commit-graph chain with a new base.

For example, consider a scenario in which a repository builds a new
commit-graph incremental for each push. Occasionally (say, after some
fixed number of pushes), they may wish to rebuild the commit-graph chain
with all reachable commits.

They can do so with

$ git commit-graph write --reachable

but this removes the chain entirely and replaces it with a single
commit-graph in 'objects/info/commit-graph'. Unfortunately, this means
that the next push will have to move this commit-graph into the first
layer of a new chain, and then write its new commits on top.

Avoid such copying entirely by allowing the caller to specify that they
wish to replace the entirety of their commit-graph chain, while also
specifying that the new commit-graph should become the basis of a fresh,
length-one chain.

This addresses the above situation by making it possible for the caller
to instead write:

$ git commit-graph write --reachable --split=replace

which writes a new length-one chain to 'objects/info/commit-graphs',
making the commit-graph incremental generated by the subsequent push
relatively cheap by avoiding the aforementioned copy.

In order to do this, remove an assumption in 'write_commit_graph_file'
that chains are always at least two incrementals long.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/commit-graph.c: introduce split strategy 'no-merge'

In the previous commit, we laid the groundwork for supporting different
splitting strategies. In this commit, we introduce the first splitting
strategy: 'no-merge'.

Passing '--split=no-merge' is useful for callers which wish to write a
new incremental commit-graph, but do not want to spend effort condensing
the incremental chain [1]. Previously, this was possible by passing
'--size-multiple=0', but this no longer the case following 63020f175f
(commit-graph: prefer default size_mult when given zero, 2020-01-02).

When '--split=no-merge' is given, the commit-graph machinery will never
condense an existing chain, and it will always write a new incremental.

[1]: This might occur when, for example, a server administrator running
some program after each push may want to ensure that each job runs
proportional in time to the size of the push, and does not "jump" when
the commit-graph machinery decides to trigger a merge.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/commit-graph.c: support for '--split[=<strategy>]'

With '--split', the commit-graph machinery writes new commits in another
incremental commit-graph which is part of the existing chain, and
optionally decides to condense the chain into a single commit-graph.
This is done to ensure that the asymptotic behavior of looking up a
commit in an incremental chain is not dominated by the number of
incrementals in that chain. It can be controlled by the '--max-commits'
and '--size-multiple' options.

In the next two commits, we will introduce additional splitting
strategies that can exert additional control over:

  - when a split commit-graph is and isn't written, and

  - when the existing commit-graph chain is discarded completely and
    replaced with another graph

To prepare for this, make '--split' take an optional strategy (as in
'--split[=<strategy>]'), and add a new enum to describe which strategy
is being used. For now, no strategies are given, and the only enumerated
value is 'COMMIT_GRAPH_SPLIT_UNSPECIFIED', indicating the absence of a
strategy.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/helper/test-read-graph.c: support commit-graph chains

In 61df89c8e5 (commit-graph: don't early exit(1) on e.g. "git status",
2019-03-25), the former 'load_commit_graph_one' was refactored into
'open_commit_graph' and 'load_commit_graph_one_fd_st' as a means of
avoiding an early-exit from non-library code.

However, 'load_commit_graph_one' does not support commit-graph chains,
and hence the 'read-graph' test tool does not work with them.

Replace 'load_commit_graph_one' with 'read_commit_graph_one' in order to
support commit-graph chains. In the spirit of 61df89c8e5,
'read_commit_graph_one' does not ever 'die()', making it a suitable
replacement here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Sync with v2.26.1

t: restrict `is_hidden` to be called only on Windows

The function won't work anywhere else, so let's mark it as an explicit
bug if it is called on a non-Windows platform.

Let's also rename the function to avoid cluttering the global namespace
with an overly-generic function name.

While at it, we also fix the code comment above that function: the
lower-case `windows` refers to something different than `Windows`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: make test_path_is_hidden more robust

This function uses Windows' system tool `attrib` to determine the state
of the hidden flag of a file or directory.

We should not actually expect the first `attrib.exe` in the PATH to
be the one we are looking for. Or that it is in the PATH, for that
matter.

Let's use the full path to the tool instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: consolidate the `is_hidden` functions

The `is_hidden` function can be used (only on Windows) to determine
whether a directory or file have their `hidden` flag set.

This function is duplicated between two test scripts. It is better to
move it into `test-lib-functions.sh` so that it is reused.

This patch is best viewed with `--color-moved`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: use skip_prefix to parse config key

Instead of using `starts_with()`, the magic number 7, `strlen()` and a
fair number of additions to verify the three parts of the config key
"branch.<branch>.mergeoptions", use `skip_prefix()` to jump through them
more explicitly.

We need to introduce a new variable for this (we certainly can't modify
`k` just because we see "branch."!). With `skip_prefix()` we often use
quite bland names like `p` or `str`. Let's do the same. If and when this
function needs to do more prefix-skipping, we'll have a generic variable
ready for this.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase --merge: optionally skip upstreamed commits

When rebasing against an upstream that has had many commits since the
original branch was created:

O -- O -- ... -- O -- O (upstream)
\
-- O (my-dev-branch)

it must read the contents of every novel upstream commit, in addition to
the tip of the upstream and the merge base, because "git rebase"
attempts to exclude commits that are duplicates of upstream ones. This
can be a significant performance hit, especially in a partial clone,
wherein a read of an object may end up being a fetch.

Add a flag to "git rebase" to allow suppression of this feature. This
flag only works when using the "merge" backend.

This flag changes the behavior of sequencer_make_script(), called from
do_interactive_rebase() <- run_rebase_interactive() <-
run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
(indirectly called from sequencer_make_script() through
prepare_revision_walk()) will no longer call cherry_pick_list(), and
thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
means that the intermediate commits in upstream are no longer read (as
shown by the test) and means that no PATCHSAME-caused skipping of
commits is done by sequencer_make_script(), either directly or through
make_script_with_merges().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase: fix an incompatible-options error message

When the user specifies the apply backend with options that only work
with the merge backend, such as

    git rebase --apply --exec /bin/true HEAD~3

the error message has always been

    fatal: --exec requires an interactive rebase

This error message is misleading and was one of the reasons we renamed
the interactive backend to the merge backend.  Update the error message
to state that these options merely require use of the merge backend.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase: reinstate --no-keep-empty

Commit d48e5e21da ("rebase (interactive-backend): make --keep-empty the
default", 2020-02-15) turned --keep-empty (for keeping commits which
start empty) into the default.  The logic underpinning that commit was:

  1) 'git commit' errors out on the creation of empty commits without an
     override flag
  2) Once someone determines that the override is worthwhile, it's
     annoying and/or harmful to required them to take extra steps in
     order to keep such commits around (and to repeat such steps with
     every rebase).

While the logic on which the decision was made is sound, the result was
a bit of an overcorrection.  Instead of jumping to having --keep-empty
being the default, it jumped to making --keep-empty the only available
behavior.  There was a simple workaround, though, which was thought to
be good enough at the time.  People could still drop commits which
started empty the same way the could drop any commits: by firing up an
interactive rebase and picking out the commits they didn't want from the
list.  However, there are cases where external tools might create enough
empty commits that picking all of them out is painful.  As such, having
a flag to automatically remove start-empty commits may be beneficial.

Provide users a way to drop commits which start empty using a flag that
existed for years: --no-keep-empty.  Interpret --keep-empty as
countermanding any previous --no-keep-empty, but otherwise leaving
--keep-empty as the default.

This might lead to some slight weirdness since commands like
  git rebase --empty=drop --keep-empty
  git rebase --empty=keep --no-keep-empty
look really weird despite making perfect sense (the first will drop
commits which become empty, but keep commits that started empty; the
second will keep commits which become empty, but drop commits which
started empty).  However, --no-keep-empty was named years ago and we are
predominantly keeping it for backward compatibility; also we suspect it
will only be used rarely since folks already have a simple way to drop
commits they don't want with an interactive rebase.

Reported-by: Bryan Turner <bturner@atlassian.com>
Reported-by: Sami Boukortt <sami@boukortt.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase -i: mark commits that begin empty in todo editor

While many users who intentionally create empty commits do not want them
thrown away by a rebase, there are third-party tools that generate empty
commits that a user might not want.  In the past, users have used rebase
to get rid of such commits (a side-effect of the fact that the --apply
backend is not currently capable of keeping them).  While such users
could fire up an interactive rebase and just remove the lines
corresponding to empty commits, that might be difficult if the
third-party tool generates many of them.  Simplify this task for users
by marking such lines with a suffix of " # empty" in the todo list.

Suggested-by: Sami Boukortt <sami@boukortt.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: reject parsing of files over INT_MAX

While the last few commits have made it possible for the config parser
to handle config files up to the limits of size_t, the rest of the code
isn't really ready for this. In particular, we often feed the keys as
strings into printf "%s" format specifiers. And because the printf
family of functions must return an int to specify the result, they
complain. Here are two concrete examples (using glibc; we're in
uncharted territory here so results may vary):

Generate a gigantic .gitmodules file like this:

   git submodule add /some/other/repo foo
   {
           printf '[submodule "'
           perl -e 'print "a" x 2**31'
   echo '"]path = foo'
   } >.gitmodules
   git commit -m 'huge gitmodule'

then try this:

   $ git show
   BUG: strbuf.c:397: your vsnprintf is broken (returned -1)

The problem is that we end up calling:

   strbuf_addf(&sb, "submodule.%s.ignore", submodule_name);

which relies on vsnprintf(), and that function has no way to report back
a size larger than INT_MAX.

Taking that same file, try this:

  git config --file=.gitmodules --list --name-only

On my system it produces an output with exactly 4GB of spaces. I
confirmed in a debugger that we reach the config callback with the key
intact: it's 2147483663 bytes and full of a's. But when we print it with
this call:

  printf("%s%c", key_, term);

we just get the spaces.

So given the fact that these are insane cases which we have no need to
support, the weird behavior from feeding the results to printf even if
the code is careful, and the possibility of uncareful code introducing
its own integer truncation issues, let's just declare INT_MAX as a limit
for parsing config files.

We'll enforce the limit in get_next_char(), which generalizes over all
sources (blobs, files, etc) and covers any element we're parsing
(whether section, key, value, etc). For simplicity, the limit is over
the length of the _whole_ file, so you couldn't have two 1GB values in
the same file. This should be perfectly fine, as the expected size for
config files is generally kilobytes at most.

With this patch both cases above will yield:

  fatal: bad config line 1 in file .gitmodules

That's not an amazing error message, but the parser isn't set up to
provide specific messages (it just breaks out of the parsing loop and
gives that generic error even if see a syntactic issue). And we really
wouldn't expect to see this case outside of somebody maliciously probing
the limits of the config system.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: use size_t to store parsed variable baselen

Most of the config parsing infrastructure is limited in what it can
parse only by the size of memory, because it parses character by
character, building up strbufs for keys, values, etc. One exception is
the "baselen" value we keep in git_parse_source(), which is an int.

That stores the length of the section.subsection base, to which we can
then append individual key names (by truncating back to the baselen with
strbuf_setlen(), and then appending characters for the key name). But
because it's an int, if we see an absurdly long section or subsection,
we may overflow the integer, wrapping negative. That negative value is
then implicitly cast to a size_t when we pass it to strbuf_setlen(),
creating a very large value and triggering a BUG. For example:

  $ {
       printf '[foo "'
       perl -e 'print "a" x 2**31'
       echo '"]bar = value'
    } >huge
  $ git config --file=huge --list
  fatal: BUG: strbuf_setlen() beyond buffer

While this is obviously a silly case that we don't care about
supporting, it's worth fixing it by switching to a size_t for a few
reasons:

  - we should try to avoid hitting BUG assertions at all

  - avoiding integer truncation or overflow sets a good example and
    makes it easier to audit the code for more important issues

  - the BUG outcome is what happens in _this_ instance, because we wrap
    negative. If we used a 2**32 subsection, we'd wrap to a small
    positive value and actually generate wrong output (the subsection of
    our key would be truncated).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git_config_parse_key(): return baselen as size_t

As with the recent change to parse_config_key(), the best type to return
a string length is a size_t, as it won't cause integer truncation for a
gigantic key. And as with that change, this is mostly a clarity /
hygiene issue for now, as our config parser would choke on such a large
key anyway.

There are a few ripple effects within the config code, as callers switch
to using size_t. I also adjusted a few related variables that iterate
over strings. The most unexpected change is that a call to strbuf_addf()
had to switch to strbuf_add(). We can't use a size_t with "%.*s",
because printf precisions must have type "int" (we could cast, of
course, but that would miss the point of using size_t in the first
place).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: drop useless length variable in write_pair()

We compute the length of a subset of a string, but then use that length
only to feed a "%.*s" printf placeholder for the same string. We can
just use "%s" to achieve the same thing.

The variable became useless in cb891a5989 (Use a strbuf for building up
section header and key/value pair strings., 2007-12-14), which swapped
out a write() which _did_ use the length for a strbuf_addf() call.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parse_config_key(): return subsection len as size_t

We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.

Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).

When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: drop auto-strlen behavior of make_branch() and make_rewrite()

The make_branch() and make_rewrite() functions can take a NUL-terminated
string or a ptr/len pair. They use a sentinel value of "0" for the len
to tell the difference between the two. However, when parsing config
like:

  [branch ""]
  merge = whatever

whose key flattens to:

  branch..merge

we might actually have a zero-length branch name. This is obviously
nonsense, but the current code would consider it as a NUL-terminated
string and use the branch name ".merge".

We could use a better sentinel value here (like "-1"), but that gets in
the way of moving to size_t, which is a more appropriate type for a
ptr/len combo.

Let's instead just drop this feature and have the callers (of which
there are only two total) use strlen() themselves. This simplifies the
code, and lets us move to using size_t.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: cope with the Isilon network file system

On certain network filesystems (currently encountered with Isilon, but
in theory more network storage solutions could be causing the same
issue), when the directory in question is missing,
`raceproof_create_file()` fails with an `ERROR_INVALID_PARAMETER`
instead of an `ERROR_PATH_NOT_FOUND`.

Since it is highly unlikely that we produce such an error by mistake
(the parameters we pass are fairly benign), we can be relatively certain
that the directory is missing in this instance. So let's just translate
that error automagically.

This fixes https://github.com/git-for-windows/git/issues/1345.

Signed-off-by: Nathan Sanders <spekbukkem@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: let GitHub Actions upload failed tests' directories

Arguably, CI builds' most important task is to not only identify
regressions, but to make it as easy as possible to investigate what went
wrong.

In that light, we will want to provide users with a way to inspect the
tests' output as well as the corresponding directories.

This commit adds build steps that are only executed when tests failed,
uploading the relevant information as build artifacts. These artifacts
can then be downloaded by interested parties to diagnose the failures
more efficiently.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: add a problem matcher for GitHub Actions

With this patch, test failures will be annotated with a helpful,
clickable message in GitHub Actions. For details, see
https://github.com/actions/toolkit/blob/master/docs/problem-matchers.md

Note: we need to set `TEST_SHELL_PATH` to Bash so that the problem
matcher is fed a file and line number for each test failure.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tests: when run in Bash, annotate test failures with file name/line number

When a test fails, it is nice to see where the corresponding code lives
in the worktree. Sadly, it seems that only Bash allows us to infer this
information. Let's do it when we detect that we're running in a Bash.

This will come in handy in the next commit, where we teach the GitHub
Actions workflow to annotate failed test runs with this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: retire the Azure Pipelines definition

We have GitHub Actions now. Running the same builds and tests in Azure
Pipelines would be redundant, and a waste of energy.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

README: add a build badge for the GitHub Actions runs

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: configure GitHub Actions for CI/PR

This patch adds CI builds via GitHub Actions. While the underlying
technology is at least _very_ similar to that of Azure Pipelines, GitHub
Actions are much easier to set up than Azure Pipelines:

- no need to install a GitHub App,

- no need to set up an Azure DevOps account,

- all you need to do is push to your fork on GitHub.

Therefore, it makes a lot of sense for us to have a working GitHub
Actions setup.

While copy/editing `azure-pipelines.yml` into
`.github/workflows/main.yml`, we also use the opportunity to accelerate
the step that sets up a minimal subset of Git for Windows' SDK in the
Windows-build job:

- we now download a `.tar.xz` stored in Azure Blobs and extract it
simultaneously by calling `curl` and piping the result to `tar`,

- decompressing via `xz`,

- all three utilities are installed together with Git for Windows

At the same time, we also make use of the matrix build feature, which
reduces the amount of repeated text by quite a bit.

Also, we do away with the parts that try to mount a file share on which
`prove` can store data between runs. It is just too complicated to set
up, and most times the tree changes anyway, so there is little return on
investment there.

Initial-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: run gem with sudo to install asciidoctor

In a later patch, we will run Documentation job in GitHub Actions.
The job will run without elevated permission.

Run `gem` with `sudo` to elevate permission in order to be able to
install to system location.
This will also keep this installation in-line with other installation in
our Linux system for CI.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[Danh: reword commit message]
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: explicit install all required packages

In a later patch, we will support GitHub Action.

Explicitly install all of our build dependencies on Linux.
Since GitHub Action's Linux VM hasn't installed our build dependencies.
And there're no harm to reinstall them (in Travis)

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interactive: explicitly `fflush` stdout before expecting input

At least one interactive command writes a prompt to `stdout` and then
reads user input on `stdin`: `git clean --interactive`. If the prompt is
left in the buffer, the user will not realize the program is waiting for
their input.

So let's just flush `stdout` before reading the user's input.

Signed-off-by: 마누엘 <nalla@hamal.uberspace.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interactive: refactor code asking the user for interactive input

There are quite a few code locations (e.g. `git clean --interactive`)
where Git asks the user for an answer. In preparation for fixing a bug
shared by all of them, and also to DRY up the code, let's refactor it.

Please note that most of these callers trimmed white-space both at the
beginning and at the end of the answer, instead of trimming only the
end (as the caller in `add-patch.c` does).

Therefore, technically speaking, we change behavior in this patch. At
the same time, it can be argued that this is actually a bug fix.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: help debugging by optionally executing bash with strace

MSYS2's strace facility is very useful for debugging... With this patch,
the bash will be executed through strace if the environment variable
GIT_STRACE_COMMANDS is set, which comes in real handy when investigating
issues in the test suite.

Also support passing a path to a log file via GIT_STRACE_COMMANDS to
force Git to call strace.exe with the `-o <path>` argument, i.e. to log
into a file rather than print the log directly.

That comes in handy when the output would otherwise misinterpreted by a
calling process as part of Git's output.

Note: the values "1", "yes" or "true" are *not* specifying paths, but
tell Git to let strace.exe log directly to the console.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: --show-pulls adds helpful merges

The default file history simplification of "git log -- <path>" or
"git rev-list -- <path>" focuses on providing the smallest set of
commits that first contributed a change. The revision walk greatly
restricts the set of walked commits by visiting only the first
TREESAME parent of a merge commit, when one exists. This means
that portions of the commit-graph are not walked, which can be a
performance benefit, but can also "hide" commits that added changes
but were ignored by a merge resolution.

The --full-history option modifies this by walking all commits and
reporting a merge commit as "interesting" if it has _any_ parent
that is not TREESAME. This tends to be an over-representation of
important commits, especially in an environment where most merge
commits are created by pull request completion.

Suppose we have a commit A and we create a commit B on top that
changes our file. When we merge the pull request, we create a merge
commit M. If no one else changed the file in the first-parent
history between M and A, then M will not be TREESAME to its first
parent, but will be TREESAME to B. Thus, the simplified history
will be "B". However, M will appear in the --full-history mode.

However, suppose that a number of topics T1, T2, ..., Tn were
created based on commits C1, C2, ..., Cn between A and M as
follows:

  A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
   \     \     \            \  /      /    /            /
    \     \__.. \            \/ ..__T1    /           Tn
     \           \__..       /\     ..__T2           /
      \_____________________B  \____________________/

If the commits T1, T2, ... Tn did not change the file, then all of
P1 through Pn will be TREESAME to their first parent, but not
TREESAME to their second. This means that all of those merge commits
appear in the --full-history view, with edges that immediately
collapse into the lower history without introducing interesting
single-parent commits.

The --simplify-merges option was introduced to remove these extra
merge commits. By noticing that the rewritten parents are reachable
from their first parents, those edges can be simplified away. Finally,
the commits now look like single-parent commits that are TREESAME to
their "only" parent. Thus, they are removed and this issue does not
cause issues anymore. However, this also ends up removing the commit
M from the history view! Even worse, the --simplify-merges option
requires walking the entire history before returning a single result.

Many Git users are using Git alongside a Git service that provides
code storage alongside a code review tool commonly called "Pull
Requests" or "Merge Requests" against a target branch.  When these
requests are accepted and merged, they typically create a merge
commit whose first parent is the previous branch tip and the second
parent is the tip of the topic branch used for the request. This
presents a valuable order to the parents, but also makes that merge
commit slightly special. Users may want to see not only which
commits changed a file, but which pull requests merged those commits
into their branch. In the previous example, this would mean the
users want to see the merge commit "M" in addition to the single-
parent commit "C".

Users are even more likely to want these merge commits when they
use pull requests to merge into a feature branch before merging that
feature branch into their trunk.

In some sense, users are asking for the "first" merge commit to
bring in the change to their branch. As long as the parent order is
consistent, this can be handled with the following rule:

  Include a merge commit if it is not TREESAME to its first
  parent, but is TREESAME to a later parent.

These merges look like the merge commits that would result from
running "git pull <topic>" on a main branch. Thus, the option to
show these commits is called "--show-pulls". This has the added
benefit of showing the commits created by closing a pull request or
merge request on any of the Git hosting and code review platforms.

To test these options, extend the standard test example to include
a merge commit that is not TREESAME to its first parent. It is
surprising that that option was not already in the example, as it
is instructive.

In particular, this extension demonstrates a common issue with file
history simplification. When a user resolves a merge conflict using
"-Xours" or otherwise ignoring one side of the conflict, they create
a TREESAME edge that probably should not be TREESAME. This leads
users to become frustrated and complain that "my change disappeared!"
In my experience, showing them history with --full-history and
--simplify-merges quickly reveals the problematic merge. As mentioned,
this option is expensive to compute. The --show-pulls option
_might_ show the merge commit (usually titled "resolving conflicts")
more quickly. Of course, this depends on the user having the correct
parent order, which is backwards when using "git pull master" from a
topic branch.

There are some special considerations when combining the --show-pulls
option with --simplify-merges. This requires adding a new PULL_MERGE
object flag to store the information from the initial TREESAME
comparisons. This helps avoid dropping those commits in later filters.
This is covered by a test, including how the parents can be simplified.
Since "struct object" has already ruined its 32-bit alignment by using
33 bits across parsed, type, and flags member, let's not make it worse.
PULL_MERGE is used in revision.c with the same value (1u<<15) as
REACHABLE in commit-graph.c. The REACHABLE flag is only used when
writing a commit-graph file, and a revision walk using --show-pulls
does not happen in the same process. Care must be taken in the future
to ensure this remains the case.

Update Documentation/rev-list-options.txt with significant details
around this option. This requires updating the example in the
History Simplification section to demonstrate some of the problems
with TREESAME second parents.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pull: pass --autostash to merge

Before, `--autostash` only worked with `git pull --rebase`. However, in
the last patch, merge learned `--autostash` as well so there's no reason
why we should have this restriction anymore. Teach pull to pass
`--autostash` to merge, just like it did for rebase.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5520: make test_pull_autostash() accept expect_parent_num

Before, test_pull_autostash() was hardcoded to run
`test_cmp_rev HEAD^ copy` to test that a rebase happened. However, in a
future patch, we plan on testing merging as well. Make
test_pull_autostash() accept a parent number as an argument so that, in
the future, we can test if a merge happened in addition to a rebase.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: teach --autostash option

In rebase, one can pass the `--autostash` option to cause the worktree
to be automatically stashed before continuing with the rebase. This
option is missing in merge, however.

Implement the `--autostash` option and corresponding `merge.autoStash`
option in merge which stashes before merging and then pops after.

This option is useful when a developer has some local changes on a topic
branch but they realize that their work depends on another branch.
Previously, they had to run something like

git fetch ...
git stash push
git merge FETCH_HEAD
git stash pop

but now, that is reduced to

git fetch ...
git merge --autostash FETCH_HEAD

When an autostash is generated, it is automatically reapplied to the
worktree only in three explicit situations:

1. An incomplete merge is commit using `git commit`.
2. A merge completes successfully.
3. A merge is aborted using `git merge --abort`.

In all other situations where the merge state is removed using
remove_merge_branch_state() such as aborting a merge via
`git reset --hard`, the autostash is saved into the stash reflog
instead keeping the worktree clean.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Suggested-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: implement apply_autostash_oid()

Split apply_save_autostash() into apply_autostash_oid() and
apply_save_autostash() where the former operates on an OID string and
the latter reads the OID from a file before passing it into
apply_save_autostash_oid().

This function is required for a future commmit which will rely on being
able to apply an autostash whose OID is stored as a string.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: implement save_autostash()

Extract common functionality of apply_autostash() into
apply_save_autostash() and use it to implement save_autostash(). This
function will be used in a future commit.

The difference between save_autostash() and apply_autostash() is that
the former does not try to apply the stash. It skips that step and
just stores the created entry in the stash reflog.

This is useful in the case where we abort an operation when an autostash
is present but we don't want to dirty the worktree with the application
of the stash. For example, in a future commit, we will implement
`git merge --autostash`. Since merges can be aborted using
`git reset --hard`, we'd make use of save_autostash() to save the
autostash entry instead of applying it to the worktree thus keeping the
worktree undirtied.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: unlink autostash in apply_autostash()

Explicitly remove autostash file in apply_autostash() once it has been
applied successfully.

This is currently a no-op because the only users of this function will unlink
the state (including the autostash file) after this function runs.
However, in the future, we will introduce a user of the function that
does not explicitly remove the state so we do it here.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: extract perform_autostash() from rebase

Lib-ify the autostash code by extracting perform_autostash() from rebase
into sequencer. In a future commit, this will be used to implement
`--autostash` in other builtins.

This patch is best viewed with `--color-moved`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase: generify create_autostash()

In the future, we plan on lib-ifying create_autostash() so we need it to
be more generic. Make it more generic by making it accept a
`struct repository` argument instead of implicitly using the non-repo
functions and `the_repository`. Also, make it accept a `path` argument
so that we no longer rely have to rely on `struct rebase_options`.
Finally, make it accept a `default_reflog_action` argument so we no
longer have to rely on `DEFAULT_REFLOG_ACTION`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase: extract create_autostash()

In a future commit, we will lib-ify this code. In preparation for
this, extract the code into the create_autostash() function so that it
can be cleaned up before it is finally lib-ified.

This patch is best viewed with `--color-moved` and
`--color-moved-ws=allow-indentation-change`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: extract reset_head() from rebase

Continue the process of lib-ifying the autostash code. In a future
commit, this will be used to implement `--autostash` in other builtins.

This patch is best viewed with `--color-moved`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase: generify reset_head()

In the future, we plan on lib-ifying reset_head() so we need it to
be more generic. Make it more generic by making it accept a
`struct repository` argument instead of implicitly using the non-repo
functions. Also, make it accept a `const char *default_reflog_action`
argument so that the default action of "rebase" isn't hardcoded in.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase: use apply_autostash() from sequencer.c

The apply_autostash() function in builtin/rebase.c is similar enough to
the apply_autostash() function in sequencer.c that they are almost
interchangeable, except for the type of arg they accept. Make the
sequencer.c version extern and use it in rebase.

The rebase version was introduced in 6defce2b02 (builtin rebase: support
`--autostash` option, 2018-09-04) as part of the shell to C conversion.
It opted to duplicate the function because, at the time, there was
another in-progress project converting interactive rebase from shell to
C as well and they did not want to clash with them by refactoring
sequencer.c version of apply_autostash(). Since both efforts are long
done, we can freely combine them together now.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: rename stash_sha1 to stash_oid

The preferred terminology is to refer to object identifiers as "OIDs".
Rename the `stash_sha1` variable to `stash_oid` in order to conform to
this.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>