git.oblomov.eu Git

commit-graph: introduce 'commitGraph.maxNewFilters'

Introduce a configuration variable to specify a default value for the
recently-introduce '--max-new-filters' option of 'git commit-graph
write'.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/commit-graph.c: introduce '--max-new-filters=<n>'

Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.

Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).

This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:

  * There may be lots of filters whose diffs take a long time to
    generate (for example, they have close to the maximum number of
    changes, diffing itself takes a long time, etc).

  * Old-style commit-graphs (which encode filters with too many entries
    as not having been computed at all) cause us to waste time
    recomputing filters that appear to have not been computed only to
    discover that they are too-large.

This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.

To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: rename 'split_commit_graph_opts'

In the subsequent commit, additional options will be added to the
commit-graph API which have nothing to do with splitting.

Rename the 'split_commit_graph_opts' structure to the more-generic
'commit_graph_opts' to encompass both. Likewise, rename the 'flags'
member to instead be 'split_flags' to clarify that it only has to do
with the behavior implied by '--split'.

Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bloom: encode out-of-bounds filters as non-empty

When a changed-path Bloom filter has either zero, or more than a
certain number (commonly 512) of entries, the commit-graph machinery
encodes it as "missing". More specifically, it sets the indices adjacent
in the BIDX chunk as equal to each other to indicate a "length 0"
filter; that is, that the filter occupies zero bytes on disk.

This has heretofore been fine, since the commit-graph machinery has no
need to care about these filters with too few or too many changed paths.
Both cases act like no filter has been generated at all, and so there is
no need to store them.

In a subsequent commit, however, the commit-graph machinery will learn
to only compute Bloom filters for some commits in the current
commit-graph layer. This is a change from the current implementation
which computes Bloom filters for all commits that are in the layer being
written. Critically for this patch, only computing some of the Bloom
filters means adding a third state for length 0 Bloom filters: zero
entries, too many entries, or "hasn't been computed".

It will be important for that future patch to distinguish between "not
representable" (i.e., zero or too-many changed paths), and "hasn't been
computed". In particular, we don't want to waste time recomputing
filters that have already been computed.

To that end, change how we store Bloom filters in the "computed but not
representable" category:

  - Bloom filters with no entries are stored as a single byte with all
    bits low (i.e., all queries to that Bloom filter will return
    "definitely not")

  - Bloom filters with too many entries are stored as a single byte with
    all bits set high (i.e., all queries to that Bloom filter will
    return "maybe").

These rules are sufficient to not incur a behavior change by changing
the on-disk representation of these two classes. Likewise, no
specification changes are necessary for the commit-graph format, either:

  - Filters that were previously empty will be recomputed and stored
    according to the new rules, and

  - old clients reading filters generated by new clients will interpret
    the filters correctly and be none the wiser to how they were
    generated.

Clients will invoke the Bloom machinery in more cases than before, but
this can be addressed by returning a NULL filter when all bits are set
high. This can be addressed in a future patch.

Note that this does increase the size of on-disk commit-graphs, but far
less than other proposals. In particular, this is generally more
efficient than storing a bitmap for which commits haven't computed their
Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
whereas storing explicit filters as above incurs a penalty of one byte
per too-large or empty commit.

In practice, these boundary commits likely occupy a small proportion of
the overall number of commits, and so the size penalty is likely smaller
than storing a bitmap for all commits.

See, for example, these relative proportions of such boundary commits
(collected by SZEDER Gábor):

                  |     Percentage of     |    commit-graph   |           |
                  |   commits modifying   |     file size     |           |
                  ├────────┬──────────────┼───────────────────┤    pct.   |
                  | 0 path | >= 512 paths | before  |  after  |   change  |
┌────────────────┼────────┼──────────────┼─────────┼─────────┼───────────┤
| android-base   | 13.20% |        0.13% | 37.468M | 37.534M | +0.1741 % |
| cmssw          |  0.15% |        0.23% | 17.118M | 17.119M | +0.0091 % |
| cpython        |  3.07% |        0.01% |  7.967M |  7.971M | +0.0423 % |
| elasticsearch  |  0.70% |        1.00% |  8.833M |  8.835M | +0.0128 % |
| gcc            |  0.00% |        0.08% | 16.073M | 16.074M | +0.0030 % |
| gecko-dev      |  0.14% |        0.64% | 59.868M | 59.874M | +0.0105 % |
| git            |  0.11% |        0.02% |  3.895M |  3.895M | +0.0020 % |
| glibc          |  0.02% |        0.10% |  3.555M |  3.555M | +0.0021 % |
| go             |  0.00% |        0.07% |  3.186M |  3.186M | +0.0018 % |
| homebrew-cask  |  0.40% |        0.02% |  7.035M |  7.035M | +0.0065 % |
| homebrew-core  |  0.01% |        0.01% | 11.611M | 11.611M | +0.0002 % |
| jdk            |  0.26% |        5.64% |  5.537M |  5.540M | +0.0590 % |
| linux          |  0.01% |        0.51% | 63.735M | 63.740M | +0.0073 % |
| llvm-project   |  0.12% |        0.03% | 25.515M | 25.516M | +0.0050 % |
| rails          |  0.10% |        0.10% |  6.252M |  6.252M | +0.0027 % |
| rust           |  0.07% |        0.17% |  9.364M |  9.364M | +0.0033 % |
| tensorflow     |  0.09% |        1.02% |  7.009M |  7.010M | +0.0158 % |
| webkit         |  0.05% |        0.31% | 17.405M | 17.406M | +0.0047 % |

(where the above increase is determined by computing a non-split
commit-graph before and after this patch).

Given that these projects are all "large" by commit count, the storage
cost by writing these filters explicitly is negligible. In the most
extreme example, android-base (which has 494,848 commits at the time of
writing) would have its commit-graph increase by a modest 68.4 KB.

Finally, a test to exercise filters which contain too many changed path
entries will be introduced in a subsequent patch.

Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: Jakub Narębski <jnareb@gmail.com>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bloom/diff: properly short-circuit on max_changes

Commit e3696980 (diff: halt tree-diff early after max_changes,
2020-03-30) intended to create a mechanism to short-circuit a diff
calculation after a certain number of paths were modified. By
incrementing a "num_changes" counter throughout the recursive
ll_diff_tree_paths(), this was supposed to match the number of changes
that would be written into the changed-path Bloom filters.
Unfortunately, this was not implemented correctly and instead misses
simple cases like file modifications. This then does not stop very
large changed-path filters from being written (unless they add or remove
many files).

To start, change the implementation in ll_diff_tree_paths() to instead
use the global diff_queue_diff struct's 'nr' member as the count. This
is a way to simplify the logic instead of making more mistakes in the
complicated diff code.

This has a drawback: the diff_queue_diff struct only lists the paths
corresponding to blob changes, not their leading directories. Thus,
get_or_compute_bloom_filter() needs an additional check to see if the
hashmap with the leading directories becomes too large.

One reason why this was not caught by test cases was that the test in
t4216-log-bloom.sh that was supposed to check this "too many changes"
condition only checked this on the initial commit of a repository. The
old logic counted these values correctly. Update this test in a few
ways:

1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit,
   allowing smaller commits to engage with this logic.

2. Create several interesting cases of edits, adds, removes, and mode
   changes (in the second commit). By testing both sides of the
   inequality with the *_MAX_CHANGED_PATHS variable, we can see that
   the count is exactly correct, so none of these changes are missed
   or over-counted.

3. Use the trace2 data value filter_found_large to verify that these
   commits are on the correct side of the limit.

Another way to verify the behavior is correct is through performance
tests. By testing on my local copies of the Git repository and the Linux
kernel repository, I could measure the effect of these short-circuits
when computing a fresh commit-graph file with changed-path Bloom filters
using the command

  GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \
    git commit-graph write --reachable --changed-paths

and reporting the wall time and resulting commit-graph size.

For Git, the results are

|        |      N=1       |       N=10     |      N=512     |
|--------|----------------|----------------|----------------|
| HEAD~1 | 10.90s  9.18MB | 11.11s  9.34MB | 11.31s  9.35MB |
| HEAD   |  9.21s  8.62MB | 11.11s  9.29MB | 11.29s  9.34MB |

For Linux, the results are

|        |       N=1      |     N=20      |     N=512     |
|--------|----------------|---------------|---------------|
| HEAD~1 | 61.28s  64.3MB | 76.9s  72.6MB | 77.6s  72.6MB |
| HEAD   | 49.44s  56.3MB | 68.7s  65.9MB | 69.2s  65.9MB |

Naturally, the improvement becomes much less as the limit grows, as
fewer commits satisfy the short-circuit.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bloom: use provided 'struct bloom_filter_settings'

When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter
from scratch, it looks to the default 'struct bloom_filter_settings' in
order to determine the maximum number of changed paths, number of bits
per entry, and so on.

All of these values have so far been constant, and so there was no need
to pass in a pointer from the caller (eg., the one that is stored in the
'struct write_commit_graph_context').

Start passing in a 'struct bloom_filter_settings *' instead of using the
default values to respect graph-specific settings (eg., in the case of
setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS').

In order to have an initialized value for these settings, move its
initialization to earlier in the commit-graph write.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bloom: split 'get_bloom_filter()' in two

'get_bloom_filter' takes a flag to control whether it will compute a
Bloom filter if the requested one is missing. In the next patch, we'll
add yet another parameter to this method, which would force all but one
caller to specify an extra 'NULL' parameter at the end.

Instead of doing this, split 'get_bloom_filter' into two functions:
'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only
looks up a Bloom filter (and does not compute one if it's missing,
thus dropping the 'compute_if_not_present' flag). The latter does
compute missing Bloom filters, with an additional parameter to store
whether or not it needed to do so.

This simplifies many call-sites, since the majority of existing callers
to 'get_bloom_filter' do not want missing Bloom filters to be computed
(so they can drop the parameter entirely and use the simpler version of
the function).

While we're at it, instrument the new 'get_or_compute_bloom_filter()'
with counters in the 'write_commit_graph_context' struct which store
the number of filters that we did and didn't compute, as well as filters
that were truncated.

It would be nice to drop the 'compute_if_not_present' flag entirely,
since all remaining callers of 'get_or_compute_bloom_filter' pass it as
'1', but this will change in a future patch and hence cannot be removed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph.c: store maximum changed paths

For now, we assume that there is a fixed constant describing the
maximum number of changed paths we are willing to store in a Bloom
filter.

Prepare for that to (at least partially) not be the case by making it a
member of the 'struct bloom_filter_settings'. This will be helpful in
the subsequent patches by reducing the size of test cases that exercise
storing too many changed paths, as well as preparing for an eventual
future in which this value might change.

This patch alone does not cause newly generated Bloom filters to use
a custom upper-bound on the maximum number of changed paths a single
Bloom filter can hold, that will occur in a later patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: respect 'commitGraph.readChangedPaths'

Git uses the 'core.commitGraph' configuration value to control whether
or not the commit graph is used when parsing commits or performing a
traversal.

Now that commit-graphs can also contain a section for changed-path Bloom
filters, administrators that already have commit-graphs may find it
convenient to use those graphs without relying on their changed-path
Bloom filters. This can happen, for example, during a staged roll-out,
or in the event of an incident.

Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom
filters are read. Note that this configuration is independent from both:

  - 'core.commitGraph', to allow flexibility in using all parts of a
    commit-graph _except_ for its Bloom filters.

  - The '--changed-paths' option for 'git commit-graph write', to allow
    reading and writing Bloom filters to be controlled independently.

When the variable is set, pretend as if no Bloom data was specified at
all. This avoids adding additional special-casing outside of the
commit-graph internals.

Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/helper/test-read-graph.c: prepare repo settings

The read-graph test-tool is used by a number of the commit-graph test to
assert various properties about a commit-graph. Previously, this program
never ran 'prepare_repo_settings()'. There was no need to do so, since
none of the commit-graph machinery is affected by the repo settings.

In the next patch, the commit-graph machinery's behavior will become
dependent on the repo settings, and so loading them before running the
rest of the test tool is critical.

As such, teach the test tool to call 'prepare_repo_settings()'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: pass a 'struct repository *' in more places

In a future commit, some commit-graph internals will want access to
'r->settings', but we only have the 'struct object_directory *'
corresponding to that repository.

Add an additional parameter to pass the repository around in more
places.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4216: use an '&&'-chain

In a759bfa9ee (t4216: add end to end tests for git log with Bloom
filters, 2020-04-06), a 'rm' invocation was added without a
corresponding '&&' chain.

When 'trace.perf' already exists, everything works fine. However, the
function can be executed without 'trace.perf' on disk (eg., when the
subset of tests run is altered with '--run'), and so the bare 'rm'
complains about a missing file.

To remove some noise from the test log, invoke 'rm' with '-f', at which
point it is sensible to place the 'rm -f' in an '&&'-chain, which is
both (1) our usual style, and (2) avoids a broken chain in the future if
more commands are added at the beginning of the function.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: introduce 'get_bloom_filter_settings()'

Many places in the code often need a pointer to the commit-graph's
'struct bloom_filter_settings', in which case they often take the value
from the top-most commit-graph.

In the non-split case, this works as expected. In the split case,
however, things get a little tricky. Not all layers in a chain of
incremental commit-graphs are required to themselves have Bloom data,
and so whether or not some part of the code uses Bloom filters depends
entirely on whether or not the top-most level of the commit-graph chain
has Bloom filters.

This has been the behavior since Bloom filters were introduced, and has
been codified into the tests since a759bfa9ee (t4216: add end to end
tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130
requires that Bloom filters are not used in exactly the case described
earlier.

There is no reason that this needs to be the case, since it is perfectly
valid for commits in an earlier layer to have Bloom filters when commits
in a newer layer do not.

Since Bloom settings are guaranteed in practice to be the same for any
layer in a chain that has Bloom data, it is sufficient to traverse the
'->base_graph' pointer until either (1) a non-null 'struct
bloom_filter_settings *' is found, or (2) until we are at the root of
the commit-graph chain.

Introduce a 'get_bloom_filter_settings()' function that does just this,
and use it instead of purely dereferencing the top-most graph's
'->bloom_filter_settings' pointer.

While we're at it, add an additional test in t5324 to guard against code
in the commit-graph writing machinery that doesn't correctly handle a
NULL 'struct bloom_filter *'.

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Fourth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jt/pretend-object-never-come-from-elsewhere'

The pretend-object mechanism checks if the given object already
exists in the object store before deciding to keep the data
in-core, but the check would have triggered lazy fetching of such
an object from a promissor remote.

* jt/pretend-object-never-come-from-elsewhere:
sha1-file: make pretend_object_file() not prefetch

Merge branch 'jt/pack-objects-prefetch-in-batch'

While packing many objects in a repository with a promissor remote,
lazily fetching missing objects from the promissor remote one by
one may be inefficient---the code now attempts to fetch all the
missing objects in batch (obviously this won't work for a lazy
clone that lazily fetches tree objects as you cannot even enumerate
what blobs are missing until you learn which trees are missing).

* jt/pack-objects-prefetch-in-batch:
pack-objects: prefetch objects to be packed
pack-objects: refactor to oid_object_info_extended

Merge branch 'mp/complete-show-color-moved'

Command line completion (in contrib/) update.

* mp/complete-show-color-moved:
completion: add show --color-moved[-ws]

Third batch

A couple of brown-paper-bag fixes, plus the other "The branch
'master' no longer is special" fix.

Now we are ready to rewind 'next'.

Merge branch 'cc/pretty-contents-size' into master

Brown-paper-bag fix.

* cc/pretty-contents-size:
t6300: fix issues related to %(contents:size)

Merge branch 'hn/reftable' into master

Brown-paper-bag fix.

* hn/reftable:
refs: move the logic to add \t to reflog to the files backend

Merge branch 'jc/fmt-merge-msg-suppress-destination' into master

"git merge" learned to selectively omit " into <branch>" at the end
of the title of default merge message with merge.suppressDest
configuration.

* jc/fmt-merge-msg-suppress-destination:
fmt-merge-msg: allow merge destination to be omitted again
Revert "fmt-merge-msg: stop treating `master` specially"

t6300: fix issues related to %(contents:size)

b6839fda68 (ref-filter: add support for %(contents:size), 2020-07-16)
added a new format for ref-filter, and added a function to generate
tests for this new feature in t6300.  Unfortunately, it tries to run
`test_expect_sucess' instead of `test_expect_success', and writes
$expect to `expected', but tries to read `expect'.  Those two issues
were probably unnoticed because the script only printed errors, but did
not crash.  This fixes these issues.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: move the logic to add \t to reflog to the files backend

523fa69c (reflog: cleanse messages in the refs.c layer, 2020-07-10)
centralized reflog normalizaton. However, the normalizaton added a
leading "\t" to the message. This is an artifact of the reflog
storage format in the files backend, so it should be added there.

Routines that parse back the reflog (such as grab_nth_branch_switch)
expect the "\t" to not be in the message, so without this fix, git
with reftable cannot process the "@{-1}" syntax.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The second batch -- mostly minor typofixes

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jb/doc-packfile-name' into master

Doc update.

* jb/doc-packfile-name:
pack-write/docs: update regarding pack naming

Merge branch 'sg/ci-git-path-fix-with-pyenv' into master

CI fixup---tests of Python scripts didn't use the version of Git
that is being tested.

* sg/ci-git-path-fix-with-pyenv:
ci: use absolute PYTHON_PATH in the Linux jobs

Merge branch 'en/typofixes' into master

* en/typofixes:
hashmap: fix typo in usage docs
Remove doubled words in various comments

Merge branch 'rs/grep-simpler-parse-object-or-die-call' into master

* rs/grep-simpler-parse-object-or-die-call:
grep: avoid using oid_to_hex() with parse_object_or_die()

Merge branch 'ar/help-guides-doc' into master

* ar/help-guides-doc:
git-help.txt: fix mentions of option --guides

Merge branch 'sk/typofixes' into master

* sk/typofixes:
comment: fix spelling mistakes inside comments

First batch post 2.28

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'en/fill-directory-exponential' into master

Fix to a regression introduced during 2.27 cycle.

* en/fill-directory-exponential:
dir: check pathspecs before returning `path_excluded`

Merge branch 'ct/mv-unmerged-path-error' into master

"git mv src dst", when src is an unmerged path, errored out
correctly but with an incorrect error message to claim that src is
not tracked, which has been clarified.

* ct/mv-unmerged-path-error:
git-mv: improve error message for conflicted file

Merge branch 'bc/push-cas-cquoted-refname' into master

Pushing a ref whose name contains non-ASCII character with the
"--force-with-lease" option did not work over smart HTTP protocol,
which has been corrected.

* bc/push-cas-cquoted-refname:
remote-curl: make --force-with-lease work with non-ASCII ref names

Merge branch 'cc/pretty-contents-size' into master

"git for-each-ref --format=<>" learned %(contents:size).

* cc/pretty-contents-size:
  ref-filter: add support for %(contents:size)
  t6300: test refs pointing to tree and blob
  Documentation: clarify %(contents:XXXX) doc

Merge branch 'rs/add-index-entry-optim-fix' into master

Fix to an ancient bug caused by an over-eager attempt for
optimization.

* rs/add-index-entry-optim-fix:
read-cache: remove bogus shortcut

Merge branch 'jt/avoid-lazy-fetching-upon-have-check' into master

Fetching from a lazily cloned repository resulted at the server
side in attempts to lazy fetch objects that the client side has,
many of which will not be available from the third-party anyway.

* jt/avoid-lazy-fetching-upon-have-check:
upload-pack: do not lazy-fetch "have" objects

Merge branch 'dl/test-must-fail-fixes-6' into master

Dev support to limit the use of test_must_fail to only git commands.

* dl/test-must-fail-fixes-6:
  test-lib-functions: restrict test_must_fail usage
  t9400: don't use test_must_fail with cvs
  t9834: remove use of `test_might_fail p4`
  t7107: don't use test_must_fail()
  t5324: reorder `run_with_limited_open_files test_might_fail`
  t3701: stop using `env` in force_color()

Merge branch 'jk/reject-newer-extensions-in-v0' into master

With the base fix to 2.27 regresion, any new extensions in a v0
repository would still be silently honored, which is not quite
right. Instead, complain and die loudly.

* jk/reject-newer-extensions-in-v0:
verify_repository_format(): complain about new extensions in v0 repo

Merge branch 'hn/reftable' into master

Preliminary clean-up of the refs API in preparation for adding a
new refs backend "reftable".

* hn/reftable:
  reflog: cleanse messages in the refs.c layer
  bisect: treat BISECT_HEAD as a pseudo ref
  t3432: use git-reflog to inspect the reflog for HEAD
  lib-t6000.sh: write tag using git-update-ref

Merge branch 'bw/fail-cloning-into-non-empty' into master

"git clone --separate-git-dir=$elsewhere" used to stomp on the
contents of the existing directory $elsewhere, which has been
taught to fail when $elsewhere is not an empty directory.

* bw/fail-cloning-into-non-empty:
git clone: don't clone into non-empty directory

Merge branch 'pb/log-rev-list-doc' into master

"git help log" has been enhanced by sharing more material from the
documentation for the underlying "git rev-list" command.

* pb/log-rev-list-doc:
  git-log.txt: include rev-list-description.txt
  git-rev-list.txt: move description to separate file
  git-rev-list.txt: tweak wording in set operations
  git-rev-list.txt: fix Asciidoc syntax
  revisions.txt: describe 'rev1 rev2 ...' meaning for ranges
  git-log.txt: add links to 'rev-list' and 'diff' docs

Merge branch 'jk/tests-timestamp-fix' into master

The test framework has been updated so that most tests will run
with predictable (artificial) timestamps.

* jk/tests-timestamp-fix:
  t9100: stop depending on commit timestamps
  test-lib: set deterministic default author/committer date
  t9100: explicitly unset GIT_COMMITTER_DATE
  t5539: make timestamp requirements more explicit
  t9700: loosen ident timezone regex
  t6000: use test_tick consistently

Merge branch 'ds/commit-graph-bloom-updates' into master

Updates to the changed-paths bloom filter.

* ds/commit-graph-bloom-updates:
  commit-graph: check all leading directories in changed path Bloom filters
  revision: empty pathspecs should not use Bloom filters
  revision.c: fix whitespace
  commit-graph: check chunk sizes after writing
  commit-graph: simplify chunk writes into loop
  commit-graph: unify the signatures of all write_graph_chunk_*() functions
  commit-graph: persist existence of changed-paths
  bloom: fix logic in get_bloom_filter()
  commit-graph: change test to die on parse, not load
  commit-graph: place bloom_settings in context

Merge branch 'sg/commit-graph-cleanups' into master

The changed-path Bloom filter is improved using ideas from an
independent implementation.

* sg/commit-graph-cleanups:
  commit-graph: simplify write_commit_graph_file() #2
  commit-graph: simplify write_commit_graph_file() #1
  commit-graph: simplify parse_commit_graph() #2
  commit-graph: simplify parse_commit_graph() #1
  commit-graph: clean up #includes
  diff.h: drop diff_tree_oid() & friends' return value
  commit-slab: add a function to deep free entries on the slab
  commit-graph-format.txt: all multi-byte numbers are in network byte order
  commit-graph: fix parsing the Chunk Lookup table
  tree-walk.c: don't match submodule entries for 'submod/anything'

fmt-merge-msg: allow merge destination to be omitted again

In Git 2.28, we stopped special casing 'master' when producing the
default merge message by just removing the code to squelch "into
'master'" at the end of the message.

Introduce multi-valued merge.suppressDest configuration variable
that gives a set of globs to match against the name of the branch
into which the merge is being made, to let users specify for which
branch fmt-merge-msg's output should be shortened. When it is not
set, 'master' is used as the sole value of the variable by default.

The above move mostly reverts the pre-2.28 default in repositories
that have no relevant configuration.

Add a few tests to protect the behaviour with the new configuration
variable from future regression.

Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Revert "fmt-merge-msg: stop treating `master` specially"

This reverts commit 489947cee5095b168cbac111ff7bd1eadbbd90dd, which
stopped treating merges into the 'master' branch as special when
preparing the default merge message. As the goal was not to have
any single branch designated as special, it solved it by leaving the
"into <branchname>" at the end of the title of the default merge
message for any and all branches. An obvious and easy alternative
to treat everybody equally could have been to remove it for every
branch, but that involves loss of information.

We'll introduce a new mechanism to let end-users specify merges into
which branches would omit the "into <branchname>" from the title of
the default merge message, and make the mechanism, when unconfigured,
treat the traditional 'master' special again, so all the changes to
the tests we made earlier will become unnecessary, as these tests
will be run without configuring the said new mechanism.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

comment: fix spelling mistakes inside comments

This commit fixes a couple of minor spelling mistakes inside
comments.

Signed-off-by: Steve Kemp <steve@steve.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-help.txt: fix mentions of option --guides

Fix typos introduced in commit a133737b80 ("doc: include --guide option
description for "git help"", 2013-04-02).

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

grep: avoid using oid_to_hex() with parse_object_or_die()

parse_object_or_die() is passed an object ID and a name to show if the
object cannot be parsed. If the name is NULL then it shows the
hexadecimal object ID. Use that feature instead of preparing and
passing the hexadecimal representation to the function proactively.
That's shorter and a bit more efficient.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hashmap: fix typo in usage docs

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Remove doubled words in various comments

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.28

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge tag 'l10n-2.28.0-rnd1' of https://github.com/git-l10n/git-po into master

l10n-2.28.0-rnd1

* tag 'l10n-2.28.0-rnd1' of https://www.github.com/git-l10n/git-po:
  l10n: es: 2.28.0 round 1
  l10n: de.po: Update German translation for Git v2.28.0
  l10n: de.po: fix grammar
  l10n: zh_CN: for git v2.28.0 l10n round 1
  l10n: zh_TW.po: v2.28.0 round 1 (0 untranslated)
  l10n: vi.po: correct "ident line" translation
  l10n: vi.po(4931t): Updated translation for v2.28.0
  l10n: fr v2.28.0 round 1
  l10n: sv.po: Update Swedish translation (4931t0f0u)
  l10n: it.po: update the Italian translation for Git 2.28.0 round 1
  l10n: tr: v2.28.0 round 1
  l10n: git.pot: v2.28.0 round 1 (70 new, 14 removed)
  l10n: Update Catalan translation

Merge branch 'master' of github.com:Softcatala/git-po

* 'master' of github.com:Softcatala/git-po:
l10n: Update Catalan translation

l10n: es: 2.28.0 round 1

Signed-off-by: Christopher Diaz Riveros <christopher.diaz.riv@gmail.com>

Merge branch 'ps/ref-transaction-hook' into master

A new hook.

* ps/ref-transaction-hook:
githooks.txt: use correct "reference-transaction" hook name

githooks.txt: use correct "reference-transaction" hook name

The "reference transaction" hook was introduced in commit 6754159767
(refs: implement reference transaction hook, 2020-06-19). The name of
the hook is declared as "reference-transaction" in "refs.c" and
testcases, but the name declared in "githooks.txt" is different.

Signed-off-by: Bojun Chen <bojun.cbj@alibaba-inc.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

l10n: de.po: Update German translation for Git v2.28.0

Reviewed-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Matthias Rüster <matthias.ruester@gmail.com>

l10n: de.po: fix grammar

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Matthias Rüster <matthias.ruester@gmail.com>

ci: use absolute PYTHON_PATH in the Linux jobs

In our test suite, when 'git p4' invokes a Git command as a
subprocesses, then it should run the 'git' binary we are testing.
Unfortunately, this is not the case in the 'linux-clang' and
'linux-gcc' jobs on Travis CI, where 'git p4' runs the system
'/usr/bin/git' instead.

Travis CI's default Linux image includes 'pyenv', and all Python
invocations that involve PATH lookup go through 'pyenv', e.g. our
'PYTHON_PATH=$(which python3)' sets '/opt/pyenv/shims/python3' as
PYTHON_PATH, which in turn will invoke '/usr/bin/python3'. Alas, the
'pyenv' version included in this image is buggy, and prepends the
directory containing the Python binary to PATH even if that is a
system directory already in PATH near the end. Consequently, 'git p4'
in those jobs ends up with its PATH starting with '/usr/bin', and then
runs '/usr/bin/git'.

So use the absolute paths '/usr/bin/python{2,3}' explicitly when
setting PYTHON_PATH in those Linux jobs to avoid the PATH lookup and
thus the bogus 'pyenv' from interfering with our 'git p4' tests.
Don't bother with special-casing Travis CI: while this issue doesn't
affect the corresponding Linux jobs on GitHub Actions, both CI systems
use Ubuntu LTS-based images, so we can safely rely on these Python
paths.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-write/docs: update regarding pack naming

The index-pack documentation explicitly states that the pack
name is derived from the sorted list of object names, but
since commit 1190a1acf800 ("pack-objects: name pack files
after trailer hash") that isn't true anymore.

Be less explicit in the docs as to what the exact output is,
and just say that it's whatever goes into the pack name.

Also update a comment on write_idx_file() since it no longer
modifies the sha1 variable (it's const now anyway), as noted
by Junio.

Fixes: 1190a1acf800 ("pack-objects: name pack files after trailer hash")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/RelNotes: fix a typo in 2.28's relnotes

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.28-rc2

Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1-file: make pretend_object_file() not prefetch

When pretend_object_file() is invoked with an object that does not exist
(as is the typical case), there is no need to fetch anything from the
promisor remote, because the caller already knows what the object is
supposed to contain. Therefore, suppress the fetch. (The
OBJECT_INFO_QUICK flag is added for the same reason.)

This was noticed at $DAYJOB when "blame" was run on a file that had
uncommitted modifications.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-objects: prefetch objects to be packed

When an object to be packed is noticed to be missing, prefetch all
to-be-packed objects in one batch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-objects: refactor to oid_object_info_extended

Use oid_object_info_extended() instead of oid_object_info() because a
subsequent commit needs to specify an additional flag here.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'en/sparse-status' into master

Fix to a "git prompt" regression during this development cycle.

* en/sparse-status:
git-prompt: change == to = for zsh's sake

l10n: zh_CN: for git v2.28.0 l10n round 1

Translate 70 new messages (4931t0f0u) for git 2.28.0.

Reviewed-by: Fangyi Zhou <me@fangyi.io>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>

Merge branch 'l10n/zh_TW/200716' of github.com:l10n-tw/git-po

* 'l10n/zh_TW/200716' of github.com:l10n-tw/git-po:
l10n: zh_TW.po: v2.28.0 round 1 (0 untranslated)

remote-curl: make --force-with-lease work with non-ASCII ref names

When we invoke a remote transport helper and pass an option with an
argument, we quote the argument as a C-style string if necessary.  This
is the case for the cas option, which implements the --force-with-lease
command-line flag, when we're passing a non-ASCII refname.

However, the remote curl helper isn't designed to parse such an
argument, meaning that if we try to use --force-with-lease with an HTTP
push and a non-ASCII refname, we get an error like this:

  error: cannot parse expected object name '0000000000000000000000000000000000000000"'

Note the double quote, which get_oid has reminded us is not valid in an
hex object ID.

Even if we had been able to parse it, we would send the wrong data to
the server: we'd send an escaped ref, which would not behave as the user
wanted and might accidentally result in updating or deleting a ref we
hadn't intended.

Since we need to expect a quoted C-style string here, just check if the
first argument is a double quote, and if so, unquote it.  Note that if
the refname contains a double quote, then we will have double-quoted it
already, so there is no ambiguity.

We test for this case only in the smart protocol, since the DAV-based
protocol is not capable of handling this capability.  We use UTF-8
because this is nicer in our tests and friendlier to Windows, but the
code should work for all non-ASCII refs.

While we're at it, since the name of the option is now well established
and isn't going to change, let's inline it instead of using the #define
constant.

Reported-by: Frej Bjon <frej.bjon@nemit.fi>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-prompt: change == to = for zsh's sake

When using git-prompt.sh with zsh, __git_ps1 currently errs
when inside a repo with:

__git_ps1:96: = not found

Avoid using non-portable "==" that is only understood by bash
and not zsh. Change to "=" so that the prompt script becomes
usable with zsh again.

Signed-off-by: David J. Malan <malan@harvard.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-mv: improve error message for conflicted file

'git mv' has always complained about renaming a conflicted
file, as it cannot handle multiple index entries for one file.
However, the error message it uses has been the same as the
one for an untracked file:

fatal: not under version control, src=...

which is patently wrong. Distinguish the two cases and
add a test to make sure we produce the correct message.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir: check pathspecs before returning `path_excluded`

In 95c11ecc73 ("Fix error-prone fill_directory() API; make it only
return matches", 2020-04-01), we taught `fill_directory()`, or more
specifically `treat_path()`, to check against any pathspecs so that we
could simplify the callers.

But in doing so, we added a slightly-too-early return for the "excluded"
case. We end up not checking the pathspecs, meaning we return
`path_excluded` when maybe we should return `path_none`. As a result,
`git status --ignored -- pathspec` might show paths that don't actually
match "pathspec".

Move the "excluded" check down to after we've checked any pathspecs.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge https://github.com/prati0100/git-gui into master

* https://github.com/prati0100/git-gui:
git-gui: allow opening work trees from the startup dialog

l10n: zh_TW.po: v2.28.0 round 1 (0 untranslated)

Signed-off-by: Yi-Jyun Pan <pan93412@gmail.com>

l10n: vi.po: correct "ident line" translation

While we're at it, fix some minor misspelling
and improve translation for 3-way-merging.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>

l10n: vi.po(4931t): Updated translation for v2.28.0

Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>

Merge branch 'dl/branch-cleanup' into master

Last minute fix-up to tests for portability.

* dl/branch-cleanup:
t3200: don't grep for `strerror()` string

Merge branch 'js/pu-to-seen' into master

Last minute fix-up to documentation.

* js/pu-to-seen:
gitworkflows.txt: fix broken subsection underline

Merge branch 'jc/relnotes-v0-extension-update' into master

Last minute fix-up to the release notes.

* jc/relnotes-v0-extension-update:
RelNotes: update the v0 with extension situation

t3200: don't grep for `strerror()` string

In 6b7093064a ("t3200: test for specific errors", 2020-06-15), we
learned to grep stderr to ensure that the failing `git branch`
invocations fail for the right reason. In two of these tests, we grep
for "File exists", expecting the string to show up there since config.c
calls `error_errno()`, which ends up including `strerror(errno)` in the
error message.

But as we saw in 4605a73073 ("t1091: don't grep for `strerror()`
string", 2020-03-08), there exists at least one implementation where
`strerror()` yields a slightly different string than the one we're
grepping for. In particular, these tests fail on the NonStop platform.

Similar to 4605a73073, grep for the beginning of the string instead to
avoid relying on `strerror()` behavior.

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitworkflows.txt: fix broken subsection underline

AsciiDoctor renders the "~~~~~~~~~" literally. That's not our intention:
it is supposed to indicate a level 2 subsection. In 828197de8f ("docs:
adjust for the recent rename of `pu` to `seen`", 2020-06-25), the length
of this section header grew by two characters but we didn't adjust the
number of ~ characters accordingly. AsciiDoc handles this discrepancy ok
and still picks this up as a subsection title, but Asciidoctor is not as
forgiving.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

RelNotes: update the v0 with extension situation

With the two-patch series for regression fix, to the users from 2.27
days, there is no visible behaviour change---we do not warn and fail
use of v0 repositories with newer extensions yet, so there is nothing
to note in the backward compatibility section.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.28-rc1

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jn/v0-with-extensions-fix' into master

In 2.28-rc0, we corrected a bug that some repository extensions are
honored by mistake even in a version 0 repositories (these
configuration variables in extensions.* namespace were supposed to
have special meaning in repositories whose version numbers are 1 or
higher), but this was a bit too big a change.

* jn/v0-with-extensions-fix:
repository: allow repository format upgrade with extensions
Revert "check_repository_format_gently(): refuse extensions for old repositories"

upload-pack: do not lazy-fetch "have" objects

When upload-pack receives a request containing "have" hashes, it (among
other things) checks if the served repository has the corresponding
objects. However, it does not do so with the
OBJECT_INFO_SKIP_FETCH_OBJECT flag, so if serving a partial clone, a
lazy fetch will be triggered first.

This was discovered at $DAYJOB when a user fetched from a partial clone
(into another partial clone - although this would also happen if the
repo to be fetched into is not a partial clone).

Therefore, whenever "have" hashes are checked for existence, pass the
OBJECT_INFO_SKIP_FETCH_OBJECT flag. Also add the OBJECT_INFO_QUICK flag
to improve performance, as it is typical that such objects do not exist
in the serving repo, and the consequences of a false negative are minor
(usually, a slightly larger pack sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ref-filter: add support for %(contents:size)

It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.

Also the result of the following:

`git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`

is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its output
with the output from `git cat-file`.

As with %(contents), %(contents:size) is silently ignored, if a
ref points to something other than a commit or a tag:

```
$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first

$ git for-each-ref --format='%(contents:size)' refs/mytrees/first

```

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: remove bogus shortcut

has_dir_name() has some optimizations for the case where entries are
added to an index in the correct order.  They kick in if the new entry
sorts after the last one.  One of them exits early if the last entry has
a longer name than the directory of the new entry.  Here's its comment:

/*
* The directory prefix lines up with part of
* a longer file or directory name, but sorts
* after it, so this sub-directory cannot
* collide with a file.
*
* last: xxx/yy-file (because '-' sorts before '/')
* this: xxx/yy/abc
*/

However, a file named xxx/yy would be sorted before xxx/yy-file because
'-' sorts after NUL, so the length check against the last entry is not
sufficient to rule out a collision.  Remove it.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

verify_repository_format(): complain about new extensions in v0 repo

We made the mistake in the past of respecting extensions.* even when the
repository format version was set to 0. This is bad because forgetting
to bump the repository version means that older versions of Git (which
do not know about our extensions) won't complain. I.e., it's not a
problem in itself, but it means your repository is in a state which does
not give you the protection you think you're getting from older
versions.

For compatibility reasons, we are stuck with that decision for existing
extensions. However, we'd prefer not to extend the damage further. We
can do that by catching any newly-added extensions and complaining about
the repository format.

Note that this is a pretty heavy hammer: we'll refuse to work with the
repository at all. A lesser option would be to ignore (possibly with a
warning) any new extensions. But because of the way the extensions are
handled, that puts the burden on each new extension that is added to
remember to "undo" itself (because they are handled before we know
for sure whether we are in a v1 repo or not, since we don't insist on a
particular ordering of config entries).

So one option would be to rewrite that handling to record any new
extensions (and their values) during the config parse, and then only
after proceed to handle new ones only if we're in a v1 repository. But
I'm not sure if it's worth the trouble:

  - ignoring extensions is likely to end up with broken results anyway
    (e.g., ignoring a proposed objectformat extension means parsing any
    object data is likely to encounter errors)

  - this is a sign that whatever tool wrote the extension field is
    broken. We may be better off notifying immediately and forcefully so
    that such tools don't even appear to work accidentally.

The only downside is that fixing the situation is a little tricky,
because programs like "git config" won't want to work with the
repository. But:

  git config --file=.git/config core.repositoryformatversion 1

should still suffice.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repository: allow repository format upgrade with extensions

Now that we officially permit repository extensions in repository
format v0, permit upgrading a repository with extensions from v0 to v1
as well.

For example, this means a repository where the user has set
"extensions.preciousObjects" can use "git fetch --filter=blob:none
origin" to upgrade the repository to use v1 and the partial clone
extension.

To avoid mistakes, continue to forbid repository format upgrades in v0
repositories with an unrecognized extension. This way, a v0 user
using a misspelled extension field gets a chance to correct the
mistake before updating to the less forgiving v1 format.

While we're here, make the error message for failure to upgrade the
repository format a bit shorter, and present it as an error, not a
warning.

Reported-by: Huan Huan Chen <huanhuanchen@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Revert "check_repository_format_gently(): refuse extensions for old repositories"

This reverts commit 14c7fa269e42df4133edd9ae7763b678ed6594cd.

The core.repositoryFormatVersion field was introduced in ab9cb76f661
(Repository format version check., 2005-11-25), providing a welcome
bit of forward compatibility, thanks to some welcome analysis by
Martin Atukunda.  The semantics are simple: a repository with
core.repositoryFormatVersion set to 0 should be comprehensible by all
Git implementations in active use; and Git implementations should
error out early instead of trying to act on Git repositories with
higher core.repositoryFormatVersion values representing new formats
that they do not understand.

A new repository format did not need to be defined until 00a09d57eb8
(introduce "extensions" form of core.repositoryformatversion,
2015-06-23).  This provided a finer-grained extension mechanism for
Git repositories.  In a repository with core.repositoryFormatVersion
set to 1, Git implementations can act on "extensions.*" settings that
modify how a repository is interpreted.  In repository format version
1, unrecognized extensions settings cause Git to error out.

What happens if a user sets an extension setting but forgets to
increase the repository format version to 1?  The extension settings
were still recognized in that case; worse, unrecognized extensions
settings do *not* cause Git to error out.  So combining repository
format version 0 with extensions settings produces in some sense the
worst of both worlds.

To improve that situation, since 14c7fa269e4
(check_repository_format_gently(): refuse extensions for old
repositories, 2020-06-05) Git instead ignores extensions in v0 mode.
This way, v0 repositories get the historical (pre-2015) behavior and
maintain compatibility with Git implementations that do not know about
the v1 format.  Unfortunately, users had been using this sort of
configuration and this behavior change came to many as a surprise:

- users of "git config --worktree" that had followed its advice
  to enable extensions.worktreeConfig (without also increasing the
  repository format version) would find their worktree configuration
  no longer taking effect

- tools such as copybara[*] that had set extensions.partialClone in
  existing repositories (without also increasing the repository format
  version) would find that setting no longer taking effect

The behavior introduced in 14c7fa269e4 might be a good behavior if we
were traveling back in time to 2015, but we're far too late.  For some
reason I thought that it was what had been originally implemented and
that it had regressed.  Apologies for not doing my research when
14c7fa269e4 was under development.

Let's return to the behavior we've had since 2015: always act on
extensions.* settings, regardless of repository format version.  While
we're here, include some tests to describe the effect on the "upgrade
repository version" code path.

[*] https://github.com/google/copybara/commit/ca76c0b1e13c4e36448d12c2aba4a5d9d98fb6e7

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Hopefully the last batch before -rc1

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/commit-graph-no-check-oids' into master

Fix to the code to produce progress bar, which is new in the
upcoming release.

* tb/commit-graph-no-check-oids:
commit-graph: fix "Collecting commits from input" progress line

Merge branch 'ct/diff-with-merge-base-clarification' into master

Doc update.

* ct/diff-with-merge-base-clarification:
git-diff.txt: reorder possible usages
git-diff.txt: don't mark required argument as optional

Merge branch 'sg/commit-graph-progress-fix' into master

The code to produce progress output from "git commit-graph --write"
had a few breakages, which have been fixed.

* sg/commit-graph-progress-fix:
commit-graph: fix "Writing out commit graph" progress counter
commit-graph: fix progress of reachable commits

Merge branch 'ta/wait-on-aliased-commands-upon-signal' into master

When an aliased command, whose output is piped to a pager by git,
gets killed by a signal, the pager got into a funny state, which
has been corrected (again).

* ta/wait-on-aliased-commands-upon-signal:
Wait for child on signal death for aliases to externals
Wait for child on signal death for aliases to builtins

completion: add show --color-moved[-ws]

The completion for diff command was added in fd0bc175576 but
missed the show command which also supports --color-moved[-ws].

This suffers from the very same problem [1] as the referenced
commit: no comma-separated list completion for --color-moved-ws.

[1]: https://github.com/scop/bash-completion/issues/240

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: fix "Collecting commits from input" progress line

To display a progress line while reading commits from standard input
and looking them up, 5b6653e523 (builtin/commit-graph.c: dereference
tags in builtin, 2020-05-13) should have added a pair of
start_delayed_progress() and stop_progress() calls around the loop
reading stdin.  Alas, the stop_progress() call ended up at the wrong
place, after write_commit_graph(), which does all the commit-graph
computation and writing, and has several progress lines of its own.
Consequently, that new

  Collecting commits from input: 1234

progress line is overwritten by the first progress line shown by
write_commit_graph(), and its final "done" line is shown last, after
everything is finished:

  $ { sleep 3 ; git rev-list -3 HEAD ; sleep 1 ; } | ~/src/git/git commit-graph write --stdin-commits
  Expanding reachable commits in commit graph: 873402, done.
  Writing out commit graph in 4 passes: 100% (3493608/3493608), done.
  Collecting commits from input: 3, done.

Furthermore, that stop_progress() call was added after the 'cleanup'
label, where that loop reading stdin jumps in case of an error.  In
case of invalid input this then results in the "done" line shown after
the error message:

  $ { sleep 3 ; git rev-list -3 HEAD ; echo junk ; } | ~/src/git/git commit-graph write --stdin-commits
  error: unexpected non-hex object ID: junk
  Collecting commits from input: 3, done.

Move that stop_progress() call to the right place.

While at it, drop the unnecessary 'if (progress)' condition protecting
the stop_progress() call, because that function is prepared to handle
a NULL progress struct.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9100: stop depending on commit timestamps

An earlier "fix" to this script gave up updating it not to rely on
the current time because we cannot control what timestamp subversion
gives its commits.  We however could solve the issue in a different
way and still use deterministic timestamps on Git commits.

One fix would be to sort the list of trees before removing duplicates,
but that loses information:

  - we do care that the fetched history is in the same order

  - there's a tree which appears twice in the history, and we'd want to
    make sure that it's there both times

So instead, let's de-duplicate using a hash (preserving the order), and
drop only lines with identical trees and subjects (preserving the tree
which appears twice, since it has different subjects each time).

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>