Taylor Blau [Tue, 8 Dec 2020 22:05:09 +0000 (17:05 -0500)]
pack-bitmap: factor out 'bitmap_for_commit()'
A couple of callers within pack-bitmap.c duplicate logic to lookup a
given object id in the bitamps khash. Factor this out into a new
function, 'bitmap_for_commit()' to reduce some code duplication.
Make this new function non-static, since it will be used in later
commits from outside of pack-bitmap.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:04:34 +0000 (17:04 -0500)]
pack-bitmap-write: ignore BITMAP_FLAG_REUSE
The on-disk bitmap format has a flag to mark a bitmap to be "reused".
This is a rather curious feature, and works like this:
- a run of pack-objects would decide to mark the last 80% of the
bitmaps it generates with the reuse flag
- the next time we generate bitmaps, we'd see those reuse flags from
the last run, and mark those commits as special:
- we'd be more likely to select those commits to get bitmaps in
the new output
- when generating the bitmap for a selected commit, we'd reuse the
old bitmap as-is (rearranging the bits to match the new pack, of
course)
However, neither of these behaviors particularly makes sense.
Just because a commit happened to be bitmapped last time does not make
it a good candidate for having a bitmap this time. In particular, we may
choose bitmaps based on how recent they are in history, or whether a ref
tip points to them, and those things will change. We're better off
re-considering fresh which commits are good candidates.
Reusing the existing bitmap _is_ a reasonable thing to do to save
computation. But only reusing exact bitmaps is a weak form of this. If
we have an old bitmap for A and now want a new bitmap for its child, we
should be able to compute that only by looking at trees and that are new
to the child. But this code would consider only exact reuse (which is
perhaps why it was eager to select those commits in the first place).
Furthermore, the recent switch to the reverse-edge algorithm for
generating bitmaps dropped this optimization entirely (and yet still
performs better).
So let's do a few cleanups:
- drop the whole "reusing bitmaps" phase of generating bitmaps. It's
not helping anything, and is mostly unused code (or worse, code that
is using CPU but not doing anything useful)
- drop the use of the on-disk reuse flag to select commits to bitmap
- stop setting the on-disk reuse flag in bitmaps we generate (since
nothing respects it anymore)
We will keep a few innards of the reuse code, which will help us
implement a more capable version of the "reuse" optimization:
- simplify rebuild_existing_bitmaps() into a function that only builds
the mapping of bits between the old and new orders, but doesn't
actually convert any bitmaps
- make rebuild_bitmap() public; we'll call it lazily to convert bitmaps
as we traverse (using the mapping created above)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 8 Dec 2020 22:04:30 +0000 (17:04 -0500)]
pack-bitmap-write: build fewer intermediate bitmaps
The bitmap_writer_build() method calls bitmap_builder_init() to
construct a list of commits reachable from the selected commits along
with a "reverse graph". This reverse graph has edges pointing from a
commit to other commits that can reach that commit. After computing a
reachability bitmap for a commit, the values in that bitmap are then
copied to the reachability bitmaps across the edges in the reverse
graph.
We can now relax the role of the reverse graph to greatly reduce the
number of intermediate reachability bitmaps we compute during this
reverse walk. The end result is that we walk objects the same number of
times as before when constructing the reachability bitmaps, but we also
spend much less time copying bits between bitmaps and have much lower
memory pressure in the process.
The core idea is to select a set of "important" commits based on
interactions among the sets of commits reachable from each selected commit.
The first technical concept is to create a new 'commit_mask' member in the
bb_commit struct. Note that the selected commits are provided in an
ordered array. The first thing to do is to mark the ith bit in the
commit_mask for the ith selected commit. As we walk the commit-graph, we
copy the bits in a commit's commit_mask to its parents. At the end of
the walk, the ith bit in the commit_mask for a commit C stores a boolean
representing "The ith selected commit can reach C."
As we walk, we will discover non-selected commits that are important. We
will get into this later, but those important commits must also receive
bit positions, growing the width of the bitmasks as we walk. At the true
end of the walk, the ith bit means "the ith _important_ commit can reach
C."
MAXIMAL COMMITS
---------------
We use a new 'maximal' bit in the bb_commit struct to represent whether
a commit is important or not. The term "maximal" comes from the
partially-ordered set of commits in the commit-graph where C >= P if P
is a parent of C, and then extending the relationship transitively.
Instead of taking the maximal commits across the entire commit-graph, we
instead focus on selecting each commit that is maximal among commits
with the same bits on in their commit_mask. This definition is
important, so let's consider an example.
Suppose we have three selected commits A, B, and C. These are assigned
bitmasks 100, 010, and 001 to start. Each of these can be marked as
maximal immediately because they each will be the uniquely maximal
commit that contains their own bit. Keep in mind that that these commits
may have different bitmasks after the walk; for example, if B can reach
C but A cannot, then the final bitmask for C is 011. Even in these
cases, C would still be a maximal commit among all commits with the
third bit on in their masks.
Now define sets X, Y, and Z to be the sets of commits reachable from A,
B, and C, respectively. The intersections of these sets correspond to
different bitmasks:
* 100: X - (Y union Z)
* 010: Y - (X union Z)
* 001: Z - (X union Y)
* 110: (X intersect Y) - Z
* 101: (X intersect Z) - Y
* 011: (Y intersect Z) - X
* 111: X intersect Y intersect Z
This can be visualized with the following Hasse diagram:
100 010 001
| \ / \ / |
| \/ \/ |
| /\ /\ |
| / \ / \ |
110 101 011
\___ | ___/
\ | /
111
Some of these bitmasks may not be represented, depending on the topology
of the commit-graph. In fact, we are counting on it, since the number of
possible bitmasks is exponential in the number of selected commits, but
is also limited by the total number of commits. In practice, very few
bitmasks are possible because most commits converge on a common "trunk"
in the commit history.
With this three-bit example, we wish to find commits that are maximal
for each bitmask. How can we identify this as we are walking?
As we walk, we visit a commit C. Since we are walking the commits in
topo-order, we know that C is visited after all of its children are
visited. Thus, when we get C from the revision walk we inspect the
'maximal' property of its bb_data and use that to determine if C is truly
important. Its commit_mask is also nearly final. If C is not one of the
originally-selected commits, then assign a bit position to C (by
incrementing num_maximal) and set that bit on in commit_mask. See
"MULTIPLE MAXIMAL COMMITS" below for more detail on this.
Now that the commit C is known to be maximal or not, consider each
parent P of C. Compute two new values:
* c_not_p : true if and only if the commit_mask for C contains a bit
that is not contained in the commit_mask for P.
* p_not_c : true if and only if the commit_mask for P contains a bit
that is not contained in the commit_mask for P.
If c_not_p is false, then P already has all of the bits that C would
provide to its commit_mask. In this case, move on to other parents as C
has nothing to contribute to P's state that was not already provided by
other children of P.
We continue with the case that c_not_p is true. This means there are
bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or()
to add those bits.
If p_not_c is also true, then set the maximal bit for P to one. This means
that if no other commit has P as a parent, then P is definitely maximal.
This is because no child had the same bitmask. It is important to think
about the maximal bit for P at this point as a temporary state: "P is
maximal based on current information."
In contrast, if p_not_c is false, then set the maximal bit for P to
zero. Further, clear all reverse_edges for P since any edges that were
previously assigned to P are no longer important. P will gain all
reverse edges based on C.
The final thing we need to do is to update the reverse edges for P.
These reverse edges respresent "which closest maximal commits
contributed bits to my commit_mask?" Since C contributed bits to P's
commit_mask in this case, C must add to the reverse edges of P.
If C is maximal, then C is a 'closest' maximal commit that contributed
bits to P. Add C to P's reverse_edges list.
Otherwise, C has a list of maximal commits that contributed bits to its
bitmask (and this list is exactly one element). Add all of these items
to P's reverse_edges list. Be careful to ignore duplicates here.
After inspecting all parents P for a commit C, we can clear the
commit_mask for C. This reduces the memory load to be limited to the
"width" of the commit graph.
Consider our ABC/XYZ example from earlier and let's inspect the state of
the commits for an interesting bitmask, say 011. Suppose that D is the
only maximal commit with this bitmask (in the first three bits). All
other commits with bitmask 011 have D as the only entry in their
reverse_edges list. D's reverse_edges list contains B and C.
COMPUTING REACHABILITY BITMAPS
------------------------------
Now that we have our definition, let's zoom out and consider what
happens with our new reverse graph when computing reachability bitmaps.
We walk the reverse graph in reverse-topo-order, so we visit commits
with largest commit_masks first. After we compute the reachability
bitmap for a commit C, we push the bits in that bitmap to each commit D
in the reverse edge list for C. Then, when we finally visit D we already
have the bits for everything reachable from maximal commits that D can
reach and we only need to walk the objects in the set-difference.
In our ABC/XYZ example, when we finally walk for the commit A we only
need to walk commits with bitmask equal to A's bitmask. If that bitmask
is 100, then we are only walking commits in X - (Y union Z) because the
bitmap already contains the bits for objects reachable from (X intersect
Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps
for the maximal commits with bitmasks 110 and 101).
The behavior is intended to walk each commit (and the trees that commit
introduces) at most once while allocating and copying fewer reachability
bitmaps. There is one caveat: what happens when there are multiple
maximal commits with the same bitmask, with respect to the initial set
of selected commits?
MULTIPLE MAXIMAL COMMITS
------------------------
Earlier, we mentioned that when we discover a new maximal commit, we
assign a new bit position to that commit and set that bit position to
one for that commit. This is absolutely important for interesting
commit-graphs such as git/git and torvalds/linux. The reason is due to
the existence of "butterflies" in the commit-graph partial order.
Here is an example of four commits forming a butterfly:
I J
|\ /|
| \/ |
| /\ |
|/ \|
M N
\ /
|/
Q
Here, I and J both have parents M and N. In general, these do not need
to be exact parent relationships, but reachability relationships. The
most important part is that M and N cannot reach each other, so they are
independent in the partial order. If I had commit_mask 10 and J had
commit_mask 01, then M and N would both be assigned commit_mask 11 and
be maximal commits with the bitmask 11. Then, what happens when M and N
can both reach a commit Q? If Q is also assigned the bitmask 11, then it
is not maximal but is reachable from both M and N.
While this is not necessarily a deal-breaker for our abstract definition
of finding maximal commits according to a given bitmask, we have a few
issues that can come up in our larger picture of constructing
reachability bitmaps.
In particular, if we do not also consider Q to be a "maximal" commit,
then we will walk commits reachable from Q twice: once when computing
the reachability bitmap for M and another time when computing the
reachability bitmap for N. This becomes much worse if the topology
continues this pattern with multiple butterflies.
The solution has already been mentioned: each of M and N are assigned
their own bits to the bitmask and hence they become uniquely maximal for
their bitmasks. Finally, Q also becomes maximal and thus we do not need
to walk its commits multiple times. The final bitmasks for these commits
are as follows:
I:10 J:01
|\ /|
| \ _____/ |
| /\____ |
|/ \ |
M:111 N:1101
\ /
Q:1111
Further, Q's reverse edge list is { M, N }, while M and N both have
reverse edge list { I, J }.
PERFORMANCE MEASUREMENTS
------------------------
Now that we've spent a LOT of time on the theory of this algorithm,
let's show that this is actually worth all that effort.
To test the performance, use GIT_TRACE2_PERF=1 when running
'git repack -abd' in a repository with no existing reachability bitmaps.
This avoids any issues with keeping existing bitmaps to skew the
numbers.
Inspect the "building_bitmaps_total" region in the trace2 output to
focus on the portion of work that is affected by this change. Here are
the performance comparisons for a few repositories. The timings are for
the following versions of Git: "multi" is the timing from before any
reverse graph is constructed, where we might perform multiple
traversals. "reverse" is for the previous change where the reverse graph
has every reachable commit. Finally "maximal" is the version introduced
here where the reverse graph only contains the maximal commits.
Repository: git/git
multi: 2.628 sec
reverse: 2.344 sec
maximal: 2.047 sec
Repository: torvalds/linux
multi: 64.7 sec
reverse: 205.3 sec
maximal: 44.7 sec
So in all cases we've not only recovered any time lost to switching to
the reverse-edge algorithm, but we come out ahead of "multi" in all
cases. Likewise, peak heap has gone back to something reasonable:
Repository: torvalds/linux
multi: 2.087 GB
reverse: 3.141 GB
maximal: 2.288 GB
While I do not have access to full fork networks on GitHub, Peff has run
this algorithm on the chromium/chromium fork network and reported a
change from 3 hours to ~233 seconds. That network is particularly
beneficial for this approach because it has a long, linear history along
with many tags. The "multi" approach was obviously quadratic and the new
approach is linear.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 8 Dec 2020 22:04:26 +0000 (17:04 -0500)]
pack-bitmap.c: check reads more aggressively when loading
Before 'load_bitmap_entries_v1()' reads an actual EWAH bitmap, it should
check that it can safely do so by ensuring that there are at least 6
bytes available to be read (four for the commit's index position, and
then two more for the xor offset and flags, respectively).
Likewise, it should check that the commit index it read refers to a
legitimate object in the pack.
The first fix catches a truncation bug that was exposed when testing,
and the second is purely precautionary.
There are some possible future improvements, not pursued here. They are:
- Computing the correct boundary of the bitmap itself in the caller
and ensuring that we don't read past it. This may or may not be
worth it, since in a truncation situation, all bets are off: (is the
trailer still there and the bitmap entries malformed, or is the
trailer truncated?). The best we can do is try to read what's there
as if it's correct data (and protect ourselves when it's obviously
bogus).
- Avoid the magic "6" by teaching read_be32() and read_u8() (both of
which are custom helpers for this function) to check sizes before
advancing the pointers.
- Adding more tests in this area. Testing these truncation situations
are remarkably fragile to even subtle changes in the bitmap
generation. So, the resulting tests are likely to be quite brittle.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 8 Dec 2020 22:04:22 +0000 (17:04 -0500)]
pack-bitmap-write: rename children to reverse_edges
The bitmap_builder_init() method walks the reachable commits in
topological order and constructs a "reverse graph" along the way. At the
moment, this reverse graph contains an edge from commit A to commit B if
and only if A is a parent of B. Thus, the name "children" is appropriate
for for this reverse graph.
In the next change, we will repurpose the reverse graph to not be
directly-adjacent commits in the commit-graph, but instead a more
abstract relationship. The previous changes have already incorporated
the necessary updates to fill_bitmap_commit() that allow these edges to
not be immediate children.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 8 Dec 2020 22:04:17 +0000 (17:04 -0500)]
t5310: add branch-based checks
The current rev-list tests that check the bitmap data only work on HEAD
instead of multiple branches. Expand the test cases to handle both
'master' and 'other' branches.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 8 Dec 2020 22:04:13 +0000 (17:04 -0500)]
commit: implement commit_list_contains()
It can be helpful to check if a commit_list contains a commit. Use
pointer equality, assuming lookup_commit() was used.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 8 Dec 2020 22:04:08 +0000 (17:04 -0500)]
bitmap: implement bitmap_is_subset()
The bitmap_is_subset() function checks if the 'self' bitmap contains any
bitmaps that are not on in the 'other' bitmap. Up until this patch, it
had a declaration, but no implementation or callers. A subsequent patch
will want this function, so implement it here.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 8 Dec 2020 22:04:03 +0000 (17:04 -0500)]
pack-bitmap-write: fill bitmap with commit history
The current implementation of bitmap_writer_build() creates a
reachability bitmap for every walked commit. After computing a bitmap
for a commit, those bits are pushed to an in-progress bitmap for its
children.
fill_bitmap_commit() assumes the bits corresponding to objects
reachable from the parents of a commit are already set. This means that
when visiting a new commit, we only have to walk the objects reachable
between it and any of its parents.
A future change to bitmap_writer_build() will relax this condition so
not all parents have their bits set. Prepare for that by having
'fill_bitmap_commit()' walk parents until reaching commits whose bits
are already set. Then, walk the trees for these commits as well.
This has no functional change with the current implementation of
bitmap_writer_build().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:59 +0000 (17:03 -0500)]
pack-bitmap-write: pass ownership of intermediate bitmaps
Our algorithm to generate reachability bitmaps walks through the commit
graph from the bottom up, passing bitmap data from each commit to its
descendants. For a linear stretch of history like:
A -- B -- C
our sequence of steps is:
- compute the bitmap for A by walking its trees, etc
- duplicate A's bitmap as a starting point for B; we can now free A's
bitmap, since we only needed it as an intermediate result
- OR in any extra objects that B can reach into its bitmap
- duplicate B's bitmap as a starting point for C; likewise, free B's
bitmap
- OR in objects for C, and so on...
Rather than duplicating bitmaps and immediately freeing the original, we
can just pass ownership from commit to commit. Note that this doesn't
always work:
- the recipient may be a merge which already has an intermediate
bitmap from its other ancestor. In that case we have to OR our
result into it. Note that the first ancestor to reach the merge does
get to pass ownership, though.
- we may have multiple children; we can only pass ownership to one of
them
However, it happens often enough and copying bitmaps is expensive enough
that this provides a noticeable speedup. On a clone of linux.git, this
reduces the time to generate bitmaps from 205s to 70s. This is about the
same amount of time it took to generate bitmaps using our old "many
traversals" algorithm (the previous commit measures the identical
scenario as taking 63s). It unfortunately provides only a very modest
reduction in the peak memory usage, though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:55 +0000 (17:03 -0500)]
pack-bitmap-write: reimplement bitmap writing
The bitmap generation code works by iterating over the set of commits
for which we plan to write bitmaps, and then for each one performing a
traditional traversal over the reachable commits and trees, filling in
the bitmap. Between two traversals, we can often reuse the previous
bitmap result as long as the first commit is an ancestor of the second.
However, our worst case is that we may end up doing "n" complete
complete traversals to the root in order to create "n" bitmaps.
In a real-world case (the shared-storage repo consisting of all GitHub
forks of chromium/chromium), we perform very poorly: generating bitmaps
takes ~3 hours, whereas we can walk the whole object graph in ~3
minutes.
This commit completely rewrites the algorithm, with the goal of
accessing each object only once. It works roughly like this:
- generate a list of commits in topo-order using a single traversal
- invert the edges of the graph (so have parents point at their
children)
- make one pass in reverse topo-order, generating a bitmap for each
commit and passing the result along to child nodes
We generate correct results because each node we visit has already had
all of its ancestors added to the bitmap. And we make only two linear
passes over the commits.
We also visit each tree usually only once. When filling in a bitmap, we
don't bother to recurse into trees whose bit is already set in the
bitmap (since we know we've already done so when setting their bit).
That means that if commit A references tree T, none of its descendants
will need to open T again. I say "usually", though, because it is
possible for a given tree to be mentioned in unrelated parts of history
(e.g., cherry-picking to a parallel branch).
So we've accomplished our goal, and the resulting algorithm is pretty
simple to understand. But there are some downsides, at least with this
initial implementation:
- we no longer reuse the results of any on-disk bitmaps when
generating. So we'd expect to sometimes be slower than the original
when bitmaps already exist. However, this is something we'll be able
to add back in later.
- we use much more memory. Instead of keeping one bitmap in memory at
a time, we're passing them up through the graph. So our memory use
should scale with the graph width (times the size of a bitmap).
So how does it perform?
For a clone of linux.git, generating bitmaps from scratch with the old
algorithm took 63s. Using this algorithm it takes 205s. Which is much
worse, but _might_ be acceptable if it behaved linearly as the size
grew. It also increases peak heap usage by ~1G. That's not impossibly
large, but not encouraging.
On the complete fork-network of torvalds/linux, it increases the peak
RAM usage by 40GB. Yikes. (I forgot to record the time it took, but the
memory usage was too much to consider this reasonable anyway).
On the complete fork-network of chromium/chromium, I ran out of memory
before succeeding. Some back-of-the-envelope calculations indicate it
would need 80+GB to complete.
So at this stage, we've managed to make things much worse. But because
of the way this new algorithm is structured, there are a lot of
opportunities for optimization on top. We'll start implementing those in
the follow-on patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:50 +0000 (17:03 -0500)]
ewah: add bitmap_dup() function
There's no easy way to make a copy of a bitmap. Obviously a caller can
iterate over the bits and set them one by one in a new bitmap, but we
can go much faster by copying whole words with memcpy().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:46 +0000 (17:03 -0500)]
ewah: implement bitmap_or()
We have a function to bitwise-OR an ewah into an uncompressed bitmap,
but not to OR two uncompressed bitmaps. Let's add it.
Interestingly, we have a public header declaration going back to
e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the
function was never implemented. That was all OK since there were no
users of 'bitmap_or()', but a first caller will be added in a couple of
patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:42 +0000 (17:03 -0500)]
ewah: make bitmap growth less aggressive
If you ask to set a bit in the Nth word and we haven't yet allocated
that many slots in our array, we'll increase the bitmap size to 2*N.
This means we might frequently end up with bitmaps that are twice the
necessary size (as soon as you ask for the biggest bit, we'll size up to
twice that).
But if we just allocate as many words as were asked for, we may not grow
fast enough. The worst case there is setting bit 0, then 1, etc. Each
time we grow we'd just extend by one more word, giving us linear
reallocations (and quadratic memory copies).
A middle ground is relying on alloc_nr(), which causes us to grow by a
factor of roughly 3/2 instead of 2. That's less aggressive than
doubling, and it may help avoid fragmenting memory. (If we start with N,
then grow twice, our total is N*(3/2)^2 = 9N/4. After growing twice,
that array of size 9N/4 can fit into the space vacated by the original
array and first growth, N+3N/2 = 10N/4 > 9N/4, leading to less
fragmentation in memory).
Our worst case is still 3/2N wasted bits (you set bit N-1, then setting
bit N causes us to grow by 3/2), but our average should be much better.
This isn't usually that big a deal, but it will matter as we shift the
reachability bitmap generation code to store more bitmaps in memory.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:38 +0000 (17:03 -0500)]
ewah: factor out bitmap growth
We auto-grow bitmaps when somebody asks to set a bit whose position is
outside of our currently allocated range. Other operations besides
single bit-setting might need to do this, too, so let's pull it into its
own function.
Note that we change the semantics a little: you now ask for the number
of words you'd like to have, not the id of the block you'd like to write
to.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:33 +0000 (17:03 -0500)]
rev-list: die when --test-bitmap detects a mismatch
You can use "git rev-list --test-bitmap HEAD" to check that bitmaps
produce the same answer we'd get from a regular traversal. But if we
detect an error, we only print "mismatch", and still exit with a
successful error code.
That makes the uses of --test-bitmap in the test suite (e.g., in t5310)
mostly pointless: even if we saw an error, the tests wouldn't notice.
Let's instead call die(), which will let these tests work as designed,
and alert us if the bitmaps are bogus.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:28 +0000 (17:03 -0500)]
t5310: drop size of truncated ewah bitmap
We truncate the .bitmap file to 512 bytes and expect to run into
problems reading an individual ewah file. But this length is somewhat
arbitrary, and just happened to work when the test was added in
9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14).
An upcoming commit will change the size of the history we create in the
test repo, which will cause this test to fail. We can future-proof it a
bit more by reducing the size of the truncated bitmap file.
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:24 +0000 (17:03 -0500)]
pack-bitmap: bounds-check size of cache extension
A .bitmap file may have a "name hash cache" extension, which puts a
sequence of uint32_t values (one per object) at the end of the file.
When we see a flag indicating this extension, we blindly subtract the
appropriate number of bytes from our available length. However, if the
.bitmap file is too short, we'll underflow our length variable and wrap
around, thinking we have a very large length. This can lead to reading
out-of-bounds bytes while loading individual ewah bitmaps.
We can fix this by checking the number of available bytes when we parse
the header. The existing "truncated bitmap" test is now split into two
tests: one where we don't have this extension at all (and hence actually
do try to read a truncated ewah bitmap) and one where we realize
up-front that we can't even fit in the cache structure. We'll check
stderr in each case to make sure we hit the error we're expecting.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Dec 2020 22:03:19 +0000 (17:03 -0500)]
pack-bitmap: fix header size check
When we parse a .bitmap header, we first check that we have enough bytes
to make a valid header. We do that based on sizeof(struct
bitmap_disk_header). However, as of
0f4d6cada8 (pack-bitmap: make bitmap
header handling hash agnostic, 2019-02-19), that struct oversizes its
checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the
difference between that constant and the size of the actual hash we're
using. That commit adjusted the code which moves our pointer forward,
but forgot to update the size check.
This meant we were overly strict about the header size (requiring room
for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in
practice it didn't matter because bitmap files tend to have at least 12
bytes of actual data anyway, so it was unlikely for a valid file to be
caught by this.
Let's fix it by pulling the header size into a separate variable and
using it in both spots. That fixes the bug and simplifies the code to make
it harder to have a mismatch like this in the future. It will also come
in handy in the next patch for more bounds checking.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 8 Dec 2020 22:03:14 +0000 (17:03 -0500)]
ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW()
'ewah/ewah_bitmap.c:buffer_grow()' is responsible for growing the buffer
used to store the bits of an EWAH bitmap. It is essentially doing the
same task as the 'ALLOC_GROW()' macro, so use that instead.
This simplifies the callers of 'buffer_grow()', who no longer have to
ask for a specific size, but rather specify how much of the buffer they
need. They also no longer need to guard 'buffer_grow()' behind an if
statement, since 'ALLOC_GROW()' (and, by extension, 'buffer_grow()') is
a noop if the buffer is already large enough.
But, the most significant change is that this fixes a bug when calling
buffer_grow() with both 'alloc_size' and 'new_size' set to 1. In this
case, truncating integer math will leave the new size set to 1, causing
the buffer to never grow.
Instead, let alloc_nr() handle this, which asks for '(new_size + 16) * 3
/ 2' instead of 'new_size * 3 / 2'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 11 Nov 2020 21:16:45 +0000 (13:16 -0800)]
Fifth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 11 Nov 2020 21:18:39 +0000 (13:18 -0800)]
Merge branch 'jc/sequencer-stopped-sha-simplify'
Recently the format of an internal state file "rebase -i" uses has
been tightened up for consistency, which would hurt those who start
"rebase -i" with old git and then continue with new git. Loosen
the reader side a bit (which we may want to tighten again in a year
or so).
* jc/sequencer-stopped-sha-simplify:
sequencer: tolerate abbreviated stopped-sha file
Junio C Hamano [Wed, 11 Nov 2020 21:18:39 +0000 (13:18 -0800)]
Merge branch 'js/test-file-size'
Test clean-up.
* js/test-file-size:
tests: consolidate the `file_size` function into `test-lib-functions.sh`
Junio C Hamano [Wed, 11 Nov 2020 21:18:39 +0000 (13:18 -0800)]
Merge branch 'js/ci-github-set-env'
CI update.
* js/ci-github-set-env:
ci: avoid using the deprecated `set-env` construct
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'js/p4-default-branch'
"git p4" now honors init.defaultBranch configuration.
* js/p4-default-branch:
p4: respect init.defaultBranch
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'js/test-whitespace-fixes'
Test code clean-up.
* js/test-whitespace-fixes:
t9603: use tabs for indentation
t5570: remove trailing padding
t5400,t5402: consistently indent with tabs, not with spaces
t3427: adjust stale comment
t3406: indent with tabs, not spaces
t1004: insert missing "branch" in a message
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'mc/typofix'
Docfix.
* mc/typofix:
doc: fixing two trivial typos in Documentation/
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'jc/abbrev-doc'
The documentation on the "--abbrev=<n>" option did not say the
output may be longer than "<n>" hexdigits, which has been
clarified.
* jc/abbrev-doc:
doc: clarify that --abbrev=<n> is about the minimum length
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'cw/ci-ghwf-check-ws-errors'
Dev support update.
* cw/ci-ghwf-check-ws-errors:
ci: make the whitespace checker more robust
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'rs/worktree-list-show-locked'
Typofix.
* rs/worktree-list-show-locked:
t2402: fix typo
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'rs/pack-write-hashwrite-simplify'
Code clean-up.
* rs/pack-write-hashwrite-simplify:
pack-write: use hashwrite_be32() instead of double-buffering array
Junio C Hamano [Wed, 11 Nov 2020 21:18:38 +0000 (13:18 -0800)]
Merge branch 'sd/prompt-local-variable'
Code clean-up.
* sd/prompt-local-variable:
git-prompt.sh: localize `option` in __git_ps1_show_upstream
Junio C Hamano [Wed, 11 Nov 2020 21:18:37 +0000 (13:18 -0800)]
Merge branch 'rs/clear-commit-marks-in-repo'
Code clean-up.
* rs/clear-commit-marks-in-repo:
bisect: clear flags in passed repository
object: allow clear_commit_marks_all to handle any repo
Junio C Hamano [Wed, 11 Nov 2020 21:18:37 +0000 (13:18 -0800)]
Merge branch 'so/format-patch-doc-on-default-diff-format'
Docfix.
* so/format-patch-doc-on-default-diff-format:
doc/diff-options: fix out of place mentions of '--patch/-p'
Junio C Hamano [Mon, 9 Nov 2020 21:31:58 +0000 (13:31 -0800)]
Fourth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 9 Nov 2020 22:06:29 +0000 (14:06 -0800)]
Merge branch 'js/default-branch-name-adjust-t5411'
Prepare a test script to transition of the default branch name to
'main'.
* js/default-branch-name-adjust-t5411:
t5411: finish preparing for `main` being the default branch name
t5411: adjust the remaining support files for init.defaultBranch=main
t5411: start adjusting the support files for init.defaultBranch=main
t5411: start using the default branch name "main"
Junio C Hamano [Mon, 9 Nov 2020 22:06:29 +0000 (14:06 -0800)]
Merge branch 'fc/zsh-completion'
Zsh autocompletion (in contrib/) update.
* fc/zsh-completion: (29 commits)
zsh: update copyright notices
completion: bash: remove old compat wrappers
completion: bash: cleanup cygwin check
completion: bash: trivial cleanup
completion: zsh: add simple version check
completion: zsh: trivial simplification
completion: zsh: add alias descriptions
completion: zsh: improve command tags
completion: zsh: refactor command completion
completion: zsh: shuffle functions around
completion: zsh: simplify file_direct
completion: zsh: simplify nl_append
completion: zsh: trivial cleanup
completion: zsh: simplify direct compadd
completion: zsh: simplify compadd functions
completion: zsh: fix splitting of words
completion: zsh: add missing direct_append
completion: fix conflict with bashcomp
completion: zsh: fix completion for --no-.. options
completion: bash: remove zsh wrapper
...
Junio C Hamano [Mon, 9 Nov 2020 22:06:29 +0000 (14:06 -0800)]
Merge branch 'jk/sideband-more-error-checking'
The code to detect premature EOF in the sideband demultiplexer has
been cleaned up.
* jk/sideband-more-error-checking:
sideband: diagnose more sideband anomalies
Junio C Hamano [Mon, 9 Nov 2020 22:06:27 +0000 (14:06 -0800)]
Merge branch 'jk/committer-date-is-author-date-fix-simplify'
Code simplification.
* jk/committer-date-is-author-date-fix-simplify:
am, sequencer: stop parsing our own committer ident
Junio C Hamano [Mon, 9 Nov 2020 22:06:26 +0000 (14:06 -0800)]
Merge branch 'ab/git-remote-exit-code'
Exit codes from "git remote add" etc. were not usable by scripted
callers.
* ab/git-remote-exit-code:
remote: add meaningful exit code on missing/existing
Junio C Hamano [Mon, 9 Nov 2020 22:06:26 +0000 (14:06 -0800)]
Merge branch 'pb/ref-filter-with-crlf'
A commit and tag object may have CR at the end of each and
every line (you can create such an object with hash-object or
using --cleanup=verbatim to decline the default clean-up
action), but it would make it impossible to have a blank line
to separate the title from the body of the message. Be lenient
and accept a line with lone CR on it as a blank line, too.
* pb/ref-filter-with-crlf:
log, show: add tests for messages containing CRLF
ref-filter: handle CRLF at end-of-line more gracefully
Junio C Hamano [Mon, 9 Nov 2020 22:06:26 +0000 (14:06 -0800)]
Merge branch 'jk/checkout-index-errors'
"git checkout-index" did not consistently signal an error with its
exit status.
* jk/checkout-index-errors:
checkout-index: propagate errors to exit code
checkout-index: drop error message from empty --stage=all
Junio C Hamano [Mon, 9 Nov 2020 22:06:25 +0000 (14:06 -0800)]
Merge branch 'jk/perl-warning'
Dev support.
* jk/perl-warning:
perl: check for perl warnings while running tests
Junio C Hamano [Mon, 9 Nov 2020 22:06:25 +0000 (14:06 -0800)]
Merge branch 'nk/diff-files-vs-fsmonitor'
"git diff" and other commands that share the same machinery to
compare with working tree files have been taught to take advantage
of the fsmonitor data when available.
* nk/diff-files-vs-fsmonitor:
p7519-fsmonitor: add a git add benchmark
p7519-fsmonitor: refactor to avoid code duplication
perf lint: add make test-lint to perf tests
t/perf: add fsmonitor perf test for git diff
t/perf/p7519-fsmonitor.sh: warm cache on first git status
t/perf/README: elaborate on output format
fsmonitor: use fsmonitor data in `git diff`
Junio C Hamano [Mon, 9 Nov 2020 22:06:25 +0000 (14:06 -0800)]
Merge branch 'as/tests-cleanup'
Micro clean-up of a couple of test scripts.
* as/tests-cleanup:
t2200,t9832: avoid using 'git' upstream in a pipe
Junio C Hamano [Mon, 9 Nov 2020 22:06:25 +0000 (14:06 -0800)]
Merge branch 'en/dir-rename-tests'
More preliminary tests have been added to document desired outcome
of various "directory rename" situations.
* en/dir-rename-tests:
t6423: more involved rules for renaming directories into each other
t6423: update directory rename detection tests with new rule
t6423: more involved directory rename test
directory-rename-detection.txt: update references to regression tests
Junio C Hamano [Mon, 9 Nov 2020 22:06:25 +0000 (14:06 -0800)]
Merge branch 'mr/bisect-in-c-3'
Rewriting "git bisect" in C continues.
* mr/bisect-in-c-3:
bisect--helper: retire `--bisect-autostart` subcommand
bisect--helper: retire `--write-terms` subcommand
bisect--helper: retire `--check-expected-revs` subcommand
bisect--helper: reimplement `bisect_state` & `bisect_head` shell functions in C
bisect--helper: retire `--next-all` subcommand
bisect--helper: retire `--bisect-clean-state` subcommand
bisect--helper: finish porting `bisect_start()` to C
Johannes Schindelin [Mon, 9 Nov 2020 00:09:25 +0000 (00:09 +0000)]
t9603: use tabs for indentation
This patch will let the new `check-whitespace` GitHub workflow be happy
with the upcoming patch series that wants to search-and-replace `master`
with `main` in t9603 and some other test scripts.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Mon, 9 Nov 2020 00:09:24 +0000 (00:09 +0000)]
t5570: remove trailing padding
Two blocks in t5570 want to align the closing double quotes, padding
with spaces if needed. Since the maximum length of those lines is
defined by the branch name `master`, the upcoming rename to `main` would
unalign the quotes.
But then, it is unclear how those aligned closing quotes should help
readability anyway, so let's just remove that padding altogether.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Mon, 9 Nov 2020 00:09:23 +0000 (00:09 +0000)]
t5400,t5402: consistently indent with tabs, not with spaces
This patch actually prepares for the upcoming patches to replace
`master` with `main` in these tests: we do not want those changes to be
flagged by the new `check-whitespace` GitHub workflow (even if those
changes do not introduce the whitespace issues, they touch lines
affected by those issues without fixing them).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Mon, 9 Nov 2020 00:09:22 +0000 (00:09 +0000)]
t3427: adjust stale comment
In
b6211b89eb3 (tests: avoid variations of the `master` branch name,
2020-09-26), the `master[123]` branch names were renamed to
`topic_[123]`. A non-literal mention of the corresponding files was
missed in that commit.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Mon, 9 Nov 2020 00:09:21 +0000 (00:09 +0000)]
t3406: indent with tabs, not spaces
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Mon, 9 Nov 2020 00:09:20 +0000 (00:09 +0000)]
t1004: insert missing "branch" in a message
The message in question reads awkward with the name "master", but will
be even more confusing once that is renamed to "main". Let's adjust it
in advance of said rename.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Sun, 8 Nov 2020 08:41:51 +0000 (08:41 +0000)]
p4: respect init.defaultBranch
In `git p4 clone`, we hard-code the branch name `master` instead of
looking what the _actual_ initial branch name is. Let's fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Sat, 7 Nov 2020 01:21:45 +0000 (01:21 +0000)]
ci: avoid using the deprecated `set-env` construct
The `set-env` construct was deprecated as of the announcement in
https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
Let's use the recommended alternative instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Sat, 7 Nov 2020 01:12:57 +0000 (01:12 +0000)]
tests: consolidate the `file_size` function into `test-lib-functions.sh`
In
8de7eeb54b6 (compression: unify pack.compression configuration
parsing, 2016-11-15), we introduced identical copies of the `file_size`
helper into three test scripts, with the plan to eventually consolidate
them into a single copy.
Let's do that, and adjust the function name to adhere to the `test_*`
naming convention.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Marlon Rac Cambasis [Thu, 5 Nov 2020 20:48:14 +0000 (12:48 -0800)]
doc: fixing two trivial typos in Documentation/
Fix misspelled "specified" and "occurred" in documentation and
comments.
Signed-off-by: Marlon Rac Cambasis <marlonrc08@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 4 Nov 2020 22:01:37 +0000 (14:01 -0800)]
doc: clarify that --abbrev=<n> is about the minimum length
Early text written in 2006 explains the "--abbrev=<n>" option to
"show only a partial prefix", without saying that the length of the
partial prefix is not necessarily the number given to the option to
ensure that the output names the object uniquely.
Update documentation for the diff family of commands, "blame",
"branch --verbose", "ls-files" and "ls-tree" to stress that the
short prefix must uniquely refer to an object, and <n> is merely
the mininum number of hexdigits used in the prefix.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Tue, 3 Nov 2020 15:55:31 +0000 (15:55 +0000)]
ci: make the whitespace checker more robust
In
32c83afc2c69 (ci: github action - add check for whitespace errors,
2020-09-22), we introduced a GitHub workflow that automatically checks
Pull Requests for whitespace problems.
However, when affected lines contain one or more double quote
characters, this workflow failed to attach the informative comment
because the Javascript snippet incorrectly interpreted these quotes
instead of using the `git log` output as-is.
Let's fix that.
While at it, let's `await` the result of the `createComment()` function.
Finally, we enclose the log in the comment with ```...``` to avoid
having the diff marker be misinterpreted as an enumeration bullet.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Tue, 3 Nov 2020 11:48:07 +0000 (11:48 +0000)]
t2402: fix typo
In
c57b3367bed (worktree: teach `list` to annotate locked worktree,
2020-10-11), we introduced a test case that wanted to talk about
"worktrees" but talked about "worktress" instead. Let's fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 2 Nov 2020 21:17:20 +0000 (13:17 -0800)]
Third batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 2 Nov 2020 21:17:47 +0000 (13:17 -0800)]
Merge branch 'jc/doc-final-resend'
Update developer doc.
* jc/doc-final-resend:
SubmittingPatches: clarify the purpose of the final resend
Junio C Hamano [Mon, 2 Nov 2020 21:17:46 +0000 (13:17 -0800)]
Merge branch 'es/tutorial-mention-asciidoc-early'
Doc update.
* es/tutorial-mention-asciidoc-early:
MyFirstContribution: clarify asciidoc dependency
Junio C Hamano [Mon, 2 Nov 2020 21:17:46 +0000 (13:17 -0800)]
Merge branch 'js/default-branch-name-part-4-minus-1'
Adjust tests so that they won't scream when the default initial
branch name is changed to 'main'.
* js/default-branch-name-part-4-minus-1:
t1400: prepare for `main` being default branch name
tests: prepare aligned mentions of the default branch name
t9902: prepare a test for the upcoming default branch name
t3200: prepare for `main` being shorter than `master`
t5703: adjust a test case for the upcoming default branch name
t6200: adjust suppression pattern to also match "main"
tests: start moving to a different default main branch name
t9801: use `--` in preparation for default branch rename
fmt-merge-msg: also suppress "into main" by default
Junio C Hamano [Mon, 2 Nov 2020 21:17:46 +0000 (13:17 -0800)]
Merge branch 've/userdiff-bash'
The userdiff pattern learned to identify the function definition in
POSIX shells and bash.
* ve/userdiff-bash:
userdiff: support Bash
Junio C Hamano [Mon, 2 Nov 2020 21:17:45 +0000 (13:17 -0800)]
Merge branch 'bc/svn-hash-oid-fix'
A recent oid->hash conversion missed one spot, breaking "git svn".
* bc/svn-hash-oid-fix:
svn: use correct variable name for short OID
Junio C Hamano [Mon, 2 Nov 2020 21:17:45 +0000 (13:17 -0800)]
Merge branch 'js/t7006-cleanup'
Code clean-up.
* js/t7006-cleanup:
t7006: Use test_path_is_* functions in test script
Junio C Hamano [Mon, 2 Nov 2020 21:17:44 +0000 (13:17 -0800)]
Merge branch 'en/sequencer-rollback-lock-cleanup'
Code clean-up.
* en/sequencer-rollback-lock-cleanup:
sequencer: remove duplicate rollback_lock_file() call
Junio C Hamano [Mon, 2 Nov 2020 21:17:43 +0000 (13:17 -0800)]
Merge branch 'mk/diff-ignore-regex'
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.
* mk/diff-ignore-regex:
diff: add -I<regex> that ignores matching changes
merge-base, xdiff: zero out xpparam_t structures
Junio C Hamano [Mon, 2 Nov 2020 21:17:43 +0000 (13:17 -0800)]
Merge branch 'jt/apply-reverse-twice'
"git apply -R" did not handle patches that touch the same path
twice correctly, which has been corrected. This is most relevant
in a patch that changes a path from a regular file to a symbolic
link (and vice versa).
* jt/apply-reverse-twice:
apply: when -R, also reverse list of sections
Junio C Hamano [Mon, 2 Nov 2020 21:17:43 +0000 (13:17 -0800)]
Merge branch 'sc/sequencer-gpg-octopus'
"git rebase --rebase-merges" did not correctly pass --gpg-sign
command line option to underlying "git merge" when replaying a merge
using non-default merge strategy or when replaying an octopus merge
(because replaying a two-head merge with the default strategy was
done in a separate codepath, the problem did not trigger for most
users), which has been corrected.
* sc/sequencer-gpg-octopus:
t3435: add tests for rebase -r GPG signing
sequencer: pass explicit --no-gpg-sign to merge
sequencer: fix gpg option passed to merge subcommand
Junio C Hamano [Mon, 2 Nov 2020 21:17:42 +0000 (13:17 -0800)]
Merge branch 'en/test-selector'
Our test scripts can be told to run only individual pieces while
skipping others with the "--run=..." option; they were taught to
take a substring of test title, in addition to numbers, to name the
test pieces to run.
* en/test-selector:
test-lib: reduce verbosity of skipped tests
t6006, t6012: adjust tests to use 'setup' instead of synonyms
test-lib: allow selecting tests by substring/glob with --run
Junio C Hamano [Mon, 2 Nov 2020 21:17:42 +0000 (13:17 -0800)]
Merge branch 'jk/report-fn-typedef'
Code clean-up.
* jk/report-fn-typedef:
usage: define a type for a reporting function
Junio C Hamano [Mon, 2 Nov 2020 21:17:41 +0000 (13:17 -0800)]
Merge branch 'nk/dir-c-comment-update'
Update stale in-code comment.
* nk/dir-c-comment-update:
dir.c: fix comments to agree with argument name
Junio C Hamano [Mon, 2 Nov 2020 21:17:41 +0000 (13:17 -0800)]
Merge branch 'jk/no-common'
Dev support to catch a tentative definition of a variable in our C
code as an error.
* jk/no-common:
config.mak.dev: build with -fno-common
Junio C Hamano [Mon, 2 Nov 2020 21:17:40 +0000 (13:17 -0800)]
Merge branch 'as/sample-push-to-checkout-hook'
Add a sample 'push-to-checkout' hook, that performs the same as
what the built-in default action does.
* as/sample-push-to-checkout-hook:
hook: add sample template for push-to-checkout
Junio C Hamano [Mon, 2 Nov 2020 21:17:40 +0000 (13:17 -0800)]
Merge branch 'jk/fast-import-marks-cleanup'
Code clean-up.
* jk/fast-import-marks-cleanup:
fast-import: remove duplicated option-parsing line
Junio C Hamano [Mon, 2 Nov 2020 21:17:40 +0000 (13:17 -0800)]
Merge branch 'lo/zsh-completion'
Update instructions for command line completion (in contrib/) for zsh.
* lo/zsh-completion:
completion: fix zsh installation instructions
Junio C Hamano [Mon, 2 Nov 2020 21:17:39 +0000 (13:17 -0800)]
Merge branch 'tk/credential-config'
"git credential' didn't honor the core.askPass configuration
variable (among other things), which has been corrected.
* tk/credential-config:
credential: load default config
Junio C Hamano [Mon, 2 Nov 2020 21:17:39 +0000 (13:17 -0800)]
Merge branch 'dl/diff-merge-base'
"git diff A...B" learned "git diff --merge-base A B", which is a
longer short-hand to say the same thing.
* dl/diff-merge-base:
contrib/completion: complete `git diff --merge-base`
builtin/diff-tree: learn --merge-base
builtin/diff-index: learn --merge-base
t4068: add --merge-base tests
diff-lib: define diff_get_merge_base()
diff-lib: accept option flags in run_diff_index()
contrib/completion: extract common diff/difftool options
git-diff.txt: backtick quote command text
git-diff-index.txt: make --cached description a proper sentence
t4068: remove unnecessary >tmp
Junio C Hamano [Mon, 2 Nov 2020 21:17:39 +0000 (13:17 -0800)]
Merge branch 'bk/sob-dco'
Document that the meaning of a Signed-off-by trailer can vary from
project to project in the end-user documentation, and clarify what
it means to this project.
* bk/sob-dco:
Documentation: stylistically normalize references to Signed-off-by:
SubmittingPatches: clarify DCO is our --signoff rule
Documentation: clarify and expand description of --signoff
doc: preparatory clean-up of description on the sign-off option
Junio C Hamano [Mon, 2 Nov 2020 21:17:39 +0000 (13:17 -0800)]
Merge branch 'ds/maintenance-commit-graph-auto-fix'
Test-coverage enhancement of running commit-graph task "git
maintenance" as needed led to discovery and fix of a bug.
* ds/maintenance-commit-graph-auto-fix:
maintenance: core.commitGraph=false prevents writes
maintenance: test commit-graph auto condition
Junio C Hamano [Mon, 2 Nov 2020 21:17:39 +0000 (13:17 -0800)]
Merge branch 'ds/commit-graph-merging-fix'
When "git commit-graph" detects the same commit recorded more than
once while it is merging the layers, it used to die. The code now
ignores all but one of them and continues.
* ds/commit-graph-merging-fix:
commit-graph: don't write commit-graph when disabled
commit-graph: ignore duplicates when merging layers
Junio C Hamano [Mon, 2 Nov 2020 21:17:38 +0000 (13:17 -0800)]
Merge branch 'es/test-cmp-typocatcher'
A test helper "test_cmp A B" was taught to diagnose missing files A
or B as a bug in test, but some tests legitimately wanted to notice
a failure to even create file B as an error, in addition to leaving
the expected result in it, and were misdiagnosed as a bug. This
has been corrected.
* es/test-cmp-typocatcher:
Revert "test_cmp: diagnose incorrect arguments"
Junio C Hamano [Mon, 2 Nov 2020 21:17:37 +0000 (13:17 -0800)]
Merge branch 'jk/fast-import-marks-alloc-fix'
"git fast-import" wasted a lot of memory when many marks were in use.
* jk/fast-import-marks-alloc-fix:
fast-import: fix over-allocation of marks storage
Junio C Hamano [Mon, 2 Nov 2020 21:17:37 +0000 (13:17 -0800)]
Merge branch 'js/avoid-split-sideband-message'
The side-band status report can be sent at the same time as the
primary payload multiplexed, but the demultiplexer on the receiving
end incorrectly split a single status report into two, which has
been corrected.
* js/avoid-split-sideband-message:
test-pkt-line: drop colon from sideband identity
sideband: report unhandled incomplete sideband messages as bugs
sideband: avoid reporting incomplete sideband messages
Sibo Dong [Sat, 31 Oct 2020 22:09:46 +0000 (22:09 +0000)]
git-prompt.sh: localize `option` in __git_ps1_show_upstream
The variable 'option' is used in __git_ps1_show_upstream()
without being localized.
This clobbers the variable the user may be using for other
purposes, which is bad. Luckily, $option is not used to carry
information around in the script as a global variable. The use
of it in this script has very limited scope (namely, only inside
this function), so just declare that it is "local".
Signed-off-by: Sibo Dong <sibo.dong@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 1 Nov 2020 08:52:12 +0000 (09:52 +0100)]
pack-write: use hashwrite_be32() instead of double-buffering array
hashwrite() already buffers writes, so pass the fanout table entries
individually via hashwrite_be32(), which also does the endianess
conversion for us. This avoids a memory copy, shortens the code and
reduces the number of magic numbers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Sat, 31 Oct 2020 19:46:03 +0000 (19:46 +0000)]
t5411: finish preparing for `main` being the default branch name
In addition to the trivial search-and-replace performed over the course
of the previous three commits, there is one test in t5411 that depends
on the length of the default branch name.
Adjust it and use `main` as the default branch name in this test.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Sat, 31 Oct 2020 19:46:02 +0000 (19:46 +0000)]
t5411: adjust the remaining support files for init.defaultBranch=main
This trick was performed via
$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t/t5411/*
In the previous commit, we adjusted roughly half of the support files,
to stay under the 100kB limit (mails larger than that are rejected by
the Git mailing list).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Sat, 31 Oct 2020 19:46:01 +0000 (19:46 +0000)]
t5411: start adjusting the support files for init.defaultBranch=main
This trick was performed via
$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t/t5411/test-00[3-5]*
We do not convert the files in `t/t5411/` in one go because the patch
would be too big (mails larger than 100kB are rejected by the Git
mailing list). Instead, we start with roughly half of the support files.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Schindelin [Sat, 31 Oct 2020 19:46:00 +0000 (19:46 +0000)]
t5411: start using the default branch name "main"
This is a straight-forward search-and-replace in the test script;
However, this is not yet complete because it requires many more
replacements in `t/t5411/`, too many for a single patch (the Git mailing
list rejects mails larger than 100kB). For that reason, we disable this
test script temporarily via the `PREPARE_FOR_MAIN_BRANCH` prereq.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov [Sat, 31 Oct 2020 19:37:34 +0000 (22:37 +0300)]
doc/diff-options: fix out of place mentions of '--patch/-p'
First, references to --patch and -p appeared in the description of
git-format-patch, where the options themselves are not included.
Next, the description of --unified option elsewhere had duplicate implied
statements: "Implies --patch. Implies -p."
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sat, 31 Oct 2020 12:47:58 +0000 (13:47 +0100)]
bisect: clear flags in passed repository
69d2cfe6e8 (bisect.c: remove the_repository reference, 2018-11-10) kept
the implicit the_repository reference in clear_commit_marks_all, which
was made explicit by the previous commit (and which also renamed it to
repo_clear_commit_marks). Replace it as well.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sat, 31 Oct 2020 12:46:08 +0000 (13:46 +0100)]
object: allow clear_commit_marks_all to handle any repo
Allow callers to specify the repository to use. Rename the function to
repo_clear_commit_marks to document its new scope. No functional change
intended.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 30 Oct 2020 20:04:01 +0000 (13:04 -0700)]
Second batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 30 Oct 2020 20:04:24 +0000 (13:04 -0700)]
Merge branch 'js/ci-ghwf-dedup-tests'
GitHub Actions automated test improvement to skip tests on a tree
identical to what has already been tested.
* js/ci-ghwf-dedup-tests:
ci: make the "skip-if-redundant" check more defensive
ci: work around old records of GitHub runs
Junio C Hamano [Fri, 30 Oct 2020 20:04:24 +0000 (13:04 -0700)]
Merge branch 'dl/resurrect-update-for-sha256'
"git resurrect" script (in contrib/) learned that the object names
may be longer than 40-hex depending on the hash function in use.
* dl/resurrect-update-for-sha256:
contrib/git-resurrect.sh: use hash-agnostic OID pattern
contrib/git-resurrect.sh: indent with tabs
Junio C Hamano [Fri, 30 Oct 2020 20:04:24 +0000 (13:04 -0700)]
Merge branch 'cm/t7xxx-cleanup'
Micro clean-up.
* cm/t7xxx-cleanup:
t7102: prepare expected output inside test_expect_* block
t7201: put each command on a separate line
t7201: use 'git -C' to avoid subshell
t7102,t7201: remove whitespace after redirect operator
t7102,t7201: remove unnecessary blank spaces in test body
t7101,t7102,t7201: modernize test formatting
Junio C Hamano [Fri, 30 Oct 2020 20:04:24 +0000 (13:04 -0700)]
Merge branch 'ct/t0000-use-test-path-is-file'
Micro clean-up of a test script.
* ct/t0000-use-test-path-is-file:
t0000: use test_path_is_file instead of "test -f"