git.oblomov.eu Git

refs/files-backend: add longer-scoped copy of string to list

split_symref_update() receives a string-pointer `referent` and adds it
to the list of `affected_refnames`. The list simply holds on to the
pointers it is given, it does not copy the strings and it does not ever
free them. The `referent` string in split_symref_update() belongs to a
string buffer in the caller. After we return, the string will be leaked.

In the next patch, we want to properly release the string buffer in the
caller, but we can't safely do so until we've made sure that
`affected_refnames` will not be holding on to a pointer to the string.
We could configure the list to handle its own resources, but it would
mean some alloc/free-churning. The list is already handling other
strings (through other code paths) which we do not need to worry about,
and we'd be memory-churning those strings too, completely unnecessary.

Observe that split_symref_update() creates a `new_update`-object through
ref_transaction_add_update(), after which `new_update->refname` is a
copy of `referent`. The difference is, this copy will be freed, and it
will be freed *after* `affected_refnames` has been cleared.

Rearrange the handling of `referent`, so that we don't add it directly
to `affected_refnames`. Instead, first just check whether `referent`
exists in the string list, and later add `new_update->refname`.

Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-tree: do not complete line in -F input

"git commit-tree -F <file>", unlike "cat <file> | git
commit-tree" (i.e. feeding the same contents from the standard
input), added a missing final newline when the input ended in an
incomplete line.

Correct this inconsistency by leaving the incomplete line as-is,
as erring on the side of not touching the input is preferrable
and expected for a plumbing command like "commit-tree".

Signed-off-by: Ross Kabus <rkabus@aerotech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

files_transaction_finish(): delete reflogs before references

If the deletion steps unexpectedly fail, it is less bad to leave a
reference without its reflog than it is to leave a reflog without its
reference, since the latter is an invalid repository state.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packed-backend: rip out some now-unused code

Now the outside world interacts with the packed ref store only via the
generic refs API plus a few lock-related functions. This allows us to
delete some functions that are no longer used, thereby completing the
encapsulation of the packed ref store.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

files_ref_store: use a transaction to update packed refs

When processing a `files_ref_store` transaction, it is sometimes
necessary to delete some references from the "packed-refs" file. Do
that using a reference transaction conducted against the
`packed_ref_store`.

This change further decouples `files_ref_store` from
`packed_ref_store`. It also fixes multiple problems, including the two
revealed by test cases added in the previous commit.

First, the old code didn't obtain the `packed-refs` lock until
`files_transaction_finish()`. This means that a failure to acquire the
`packed-refs` lock (e.g., due to contention with another process)
wasn't detected until it was too late (problems like this are supposed
to be detected in the "prepare" phase). The new code acquires the
`packed-refs` lock in `files_transaction_prepare()`, the same stage of
the processing when the loose reference locks are being acquired,
removing another reason why the "prepare" phase might succeed and the
"finish" phase might nevertheless fail.

Second, the old code deleted the loose version of a reference before
deleting any packed version of the same reference. This left a moment
when another process might think that the packed version of the
reference is current, which is incorrect. (Even worse, the packed
version of the reference can be arbitrarily old, and might even point
at an object that has since been garbage-collected.)

Third, if a reference deletion fails to acquire the `packed-refs` lock
altogether, then the old code might leave the repository in the
incorrect state (possibly corrupt) described in the previous
paragraph.

Now we activate the new "packed-refs" file (sans any references that
are being deleted) *before* deleting the corresponding loose
references. But we hold the "packed-refs" lock until after the loose
references have been finalized, thus preventing a simultaneous
"pack-refs" process from packing the loose version of the reference in
the time gap, which would otherwise defeat our attempt to delete it.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t1404: demonstrate two problems with reference transactions

Currently, a loose reference is deleted even before locking the
`packed-refs` file, let alone deleting any packed version of the
reference. This leads to two problems, demonstrated by two new tests:

* While a reference is being deleted, other processes might see the
  old, packed value of the reference for a moment before the packed
  version is deleted. Normally this would be hard to observe, but we
  can prolong the window by locking the `packed-refs` file externally
  before running `update-ref`, then unlocking it before `update-ref`'s
  attempt to acquire the lock times out.

* If the `packed-refs` file is locked so long that `update-ref` fails
  to lock it, then the reference can be left permanently in the
  incorrect state described in the previous point.

In a moment, both problems will be fixed.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

files_initial_transaction_commit(): use a transaction for packed refs

Use a `packed_ref_store` transaction in the implementation of
`files_initial_transaction_commit()` rather than using internal
features of the packed ref store. This further decouples
`files_ref_store` from `packed_ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

prune_refs(): also free the linked list

At least since v1.7, the elements of the `refs_to_prune` linked list
have been leaked. Fix the leak by teaching `prune_refs()` to free the
list elements as it processes them.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

files_pack_refs(): use a reference transaction to write packed refs

Now that the packed reference store supports transactions, we can use
a transaction to write the packed versions of references that we want
to pack. This decreases the coupling between `files_ref_store` and
`packed_ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packed_delete_refs(): implement method

Implement `packed_delete_refs()` using a reference transaction. This
means that `files_delete_refs()` can use `refs_delete_refs()` instead
of `repack_without_refs()` to delete any packed references, decreasing
the coupling between the classes.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packed_ref_store: implement reference transactions

Implement the methods needed to support reference transactions for
the packed-refs backend. The new methods are not yet used.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

struct ref_transaction: add a place for backends to store data

`packed_ref_store` is going to want to store some transaction-wide
data, so make a place for it.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packed-backend: don't adjust the reference count on lock/unlock

The old code incremented the packed ref cache reference count when
acquiring the packed-refs lock, and decremented the count when
releasing the lock. This is unnecessary because:

* Another process cannot change the packed-refs file because it is
  locked.

* When we ourselves change the packed-refs file, we do so by first
  modifying the packed ref-cache, and then writing the data from the
  ref-cache to disk. So the packed ref-cache remains fresh because any
  changes that we plan to make to the file are made in the cache first
  anyway.

So there is no reason for the cache to become stale.

Moreover, the extra reference count causes a problem if we
intentionally clear the packed refs cache, as we sometimes need to do
if we change the cache in anticipation of writing a change to disk,
but then the write to disk fails. In that case, `packed_refs_unlock()`
would have no easy way to find the cache whose reference count it
needs to decrement.

This whole issue will soon become moot due to upcoming changes that
avoid changing the in-memory cache as part of updating the packed-refs
on disk, but this change makes that transition easier.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

load_subtree(): check that `prefix_len` is in the expected range

This value, which is stashed in the last byte of an object_id hash,
gets handed around a lot. So add a sanity check before using it in
`load_subtree()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

shortlog: skip format/parse roundtrip for internal traversal

The original git-shortlog command parsed the output of
git-log, and the logic went something like this:

  1. Read stdin looking for "author" lines.

  2. Parse the identity into its name/email bits.

  3. Apply mailmap to the name/email.

  4. Reformat the identity into a single buffer that is our
     "key" for grouping entries (either a name by default,
     or "name <email>" if --email was given).

The first part happens in read_from_stdin(), and the other
three steps are part of insert_one_record().

When we do an internal traversal, we just swap out the stdin
read in step 1 for reading the commit objects ourselves.
Prior to 2db6b83d18 (shortlog: replace hand-parsing of
author with pretty-printer, 2016-01-18), that made sense; we
still had to parse the ident in the commit message.

But after that commit, we use pretty.c's "%an <%ae>" to get
the author ident (for simplicity). Which means that the
pretty printer is doing a parse/format under the hood, and
then we parse the result, apply the mailmap, and format the
result again.

Instead, we can just ask pretty.c to do all of those steps
for us (including the mailmap via "%aN <%aE>", and not
formatting the address when --email is missing).

And then we can push steps 2-4 into read_from_stdin(). This
speeds up "git shortlog -ns" on linux.git by about 3%, and
eliminates a leak in insert_one_record() of the namemailbuf
strbuf.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add UNLEAK annotation for reducing leak false positives

It's a common pattern in git commands to allocate some
memory that should last for the lifetime of the program and
then not bother to free it, relying on the OS to throw it
away.

This keeps the code simple, and it's fast (we don't waste
time traversing structures or calling free at the end of the
program). But it also triggers warnings from memory-leak
checkers like valgrind or LSAN. They know that the memory
was still allocated at program exit, but they don't know
_when_ the leaked memory stopped being useful. If it was
early in the program, then it's probably a real and
important leak. But if it was used right up until program
exit, it's not an interesting leak and we'd like to suppress
it so that we can see the real leaks.

This patch introduces an UNLEAK() macro that lets us do so.
To understand its design, let's first look at some of the
alternatives.

Unfortunately the suppression systems offered by
leak-checking tools don't quite do what we want. A
leak-checker basically knows two things:

  1. Which blocks were allocated via malloc, and the
     callstack during the allocation.

  2. Which blocks were left un-freed at the end of the
     program (and which are unreachable, but more on that
     later).

Their suppressions work by mentioning the function or
callstack of a particular allocation, and marking it as OK
to leak.  So imagine you have code like this:

  int cmd_foo(...)
  {
/* this allocates some memory */
char *p = some_function();
printf("%s", p);
return 0;
  }

You can say "ignore allocations from some_function(),
they're not leaks". But that's not right. That function may
be called elsewhere, too, and we would potentially want to
know about those leaks.

So you can say "ignore the callstack when main calls
some_function".  That works, but your annotations are
brittle. In this case it's only two functions, but you can
imagine that the actual allocation is much deeper. If any of
the intermediate code changes, you have to update the
suppression.

What we _really_ want to say is that "the value assigned to
p at the end of the function is not a real leak". But
leak-checkers can't understand that; they don't know about
"p" in the first place.

However, we can do something a little bit tricky if we make
some assumptions about how leak-checkers work. They
generally don't just report all un-freed blocks. That would
report even globals which are still accessible when the
leak-check is run.  Instead they take some set of memory
(like BSS) as a root and mark it as "reachable". Then they
scan the reachable blocks for anything that looks like a
pointer to a malloc'd block, and consider that block
reachable. And then they scan those blocks, and so on,
transitively marking anything reachable from a global as
"not leaked" (or at least leaked in a different category).

So we can mark the value of "p" as reachable by putting it
into a variable with program lifetime. One way to do that is
to just mark "p" as static. But that actually affects the
run-time behavior if the function is called twice (you
aren't likely to call main() twice, but some of our cmd_*()
functions are called from other commands).

Instead, we can trick the leak-checker by putting the value
into _any_ reachable bytes. This patch keeps a global
linked-list of bytes copied from "unleaked" variables. That
list is reachable even at program exit, which confers
recursive reachability on whatever values we unleak.

In other words, you can do:

  int cmd_foo(...)
  {
char *p = some_function();
printf("%s", p);
UNLEAK(p);
return 0;
  }

to annotate "p" and suppress the leak report.

But wait, couldn't we just say "free(p)"? In this toy
example, yes. But UNLEAK()'s byte-copying strategy has
several advantages over actually freeing the memory:

  1. It's recursive across structures. In many cases our "p"
     is not just a pointer, but a complex struct whose
     fields may have been allocated by a sub-function. And
     in some cases (e.g., dir_struct) we don't even have a
     function which knows how to free all of the struct
     members.

     By marking the struct itself as reachable, that confers
     reachability on any pointers it contains (including those
     found in embedded structs, or reachable by walking
     heap blocks recursively.

  2. It works on cases where we're not sure if the value is
     allocated or not. For example:

       char *p = argc > 1 ? argv[1] : some_function();

     It's safe to use UNLEAK(p) here, because it's not
     freeing any memory. In the case that we're pointing to
     argv here, the reachability checker will just ignore
     our bytes.

  3. Likewise, it works even if the variable has _already_
     been freed. We're just copying the pointer bytes. If
     the block has been freed, the leak-checker will skip
     over those bytes as uninteresting.

  4. Because it's not actually freeing memory, you can
     UNLEAK() before we are finished accessing the variable.
     This is helpful in cases like this:

       char *p = some_function();
       return another_function(p);

     Writing this with free() requires:

       int ret;
       char *p = some_function();
       ret = another_function(p);
       free(p);
       return ret;

     But with unleak we can just write:

       char *p = some_function();
       UNLEAK(p);
       return another_function(p);

This patch adds the UNLEAK() macro and enables it
automatically when Git is compiled with SANITIZE=leak.  In
normal builds it's a noop, so we pay no runtime cost.

It also adds some UNLEAK() annotations to show off how the
feature works. On top of other recent leak fixes, these are
enough to get t0000 and t0001 to pass when compiled with
LSAN.

Note the case in commit.c which actually converts a
strbuf_release() into an UNLEAK. This code was already
non-leaky, but the free didn't do anything useful, since
we're exiting. Converting it to an annotation means that
non-leak-checking builds pay no runtime cost. The cost is
minimal enough that it's probably not worth going on a
crusade to convert these kinds of frees to UNLEAKS. I did it
here for consistency with the "sb" leak (though it would
have been equally correct to go the other way, and turn them
both into strbuf_release() calls).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6120: test describe and name-rev with deep repos

Depending on the implementation of walks, limitted stack size may lead
to problems (for recursion).

Test name-rev and describe with deep repos and limitted stack size and
mark the former with known failure.

We add these tests (which add gazillions of commits) last so as to keep
the runtime of other subtests the same.

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6120: clean up state after breaking repo

t6120 breaks the repo state intentionally in the last tests.

Clean up the breakage afterwards (and before adding more tests).

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6120: test name-rev --all and --stdin

name-rev is used in a few tests, but tested only in t6120 along with
describe so far.

Add tests for name-rev with --all and --stdin.

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: move limited stack prereq to test-lib

The lazy prerequisite ULIMIT_STACK_SIZE is used only in t7004 so far.

Move it to test-lib.sh so that it can be used in other tests (which it will
be in a follow-up commit).

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: change current file dir string_lists to hashmap

The code was using two string_lists, one for the directories and
one for the files.  The code never checks the lists independently
so we should be able to only use one list.  The string_list also
is a O(log n) for lookup and insertion.  Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/merge: honor commit-msg hook for merges

Similar to 65969d43d1 (merge: honor prepare-commit-msg hook, 2011-02-14)
merge should also honor the commit-msg hook: When a merge is stopped due
to conflicts or --no-commit, the subsequent commit calls the commit-msg
hook. However, it is not called after a clean merge. Fix this
inconsistency by invoking the hook after clean merges as well.

This change is motivated by Gerrit's commit-msg hook to install a ChangeId
trailer into the commit message. Without such a ChangeId, Gerrit refuses
to accept any commit by default, such that the inconsistency of (not)
running the commit-msg hook between commit and merge leads to confusion
and might block people from getting their work done.

As the githooks man page is very vocal about the possibility of skipping
the commit-msg hook via the --no-verify option, implement the option
in merge, too.

'git merge --continue' is currently implemented as calling cmd_commit
with no further arguments. This works for most other merge related options,
such as demonstrated via the --allow-unrelated-histories flag in the
test. The --no-verify option however is not remembered across invocations
of git-merge. Originally the author assumed an alternative in which the
'git merge --continue' command accepts the --no-verify flag, but that
opens up the discussion which flags are allows to the continued merge
command and which must be given in the first invocation.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: fix index corruption with index v4

ce012deb98 ("read-cache: avoid allocating every ondisk entry when
writing", 2017-08-21) changed the way cache entries are written to the
index file.  While previously it wrote the name to an struct that was
allocated using xcalloc(), it now uses ce_write() directly.  Previously
ce_namelen - common bytes were written to the cache entry, which would
automatically make it nul terminated, as it was allocated using calloc.

Now we are writing ce_namelen - common + 1 bytes directly from the
ce->name to the index.  If CE_STRIP_NAME however gets set in the split
index case ce->ce_namelen is set to 0 without changing the actual
ce->name buffer.  When index-v4, this results in the first character of
ce->name being written out instead of just a terminating nul charcter.

As index-v4 requires the terminating nul character as terminator of
the name when reading it back, this results in a corrupted index.

Fix that by only writing ce_namelen - common bytes directly from
ce->name to the index, and adding the nul terminator in an extra call to
ce_write.

This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in
config.mak and running the test suite (t1700 specifically broke).

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pull: honor submodule.recurse config option

"git pull" supports a --recurse-submodules option but does not parse the
submodule.recurse configuration item to set the default for that option.
Meanwhile "git fetch" does support submodule.recurse, producing
confusing behavior: when submodule.recurse is enabled, "git pull"
recursively fetches submodules but does not update them after fetch.

Handle submodule.recurse in "git pull" to fix this.

Reported-by: Magnus Homann <magnus@homann.se>
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas@morey-chaisemartin.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pull: fix cli and config option parsing order

pull parses first the cli options and then the config option.
The expected behavior is the other way around, so that config
options can not override the cli ones.

This patch changes the parsing order so config options are
parsed first.

Signed-off-by: Nicolas Morey-Chaisemartin <nicolas@morey-chaisemartin.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: remove git_config_maybe_bool

The function was deprecated in commit 89576613 ("treewide: deprecate
git_config_maybe_bool, use git_parse_maybe_bool", 2017-08-07) and has no
users.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hashmap: add API to disable item counting when threaded

This is to address concerns raised by ThreadSanitizer on the mailing list
about threaded unprotected R/W access to map.size with my previous "disallow
rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).

See:
https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@gmail.com/

Add API to hashmap to disable item counting and thus automatic rehashing.
Also include API to later re-enable them.

When item counting is disabled, the map.size field is invalid.  So to
prevent accidents, the field has been renamed and an accessor function
hashmap_get_size() has been added.  All direct references to this
field have been been updated.  And the name of the field changed
to map.private_size to communicate this.

Here is the relevant output from ThreadSanitizer showing the problem:

WARNING: ThreadSanitizer: data race (pid=10554)
  Read of size 4 at 0x00000082d488 by thread T2 (mutexes: write M16):
    #0 hashmap_add hashmap.c:209
    #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
    #2 handle_range_dir name-hash.c:347
    #3 handle_range_1 name-hash.c:415
    #4 lazy_dir_thread_proc name-hash.c:471
    #5 <null> <null>

  Previous write of size 4 at 0x00000082d488 by thread T1 (mutexes: write M31):
    #0 hashmap_add hashmap.c:209
    #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
    #2 handle_range_dir name-hash.c:347
    #3 handle_range_1 name-hash.c:415
    #4 handle_range_dir name-hash.c:380
    #5 handle_range_1 name-hash.c:415
    #6 lazy_dir_thread_proc name-hash.c:471
    #7 <null> <null>

Martin gives instructions for running TSan on test t3008 in this post:
https://public-inbox.org/git/CAN0heSoJDL9pWELD6ciLTmWf-a=oyxe4EXXOmCKvsG5MSuzxsA@mail.gmail.com/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git_extract_argv0_path: do nothing without RUNTIME_PREFIX

When the RUNTIME_PREFIX compile-time knob isn't set, we
never look at the argv0_path we extract. We can push its
declaration inside the #ifdef to make it more clear that the
extract code is effectively a noop.

This also un-confuses leak-checking of the argv0_path
variable when RUNTIME_PREFIX isn't set. The compiler is free
to drop this static variable that we set but never look at
(and "gcc -O2" does so). But the compiler still must call
strbuf_detach(), since it doesn't know whether that function
has side effects; it just throws away the result rather than
putting it into the global.

Leak-checkers which work by scanning the data segment for
pointers to heap blocks would normally consider the block
as reachable at program end. But if the compiler removes the
variable entirely, there's nothing to find.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

system_path: move RUNTIME_PREFIX to a sub-function

The system_path() function has an #ifdef in the middle of
it. Let's move the conditional logic into a sub-function.
This isolates it more, which will make it easier to change
and add to.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Add t/helper/test-write-cache to .gitignore

This new binary was introduced in commit 3921a0b ("perf: add test for
writing the index", 2017-08-21), but a .gitignore entry was not added
for it. Add that entry.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: mention that `eol` can change the dirty status of paths

When setting the `eol` attribute, paths can change their dirty status
without any change in the working directory. This can cause confusion
and should at least be mentioned with a remedy.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

wt-status: release strbuf after use in read_rebase_todolist()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vcs-svn: release strbuf after use in end_revision()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

utf8: release strbuf on error return in strbuf_utf8_replace()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

userdiff: release strbuf after use in userdiff_get_textconv()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

transport-helper: release strbuf after use in process_connect_service()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: release strbuf after use in save_head()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

shortlog: release strbuf after use in insert_one_record()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1_file: release strbuf on error return in index_path()

strbuf_readlink() already frees the buffer for us on error. Clean up
if write_sha1_file() fails as well instead of returning early.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-pack: release strbuf on error return in send_pack()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: release strbuf after use in set_url()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: release strbuf after use in migrate_file()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: release strbuf after use in read_remote_branches()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: release strbuf on error return in write_pseudoref()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

notes: release strbuf after use in notes_copy_from_stdin()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: release strbuf after use in write_merge_heads()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: release strbuf after use in save_state()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mailinfo: release strbuf on error return in handle_boundary()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mailinfo: release strbuf after use in handle_from()

Clean up at the end and jump there instead of returning early.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

help: release strbuf on error return in exec_woman_emacs()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

help: release strbuf on error return in exec_man_man()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

help: release strbuf on error return in exec_man_konqueror()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: release strbuf after use in show_stats()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: release strbuf after use in show_rename_copy()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: release strbuf after use in diff_summary()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

convert: release strbuf on error return in filter_buffer_or_fd()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

connect: release strbuf on error return in git_connect()

Reduce the scope of the variable cmd and release it before returning
early.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: release strbuf on error return in commit_tree_extended()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clone: release strbuf after use in remove_junk()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clean: release strbuf after use in remove_dirs()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

check-ref-format: release strbuf after use in check_ref_format_branch()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

am: release strbuf after use in safe_to_abort()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

am: release strbuf on error return in hg_patch_to_mail()

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

am: release strbufs after use in detect_patch_format()

Don't reset the strbufs l2 and l3 before use as if they were static, but
release them at the end instead.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-parse: don't trim bisect refnames

Using for_each_ref_in() with a full refname has always been
a questionable practice, but it became an error with
b9c8e7f2fb (prefix_ref_iterator: don't trim too much,
2017-05-22), making "git rev-parse --bisect" pretty reliably
show a BUG.

Commit 03df567fbf (for_each_bisect_ref(): don't trim
refnames, 2017-06-18) fixed this case for revision.c, but
rev-parse handles this option on its own. We can use the
same solution here (and piggy-back on its test).

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

set_git_dir: handle feeding gitdir to itself

Ideally we'd free the existing gitdir field before assigning
the new one, to avoid a memory leak. But we can't do so
safely because some callers do the equivalent of:

set_git_dir(get_git_dir());

We can detect that case as a noop, but there are even more
complicated cases like:

set_git_dir(remove_leading_path(worktree, get_git_dir());

where we really do need to do some work, but the original
string must remain valid.

Rather than put the burden on callers to make a copy of the
string (only to free it later, since we'll make a copy of it
ourselves), let's solve the problem inside set_git_dir(). We
can make a copy of the pointer for the old gitdir, and then
avoid freeing it until after we've made our new copy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repository: free fields before overwriting them

It's possible that the repository data may be initialized
twice (e.g., after doing a chdir() to the top of the
worktree we may have to adjust a relative git_dir path). We
should free() any existing fields before assigning to them
to avoid leaks.

This should be safe, as the fields are set based on the
environment or on other strings like the gitdir or
commondir. That makes it impossible that we are feeding an
alias to the just-freed string.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: free allocated tree buffers

We read the tree objects with fill_tree_descriptor(), but
never actually free the resulting buffers, causing a memory
leak. This isn't a huge deal because we call this code at
most twice per program invocation. But it does potentially
double our heap usage if you have large root trees. Let's
free the trees before returning.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: make tree counting less confusing

Depending on whether we're in --keep mode, git-reset may
feed one or two trees to unpack_trees(). We start a counter
at "1" and then increment it to "2" only for the two-tree
case. But that means we must always subtract one to find the
correct array slot to fill with each descriptor.

Instead, let's start at "0" and just increment our counter
after adding each tree. This skips the extra subtraction,
and will make things much easier when we start to actually
free our tree buffers.

While we're at it, let's make the first allocation use the
slot at "desc + nr", too, even though we know "nr" is 0 at
that point. It makes the two fill_tree_descriptor() calls
consistent (always "desc + nr", followed by always
incrementing "nr").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: plug user_config leak

We generate filenames for the user_config ("~/.gitconfig")
and the xdg config ("$XDG_CONFIG_HOME/git/config") and then
decide which to use by looking at the filesystem. But after
selecting one, the unused string is just leaked.

This is a tiny leak, but it creates noise in leak-checker
output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

update-index: fix cache entry leak in add_one_file()

When we fail to add the cache entry to the index, we end up
just leaking the struct. We should follow the pattern of the
early-return above and free it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add: free leaked pathspec after add_files_to_cache()

After run_diff_files, we throw away the rev_info struct,
including the pathspec that we copied into it, leaking the
memory. this is probably not a big deal in practice. We
usually only run this once per process, and the leak is
proportional to the pathspec list we're already holding in
memory.

But it's still a leak, and it pollutes leak-checker output,
making it harder to find important leaks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib: set LSAN_OPTIONS to abort by default

We already set ASAN_OPTIONS to abort if it finds any errors.
As we start to experiment with LSAN, the leak sanitizer,
it's convenient if we give it the same treatment.

Note that ASAN is actually a superset of LSAN and can do the
leak detection itself. So this only has an effect if you
specifically build with "make SANITIZE=leak" (leak detection
but not the rest of ASAN). Building with just LSAN results
in a build that runs much faster. That makes the
build-test-fix cycle more pleasant.

In the long run, once we've fixed or suppressed all the
leaks, it will probably be worth turning leak-detection on
for ASAN and just using that (to check both leaks _and_
memory errors in a single test run). But there's still a lot
of work before we get there.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib: --valgrind should not override --verbose-log

The --verbose test option cannot be used with test harnesses
like "prove". Instead, you must use --verbose-log.

Since the --valgrind option implies --verbose, that means
that it cannot be used with prove. I.e., this does not work:

prove t0000-basic.sh :: --valgrind

You'd think it could be fixed by doing:

prove t0000-basic.sh :: --valgrind --verbose-log

but that doesn't work either, because the implied --verbose
takes precedence over --verbose-log. If the user has given
us a specific option, we should prefer that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

stop leaking lock structs in some simple cases

Now that it's safe to declare a "struct lock_file" on the
stack, we can do so (and avoid an intentional leak). These
leaks were found by running t0000 and t0001 under valgrind
(though certainly other similar leaks exist and just don't
happen to be exercised by those tests).

Initializing the lock_file's inner tempfile with NULL is not
strictly necessary in these cases, but it's a good practice
to model. It means that if we were to call a function like
rollback_lock_file() on a lock that was never taken in the
first place, it becomes a quiet noop (rather than undefined
behavior).

Likewise, it's always safe to rollback_lock_file() on a file
that has already been committed or deleted, since that
operation is a noop on an inactive lockfile (and that's why
the case in config.c can drop the "if (lock)" check as we
move away from using a pointer).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ref_lock: stop leaking lock_files

Since the tempfile code recently relaxed the rule that
tempfile structs (and thus locks) need to hang around
forever, we no longer have to leak our lock_file structs.

In fact, we don't even need to heap-allocate them anymore,
since their lifetime can just match that of the surrounding
ref_lock (and if we forget to delete a lock, the effect is
the same as before: it will eventually go away at program
exit).

Note that there is a check in unlock_ref() to only rollback
a lock file if it has been allocated. We don't need that
check anymore; we zero the ref_lock (and thus the
lock_file), so at worst we pass a NULL pointer to
delete_tempfile(), which considers that a noop.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

lockfile: update lifetime requirements in documentation

Now that the tempfile system we rely on has loosened the
lifetime requirements for storage, we can adjust our
documentation to match.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: auto-allocate tempfiles on heap

The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:

  struct tempfile t;

  create_tempfile(&t, ...);
  ...
  if (!err)
          rename_tempfile(&t, ...);
  else
          delete_tempfile(&t);

But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.

Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).

But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it.  Let's
see if we can make it harder to get this wrong.

Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.

Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).

This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.

The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:

  1. Storing a pointer instead of the struct itself.

  2. Passing the pointer instead of taking the struct
     address.

  3. Handling a "struct tempfile *" return instead of a file
     descriptor.

We could play games to make this less noisy. For example, by
defining the tempfile like this:

  struct tempfile {
struct heap_allocated_part_of_tempfile {
                int fd;
                ...etc
        } *actual_data;
  }

Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: remove deactivated list entries

Once a "struct tempfile" is added to the global cleanup
list, it is never removed. This means that its storage must
remain valid for the lifetime of the program. For single-use
tempfiles and locks, this isn't a big deal: we just declare
the struct static. But for library code which may take
multiple simultaneous locks (like the ref code), they're
forced to allocate a struct on the heap and leak it.

This is mostly OK in practice. The size of the leak is
bounded by the number of refs, and most programs exit after
operating on a fixed number of refs (and allocate
simultaneous memory proportional to the number of ref
updates in the first place). But:

  1. It isn't hard to imagine a real leak: a program which
     runs for a long time taking a series of ref update
     instructions and fulfilling them one by one. I don't
     think we have such a program now, but it's certainly
     plausible.

  2. The leaked entries appear as false positives to
     tools like valgrind.

Let's relax this rule by keeping only "active" tempfiles on
the list. We can do this easily by moving the list-add
operation from prepare_tempfile_object to activate_tempfile,
and adding a deletion in deactivate_tempfile.

Existing callers do not need to be updated immediately.
They'll continue to leak any tempfile objects they may have
allocated, but that's no different than the status quo. We
can clean them up individually.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: use list.h for linked list

The tempfile API keeps to-be-cleaned tempfiles in a
singly-linked list and never removes items from the list.  A
future patch would like to start removing items, but removal
from a singly linked list is O(n), as we have to walk the
list to find the predecessor element. This means that a
process which takes "n" simultaneous lockfiles (for example,
an atomic transaction on "n" refs) may end up quadratic in
"n".

Before we start allowing items to be removed, it would be
nice to have a way to cover this case in linear time.

The simplest solution is to make an assumption about the
order in which tempfiles are added and removed from the
list. If both operations iterate over the tempfiles in the
same order, then by putting new items at the end of the list
our removal search will always find its items at the
beginning of the list. And indeed, that would work for the
case of refs. But it creates a hidden dependency between
unrelated parts of the code. If anybody changes the ref code
(or if we add a new caller that opens multiple simultaneous
tempfiles) they may unknowingly introduce a performance
regression.

Another solution is to use a better data structure. A
doubly-linked list works fine, and we already have an
implementation in list.h. But there's one snag: the elements
of "struct tempfile" are all marked as "volatile", since a
signal handler may interrupt us and iterate over the list at
any moment (even if we were in the middle of adding a new
entry).

We can declare a "volatile struct list_head", but we can't
actually use it with the normal list functions. The compiler
complains about passing a pointer-to-volatile via a regular
pointer argument. And rightfully so, as the sub-function
would potentially need different code to deal with the
volatile case.

That leaves us with a few options:

  1. Drop the "volatile" modifier for the list items.

     This is probably a bad idea. I checked the assembly
     output from "gcc -O2", and the "volatile" really does
     impact the order in which it updates memory.

  2. Use macros instead of inline functions. The irony here
     is that list.h is entirely implemented as trivial
     inline functions. So we basically are already
     generating custom code for each call. But sadly there's no
     way in C to declare the inline function to take a more
     generic type.

     We could do so by switching the inline functions to
     macros, but it does make the end result harder to read.
     And it doesn't fully solve the problem (for instance,
     the declaration of list_head needs to change so that
     its "prev" and "next" pointers point to other volatile
     structs).

  3. Don't use list.h, and just make our own ad-hoc
     doubly-linked list. It's not that much code to
     implement the basics that we need here. But if we're
     going to do so, why not add the few extra lines
     required to model it after the actual list.h interface?
     We can even reuse a few of the macro helpers.

So this patch takes option 3, but actually implements a
parallel "volatile list" interface in list.h, where it could
potentially be reused by other code. This implements just
enough for tempfile.c's use, though we could easily port
other functions later if need be.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: release deactivated strbufs instead of resetting

When a tempfile is deactivated, we reset its strbuf to the
empty string, which means we hold onto the memory for later
reuse.

Since we'd like to move to a system where tempfile structs
can actually be freed, deactivating one should drop all
resources it is currently using. And thus "release" rather
than "reset" is the appropriate function to call.

In theory the reset may have saved a malloc() when a
tempfile (or a lockfile) is reused multiple times. But in
practice this happened rarely. Most of our tempfiles are
single-use, since in cases where we might actually use many
(like ref locking) we xcalloc() a fresh one for each ref. In
fact, we leak those locks (to appease the rule that tempfile
storage can never be freed). Which means that using reset is
actively hurting us: instead of leaking just the tempfile
struct, we're leaking the extra heap chunk for the filename,
too.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: robustify cleanup handler

We may call remove_tempfiles() from an atexit handler, or
from a signal handler. In the latter case we must take care
to avoid functions which may deadlock if the process is in
an unknown state, including looking at any stdio handles
(which may be in the middle of doing I/O and locked) or
calling malloc() or free().

The current implementation calls delete_tempfile(). We unset
the tempfile's stdio handle (if any) to avoid deadlocking
there. But delete_tempfile() still calls unlink_or_warn(),
which can deadlock writing to stderr if the unlink fails.

Since delete_tempfile() isn't very long, let's just
open-code our own simple conservative version of the same
thing.  Notably:

  1. The "skip_fclose" flag is now called "in_signal_handler",
     because it should inform more decisions than just the
     fclose handling.

  2. We can replace close_tempfile() with just close(fd).
     That skips the fclose() question altogether. This is
     fine for the atexit() case, too; there's no point
     flushing data to a file which we're about to delete
     anyway.

  3. We can choose between unlink/unlink_or_warn based on
     whether it's safe to use stderr.

  4. We can replace the deactivate_tempfile() call with a
     simple setting of the active flag. There's no need to
     do any further cleanup since we know the program is
     exiting.  And even though the current deactivation code
     is safe in a signal handler, this frees us up in future
     patches to make non-signal deactivation more
     complicated (e.g., by freeing resources).

  5. There's no need to remove items from the tempfile_list.
     The "active" flag is the ultimate answer to whether an
     entry has been handled or not. Manipulating the list
     just introduces more chance of recursive signals
     stomping on each other, and the whole list will go away
     when the program exits anyway. Less is more.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: factor out deactivation

When we deactivate a tempfile, we also have to clean up the
"filename" strbuf. Let's pull this out into its own function
to keep the logic in one place (which will become more
important when a future patch makes it more complicated).

Note that we can use the same function when deactivating an
object that _isn't_ actually active yet (like when we hit an
error creating a tempfile). These callsites don't currently
reset the "active" flag to 0, but it's OK to do so (it's
just a noop for these cases).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: factor out activation

There are a few steps required to "activate" a tempfile
struct. Let's pull these out into a function. That saves a
few repeated lines now, but more importantly will make it
easier to change the activation scheme later.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: replace die("BUG") with BUG()

Compared to die(), using BUG() triggers abort(). That may
give us an actual coredump, which should make it easier to
get a stack trace. And since the programming error for these
assertions is not in the functions themselves but in their
callers, such a stack trace is needed to actually find the
source of the bug.

In addition, abort() raises SIGABRT, which is more likely to
be caught by our test suite.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: handle NULL tempfile pointers gracefully

The tempfile functions all take pointers to tempfile
objects, but do not check whether the argument is NULL.
This isn't a big deal in practice, since the lifetime of any
tempfile object is defined to last for the whole program. So
even if we try to call delete_tempfile() on an
already-deleted tempfile, our "active" check will tell us
that it's a noop.

In preparation for transitioning to a new system that
loosens the "tempfile objects can never be freed" rule,
let's tighten up our active checks:

  1. A NULL pointer is now defined as "inactive" (so it will
     BUG for most functions, but works as a silent noop for
     things like delete_tempfile).

  2. Functions should always do the "active" check before
     looking at any of the struct fields.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: prefer is_tempfile_active to bare access

The tempfile code keeps an "active" flag, and we have a
number of assertions to make sure that the objects are being
used in the right order. Most of these directly check
"active" rather than using the is_tempfile_active()
accessor.

Let's prefer using the accessor, in preparation for it
growing more complicated logic (like checking for NULL).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

lockfile: do not rollback lock on failed close

Since the lockfile code is based on the tempfile code, it
has some of the same problems, including that close_lock_file()
erases the tempfile's filename buf, making it hard for the
caller to write a good error message.

In practice this comes up less for lockfiles than for
straight tempfiles, since we usually just report the
refname. But there is at least one buggy case in
write_ref_to_lockfile(). Besides, given the coupling between
the lockfile and tempfile modules, it's less confusing if
their close() functions have the same semantics.

Just as the previous commit did for close_tempfile(), let's
teach close_lock_file() and its wrapper close_ref() not to
rollback on error. And just as before, we'll give them new
"gently" names to catch any new callers that are added.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tempfile: do not delete tempfile on failed close

When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:

  if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);

wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).

Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:

  if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
  }

already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.

Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

always check return value of close_tempfile

If close_tempfile() encounters an error, then it deletes the
tempfile and resets the "struct tempfile". But many code
paths ignore the return value and continue to use the
tempfile. Instead, we should generally treat this the same
as a write() error.

Note that in the postimage of some of these cases our error
message will be bogus after a failed close because we look
at tempfile->filename (either directly or via get_tempfile_path).
But after the failed close resets the tempfile object, this
is guaranteed to be the empty string. That will be addressed
in a future patch (because there are many more cases of the
same problem than just these instances).

Note also in the hunk in gpg-interface.c that it's fine to
call delete_tempfile() in the error path, even if
close_tempfile() failed and already deleted the file. The
tempfile code is smart enough to know the second deletion is
a noop.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

verify_signed_buffer: prefer close_tempfile() to close()

We do a manual close() on the descriptor provided to us by
mks_tempfile. But this runs contrary to the advice in
tempfile.h, which notes that you should always use
close_tempfile(). Otherwise the descriptor may be reused
without the tempfile object knowing it, and the later call
to delete_tempfile() could close a random descriptor.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup_temporary_shallow: move tempfile struct into function

The setup_temporary_shallow() function creates a temporary
file, but we never access the tempfile struct outside of the
function. This is OK, since it means we'll just clean up the
tempfile on exit. But we can simplify the code a bit by
moving the global tempfile struct to the only function in
which it's used.

Note that it must remain "static" due to tempfile.c's
requirement that tempfile storage never goes away until
program exit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup_temporary_shallow: avoid using inactive tempfile

When there are no shallow entries to write, we skip creating
the tempfile entirely and try to return the empty string.

But we do so by calling get_tempfile_path() on the inactive
tempfile object. This will trigger an assertion that kills
the program. The bug was introduced by 6e122b449b
(setup_temporary_shallow(): use tempfile module,
2015-08-10). But nobody seems to have noticed since then
because we do not end up calling this function at all when
there are no shallow items. In other words, this code path
is completely unexercised.

Since the tempfile object is a static global, it _is_
possible that we call the function twice, writing out
shallow info the first time and then "reusing" our tempfile
object the second time. But:

  1. It seems unlikely that this was the intent, as hitting
     this code path would imply somebody clearing the
     shallow_info list between calls.

     And if somebody _did_ call the function multiple times
     without clearing the shallow_info list, we'd hit a
     different BUG for trying to reuse an already-active
     tempfile.

  2. I verified by code inspection that the function is only
     called once per program. And also replacing this code
     with a BUG() and running the test suite demonstrates
     that it is not triggered there.

So we could probably just replace this with an assertion and
confirm that it's never called. However, the original intent
does seem to be that you _could_ call it when the
shallow_info is empty. And that's easy enough to do; since
the return value doesn't need to point to a writable buffer,
we can just return a string literal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

write_index_as_tree: cleanup tempfile on error

If we failed to write our new index file, we rollback our
lockfile to remove the temporary index. But if we fail
before we even get to the write step (because reading the
old index failed), we leave the lockfile in place, which
makes no sense.

In practice this hasn't been a big deal because failing at
write_index_as_tree() typically results in the whole program
exiting (and thus the tempfile handler kicking in and
cleaning up the files). But this function should
consistently take responsibility for the resources it
allocates.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The sixth batch post 2.14

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'rs/archive-excluded-directory'

"git archive" did not work well with pathspecs and the
export-ignore attribute.

* rs/archive-excluded-directory:
  archive: don't queue excluded directories
  archive: factor out helper functions for handling attributes
  t5001: add tests for export-ignore attributes and exclude pathspecs

Merge branch 'po/read-graft-line'

Conversion from uchar[20] to struct object_id continues; this is to
ensure that we do not assume sizeof(struct object_id) is the same
as the length of SHA-1 hash (or length of longest hash we support).

* po/read-graft-line:
  commit: rewrite read_graft_line
  commit: allocate array using object_id size
  commit: replace the raw buffer with strbuf in read_graft_line
  sha1_file: fix definition of null_sha1

Merge branch 'ks/branch-set-upstream'

"branch --set-upstream" that has been deprecated in Git 1.8 has
finally been retired.

* ks/branch-set-upstream:
  branch: quote branch/ref names to improve readability
  builtin/branch: stop supporting the "--set-upstream" option
  t3200: cleanup cruft of a test

pkt-line: re-'static'-ify buffer in packet_write_fmt_1()

The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add
packet_write_fmt_gently()", 2016-10-16). As a result, for each call to
packet_write_fmt_1, we allocate and leak a buffer.

We could keep the strbuf non-static and instead make sure we always
release it before returning (but not before we die, so that we don't
touch errno). That would also prepare us for threaded use. But until
that needs to happen, let's just restore the static-ness so that we get
back to a situation where we (eventually) do not continuosly keep
allocating memory.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: remove return value from get_files_dirs

The return value of the get_files_dirs call is never being used.
Looking at the history of the file and it was originally only
being used for debug output statements. Also when
read_tree_recursive return value is non zero it is changed to
zero. This leads me to believe that it doesn't matter if
read_tree_recursive gets an error.

Since the debug output has been removed and the caller isn't
checking the return value there is no reason to keep calculating
and returning a value.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>