git.oblomov.eu Git

treewide: correct several "up-to-date" to "up to date"

Follow the Oxford style, which says to use "up-to-date" before the noun,
but "up to date" after it. Don't change plumbing (specifically
send-pack.c, but transport.c (git push) also has the same string).

This was produced by grepping for "up-to-date" and "up to date". It
turned out we only had to edit in one direction, removing the hyphens.

Fix a typo in Documentation/git-diff-index.txt while we're there.

Reported-by: Jeffrey Manian <jeffrey.manian@gmail.com>
Reported-by: STEVEN WHITE <stevencharleswhitevoices@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/user-manual: update outdated example output

Since commit f7673490 ("more terse push output", 2007-11-05), git push
has a completely different output format than the one shown in the user
manual for a non-fast-forward push.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vcs-svn: move remaining repo_tree functions to fast_export.h

These used to be for manipulating the in-memory repo_tree structure,
but nowadays they are convenience wrappers to handle a few git-vs-svn
mismatches:

1. Git does not track empty directories but Subversion does.  When
    looking up a path in git that Subversion thinks exists and finding
    nothing, we can safely assume that the path represents a
    directory.  This is needed when a later Subversion revision
    modifies that directory.

2. Subversion allows deleting a file by copying.  In Git fast-import
    we have to handle that more explicitly as a deletion.

These are details of the tool's interaction with git fast-import.
Move them to fast_export.c, where other such details are handled.

This way the function names do not start with a repo_ prefix that
would clash with the repository object introduced in
v2.14.0-rc0~38^2~16 (repository: introduce the repository object,
2017-06-22) or an svn_ prefix that would clash with libsvn (in case
someone wants to link this code with libsvn some day).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vcs-svn: remove repo_delete wrapper function

Since v1.7.10-rc0~118^2~4^2~4^2~3 (vcs-svn: pass paths through to
fast-import, 2010-12-13) this is an alias for fast_export_delete.
Remove the unnecessary layer of indirection.

No functional change intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vcs-svn: remove custom mode constants

In the rest of Git, these modes are spelled as S_IFDIR,
S_IFREG | 0644, S_IFREG | 0755, and S_IFLNK. Use the same constants
in svn-fe for simplicity and consistency.

No functional change intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vcs-svn: remove more unused prototypes and declarations

I forgot to remove these in v1.7.10-rc0~118^2~4^2~5^2~4 (vcs-svn:
eliminate repo_tree structure, 2010-12-10).

This finishes what was started in commit 36f63b50 (vcs-svn: remove
unused prototypes, 2017-08-21).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Doc: clarify that pack-objects makes packs, plural

The documentation for pack-objects describes that it creates "a packed
archive of objects", which is confusing because it may create multiple
packs if --max-pack-size is set. Update the documentation to clarify
this, and explaining in which cases such a feature would be useful.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ThreadSanitizer: add suppressions

Add a file .tsan-suppressions and list two functions in it: want_color()
and transfer_debug(). Both of these use the pattern

static int foo = -1;
if (foo < 0)
foo = bar();

where bar always returns the same non-negative value. This can cause
ThreadSanitizer to diagnose a race when foo is written from two threads.
That is indeed a race, although it arguably doesn't matter in practice
since it's always the same value that is written.

Add NEEDSWORK-comments to the functions so that this problem is not
forever swept way under the carpet.

The suppressions-file is used by setting the environment variable
TSAN_OPTIONS to, e.g., "suppressions=$(pwd)/.tsan-suppressions". Observe
that relative paths such as ".tsan-suppressions" might not work.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf_setlen: don't write to strbuf_slopbuf

strbuf_setlen(., 0) writes '\0' to sb.buf[0], where buf is either
allocated and unique to sb, or the global slopbuf. The slopbuf is meant
to provide a guarantee that buf is not NULL and that a freshly
initialized buffer contains the empty string, but it is not supposed to
be written to. That strbuf_setlen writes to slopbuf has at least two
implications:

First, it's wrong in principle. Second, it might be hiding misuses which
are just waiting to wreak havoc. Third, ThreadSanitizer detects a race
when multiple threads write to slopbuf at roughly the same time, thus
potentially making any more critical races harder to spot.

Avoid writing to strbuf_slopbuf in strbuf_setlen. Let's instead assert
on the first byte of slopbuf being '\0', since it helps ensure the
promised invariant of buf[len] == '\0'. (We know that "len" was already
0, or someone has messed with "alloc". If someone has fiddled with the
fields that much beyond the correct interface, they're on their own.)

This is a function which is used in many places, possibly also in hot
code paths. There are two branches in strbuf_setlen already, and we are
adding a third and possibly a fourth (in the assert). In hot code paths,
we hopefully reuse the buffer in order to avoid continous reallocations.
Thus, after a start-up phase, we should always take the same path,
which might help branch prediction, and we would never make the assert.
If a hot code path continuously reallocates, we probably have bigger
performance problems than this new safety-check.

Simple measurements do not contradict this reasoning. 100000000 times
resetting a buffer and adding the empty string takes 5.29/5.26 seconds
with/without this patch (best of three). Releasing at every iteration
yields 18.01/17.87. Adding a 30-character string instead of the empty
string yields 5.61/5.58 and 17.28/17.28(!).

This patch causes the git binary emitted by gcc 5.4.0 -O2 on my machine
to grow from 11389848 bytes to 11497184 bytes, an increase of 0.9%.

I also tried to piggy-back on the fact that we already check alloc,
which should already tell us whether we are using the slopbuf:

        if (sb->alloc) {
                if (len > sb->alloc - 1)
                        die("BUG: strbuf_setlen() beyond buffer");
                sb->buf[len] = '\0';
        } else {
                if (len)
                        die("BUG: strbuf_setlen() beyond buffer");
                assert(!strbuf_slopbuf[0]);
        }
        sb->len = len;

That didn't seem to be much slower (5.38, 18.02, 5.70, 17.32 seconds),
but it does introduce some minor code duplication. The resulting git
binary was 11510528 bytes large (another 0.1% increase).

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: retry acquiring reference locks for 100ms

The philosophy of reference locking has been, "if another process is
changing a reference, then whatever I'm trying to do to it will
probably fail anyway because my old-SHA-1 value is probably no longer
current". But this argument falls down if the other process has locked
the reference to do something that doesn't actually change the value
of the reference, such as `pack-refs` or `reflog expire`. There
actually *is* a decent chance that a planned reference update will
still be able to go through after the other process has released the
lock.

So when trying to lock an individual reference (e.g., when creating
"refs/heads/master.lock"), if it is already locked, then retry the
lock acquisition for approximately 100 ms before giving up. This
should eliminate some unnecessary lock conflicts without wasting a lot
of time.

Add a configuration setting, `core.filesRefLockTimeout`, to allow this
setting to be tweaked.

Note: the function `get_files_ref_lock_timeout_ms()` cannot be private
to the files backend because it is also used by `write_pseudoref()`
and `delete_pseudoref()`, which are defined in `refs.c` so that they
can be used by other reference backends.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: save merge state earlier

If the `git merge` process is killed while waiting for the editor to
finish, the merge state is lost but the prepared merge msg and tree is kept.
So, a subsequent `git commit` creates a squashed merge even when the
user asked for proper merge commit originally.

Demonstrate the problem with a test crafted after the in t7502. The test
requires EXECKEEPSPID (thus does not run under MINGW).

Save the merge state earlier (in the non-squash case) so that it does
not get lost. This makes the test pass.

Reported-by: hIpPy <hippy2981@gmail.com>
Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: split write_merge_state in two

write_merge_state() writes out the merge heads, mode, and msg. But we
may want to write out heads, mode without the msg. So, split out heads
(+mode) into a separate function write_merge_heads() that is called by
write_merge_state().

No funtional change so far, except when these non-atomic writes are
interrupted: we write heads-mode-msg now when we used to write
heads-msg-mode.

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: clarify call chain

prepare_to_commit() cannot be reached in the non-squash case:
It is called by merge_trivial() and finish_automerge() only, but the
calls to the latter are somewhat hard to track:

If option_commit is not set, the code in cmd_merge() uses a fake
conflict return code (ret=1) to avoid writing the tree, which also
avoids setting automerge_was_ok (just as in the proper ret==1 case), so
that finish_automerge() is not called.

To ensure that no code change breaks that assumption, safe-guard
prepare_to_commit() by a BUG() statement.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-objects: take lock before accessing `remaining`

When checking the conditional of "while (me->remaining)", we did not
hold the lock. Calling find_deltas would still be safe, since it checks
"remaining" (after taking the lock) and is able to handle all values. In
fact, this could (currently) not trigger any bug: a bug could happen if
`remaining` transitioning from zero to non-zero races with the evaluation
of the while-condition, but these are always separated by the
data_ready-mechanism.

Make sure we have the lock when we read `remaining`. This does mean we
release it just so that find_deltas can take it immediately again. We
could tweak the contract so that the lock should be taken before calling
find_deltas, but let's defer that until someone can actually show that
"unlock+lock" has a measurable negative impact.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

convert: always initialize attr_action in convert_attrs

convert_attrs contains an "if-else". In the "if", we set attr_action
twice, and the first assignment has no effect. In the "else", we do not
set it at all. Since git_check_attr always returns the same value, we'll
always end up in the "if", so there is no problem right now. But
convert_attrs is obviously trying not to rely on such an
implementation-detail of another component.

Make the initialization of attr_action after the if-else. Remove the
earlier assignments.

Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rerere: allow approxidate in gc.rerereResolved/gc.rerereUnresolved

These two configuration variables are described in the documentation
to take an expiry period expressed in the number of days:

    gc.rerereResolved::
    Records of conflicted merge you resolved earlier are
    kept for this many days when 'git rerere gc' is run.
    The default is 60 days.

    gc.rerereUnresolved::
    Records of conflicted merge you have not resolved are
    kept for this many days when 'git rerere gc' is run.
    The default is 15 days.

There is no strong reason not to allow a more general "approxidate"
expiry specification, e.g. "5.days.ago", or "never".

Rename the config_get_expiry() helper introduced in the previous
step to git_config_get_expiry_in_days() and move it to a more
generic place, config.c, and use date.c::parse_expiry_date() to do
so.  Give it an ability to allow the caller to tell among three
cases (i.e. there is no "gc.rerereResolved" config, there is and it
is correctly parsed into the *expiry variable, and there was an
error in parsing the given value).  The current caller can work
correctly without using the return value, though.

In the future, we may find other variables that only allow an
integer that specifies "this many days" or other unit of time, and
when it happens we may need to drop "_days" suffix from the name of
the function and instead pass the "scale" value as another parameter.

But this will do for now.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

rerere: represent time duration in timestamp_t internally

The two configuration variables, gc.rerereResolved and
gc.rerereUnresolved, are measured in days and are passed as such
into the prune_one() helper function, which worked in time_t to see
if an entry in the rerere database is past its expiry.

Instead, have the caller turn the number of days into the expiry
timestamp. Further, use timestamp_t instead of time_t. This will
make it possible to extend the way the configuration variable is
spelled by using date.c::parse_expiry_date() that gives the expiry
timestamp in timestamp_t.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4200: parameterize "rerere gc" custom expiry test

The test creates a rerere database entry that is two days old, and
tries to expire with three different custom expiry configuration
(keep ones less than 5 days old, keep ones used less than 5 days
ago, and expire everything right now).

We'll be introducing a different way to spell the same "5 days" and
"right now" parameter in a later step; parameterize the test to make
it easier to test the new spelling when it happens.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4200: gather "rerere gc" together

Move the "rerere gc with custom expiry" test up, so that it is close
to the existing basic "rerere gc" tests.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4200: make "rerere gc" test more robust

The test blindly trusted that there may be _some_ entries left in
the rerere database, and used them by updating their timestamps to
see if the gc threshold variables are honoured correctly. This
won't work if there is no entry in the database when the test
begins.

Instead, clear the rerere database, and populate it with a few known
entries (which are bogus, but for the purpose of testing "garbage
collection", it does not matter---we want to make sure we collect
old cruft, even if the files are corrupt rerere database entries),
and use them for the expiry test.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4200: give us a clean slate after "rerere gc" tests

The "multiple identical conflicts" test counts the number of entries
in the rerere database after trying a handful of mergy operations
and recording their resolutions, but without initializing the rerere
database to a known state, allowing the state left by previous tests
to trigger a false failure. Make it robust by cleaning the database
before it starts.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitweb: add 'raw' blob_plain link in history overview

For people that work with very large plain text files it may be easier
if one can bypass viewing the htmlized blob and instead click directly
to the raw file (rather then click through 'blob' and then to 'raw').

Signed-off-by: Job Snijders <job@instituut.net>
Reviewed-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The second batch post 2.14

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'mh/packed-ref-store'

The "ref-store" code reorganization continues.

* mh/packed-ref-store: (32 commits)
  files-backend: cheapen refname_available check when locking refs
  packed_ref_store: handle a packed-refs file that is a symlink
  read_packed_refs(): die if `packed-refs` contains bogus data
  t3210: add some tests of bogus packed-refs file contents
  repack_without_refs(): don't lock or unlock the packed refs
  commit_packed_refs(): remove call to `packed_refs_unlock()`
  clear_packed_ref_cache(): don't protest if the lock is held
  packed_refs_unlock(), packed_refs_is_locked(): new functions
  packed_refs_lock(): report errors via a `struct strbuf *err`
  packed_refs_lock(): function renamed from lock_packed_refs()
  commit_packed_refs(): use a staging file separate from the lockfile
  commit_packed_refs(): report errors rather than dying
  packed_ref_store: make class into a subclass of `ref_store`
  packed-backend: new module for handling packed references
  packed_read_raw_ref(): new function, replacing `resolve_packed_ref()`
  packed_ref_store: support iteration
  packed_peel_ref(): new function, extracted from `files_peel_ref()`
  repack_without_refs(): take a `packed_ref_store *` parameter
  get_packed_ref(): take a `packed_ref_store *` parameter
  rollback_packed_refs(): take a `packed_ref_store *` parameter
  ...

Merge branch 'sb/retire-t1200'

A test script that outlived its usefulness has been removed.

* sb/retire-t1200:
t1200: remove t1200-tutorial.sh

Merge branch 'rs/win32-syslog-leakfix'

Memory leak in an error codepath has been plugged.

* rs/win32-syslog-leakfix:
win32: plug memory leak on realloc() failure in syslog()

Merge branch 'rs/unpack-entry-leakfix'

Memory leak in an error codepath has been plugged.

* rs/unpack-entry-leakfix:
sha1_file: release delta_stack on error in unpack_entry()

Merge branch 'rs/strbuf-getwholeline-fix'

A helper function to read a single whole line into strbuf
mistakenly triggered OOM error at EOF under certain conditions,
which has been fixed.

* rs/strbuf-getwholeline-fix:
strbuf: clear errno before calling getdelim(3)

Merge branch 'rs/merge-microcleanup'

Code clean-up.

* rs/merge-microcleanup:
merge: use skip_prefix()

Merge branch 'rs/fsck-obj-leakfix'

Memory leak in an error codepath has been plugged.

* rs/fsck-obj-leakfix:
fsck: free buffers on error in fsck_obj()

Merge branch 'rs/t4062-obsd'

Test portability fix.

* rs/t4062-obsd:
t4062: use less than 256 repetitions in regex

Merge branch 'rs/find-pack-entry-bisection'

Code clean-up.

* rs/find-pack-entry-bisection:
sha1_file: avoid comparison if no packed hash matches the first byte

Merge branch 'rs/apply-lose-prefix-length'

Code clean-up.

* rs/apply-lose-prefix-length:
apply: remove prefix_length member from apply_state

Merge branch 'rj/add-chmod-error-message'

Message fix.

* rj/add-chmod-error-message:
builtin/add: add detail to a 'cannot chmod' error message

Merge branch 'jk/hashcmp-memcmp'

Code clean-up.

* jk/hashcmp-memcmp:
hashcmp: use memcmp instead of open-coded loop

Merge branch 'jk/drop-sha1-entry-pos'

Code clean-up.

* jk/drop-sha1-entry-pos:
sha1_file: drop experimental GIT_USE_LOOKUP search

Merge branch 'ur/svn-local-zone'

"git svn" used with "--localtime" option did not compute the tz
offset for the timestamp in question and instead always used the
current time, which has been corrected.

* ur/svn-local-zone:
git svn fetch: Create correct commit timestamp when using --localtime

Merge branch 'pw/am-signoff'

"git am -s" has been taught that some input may end with a trailer
block that is not Signed-off-by: and it should refrain from adding
an extra blank line before adding a new sign-off in such a case.

* pw/am-signoff:
am: fix signoff when other trailers are present

Merge branch 'rs/t3700-clean-leftover'

A test fix.

* rs/t3700-clean-leftover:
t3700: fix broken test under !POSIXPERM

Merge branch 'jc/perl-git-comment-typofix'

A comment fix.

* jc/perl-git-comment-typofix:
perl/Git.pm: typofix in a comment

Merge branch 'rs/in-obsd-basename-dirname-take-const'

Portability fix.

* rs/in-obsd-basename-dirname-take-const:
test-path-utils: handle const parameter of basename and dirname

Merge branch 'rs/obsd-getcwd-workaround'

Test portability fix for BSDs.

* rs/obsd-getcwd-workaround:
t0001: skip test with restrictive permissions if getpwd(3) respects them

Merge branch 'mf/no-dashed-subcommands'

Code clean-up.

* mf/no-dashed-subcommands:
scripts: use "git foo" not "git-foo"

Merge branch 'ma/parse-maybe-bool'

Code clean-up.

* ma/parse-maybe-bool:
  parse_decoration_style: drop unused argument `var`
  treewide: deprecate git_config_maybe_bool, use git_parse_maybe_bool
  config: make git_{config,parse}_maybe_bool equivalent
  config: introduce git_parse_maybe_bool_text
  t5334: document that git push --signed=1 does not work
  Doc/git-{push,send-pack}: correct --sign= to --signed=

Merge branch 'ab/ref-filter-no-contains'

A test fix.

* ab/ref-filter-no-contains:
tests: don't give unportable ">" to "test" built-in, use -gt

Merge branch 'bw/clone-recursive-quiet'

"git clone --recurse-submodules --quiet" did not pass the quiet
option down to submodules.

* bw/clone-recursive-quiet:
clone: teach recursive clones to respect -q

Merge branch 'bw/grep-recurse-submodules'

"git grep --recurse-submodules" has been reworked to give a more
consistent output across submodule boundary (and do its thing
without having to fork a separate process).

* bw/grep-recurse-submodules:
  grep: recurse in-process using 'struct repository'
  submodule: merge repo_read_gitmodules and gitmodules_config
  submodule: check for unmerged .gitmodules outside of config parsing
  submodule: check for unstaged .gitmodules outside of config parsing
  submodule: remove fetch.recursesubmodules from submodule-config parsing
  submodule: remove submodule.fetchjobs from submodule-config parsing
  config: add config_from_gitmodules
  cache.h: add GITMODULES_FILE macro
  repository: have the_repository use the_index
  repo_read_index: don't discard the index

Merge branch 'pw/sequence-rerere-autoupdate'

Commands like "git rebase" accepted the --rerere-autoupdate option
from the command line, but did not always use it.  This has been
fixed.

* pw/sequence-rerere-autoupdate:
  cherry-pick/revert: reject --rerere-autoupdate when continuing
  cherry-pick/revert: remember --rerere-autoupdate
  t3504: use test_commit
  rebase -i: honor --rerere-autoupdate
  rebase: honor --rerere-autoupdate
  am: remember --rerere-autoupdate setting

Merge branch 'bw/push-options-recursively-to-submodules'

"git push --recurse-submodules $there HEAD:$target" was not
propagated down to the submodules, but now it is.

* bw/push-options-recursively-to-submodules:
submodule--helper: teach push-check to handle HEAD

Documentation/git-merge: explain --continue

Currently, 'git merge --continue' is mentioned but not explained.

Explain it.

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: avoid allocating every ondisk entry when writing

When writing the index for each entry an ondisk struct will be
allocated and freed in ce_write_entry.  We can do better by
using a ondisk struct on the stack for each entry.

This is accomplished by using a stack ondisk_cache_entry_extended
outside looping through the entries in do_write_index.  Only the
fixed fields of this struct are used when writing and depending on
whether it is extended or not the flags2 field will be written.
The name field is not used and instead the cache_entry name field
is used directly when writing out the name.  Because ce_write is
using a buffer and memcpy to fill the buffer before flushing to disk,
we don't have to worry about doing multiple ce_write calls.

Running the p0007-write-cache.sh tests would save anywhere
between 3-7% when the index had over a million entries with no
performance degradation on small repos.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: fix memory leak in do_write_index

The previous_name_buf was never getting released when there
was an error in ce_write_entry or allow was false and execution
was returned to the caller.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

perf: add test for writing the index

A performance test for writing the index to be able to
determine if changes to allocating ondisk structure help.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1_file: convert index_stream to struct object_id

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1_file: convert hash_sha1_file_literally to struct object_id

Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1_file: convert index_fd to struct object_id

Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1_file: convert index_path to struct object_id

Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: convert to struct object_id

Replace hashcmp with oidcmp.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/hash-object: convert to struct object_id

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vcs-svn: rename repo functions to "svn_repo"

There were several functions in the Subversion code that started with
"repo_". This namespace is also used by the Git struct repository code.
Rename these functions to start with "svn_repo" to avoid any future
conflicts.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vcs-svn: remove unused prototypes

The Subversion code had prototypes for several functions which were not
ever defined or used. These functions all had names starting with
"repo_", some of which conflict with those in repository.h. To avoid
the conflict, remove those unused prototypes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: fix typo in sendemail.identity

Saying "the this" is an obvious typo. But while we're here,
let's polish the English on the second half of the sentence,
too.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc/interpret-trailers: fix "the this" typo

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

stash: add a test for stashing in a detached state

All that we are really testing here is that the message is
correct when we are not on any branch. All other functionality is
already tested elsewhere.

Signed-off-by: Joel Teichroeb <joel@teichroeb.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

stash: add a test for when apply fails during stash branch

If the return value of merge recursive is not checked, the stash could end
up being dropped even though it was not applied properly

Signed-off-by: Joel Teichroeb <joel@teichroeb.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

stash: add a test for stash create with no files

Ensure the command suceeds and outputs nothing

Signed-off-by: Joel Teichroeb <joel@teichroeb.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

progress: simplify "delayed" progress API

We used to expose the full power of the delayed progress API to the
callers, so that they can specify, not just the message to show and
expected total amount of work that is used to compute the percentage
of work performed so far, the percent-threshold parameter P and the
delay-seconds parameter N.  The progress meter starts to show at N
seconds into the operation only if we have not yet completed P per-cent
of the total work.

Most callers used either (0%, 2s) or (50%, 1s) as (P, N), but there
are oddballs that chose more random-looking values like 95%.

For a smoother workload, (50%, 1s) would allow us to start showing
the progress meter earlier than (0%, 2s), while keeping the chance
of not showing progress meter for long running operation the same as
the latter.  For a task that would take 2s or more to complete, it
is likely that less than half of it would complete within the first
second, if the workload is smooth.  But for a spiky workload whose
earlier part is easier, such a setting is likely to fail to show the
progress meter entirely and (0%, 2s) is more appropriate.

But that is merely a theory.  Realistically, it is of dubious value
to ask each codepath to carefully consider smoothness of their
workload and specify their own setting by passing two extra
parameters.  Let's simplify the API by dropping both parameters and
have everybody use (0%, 2s).

Oh, by the way, the percent-threshold parameter and the structure
member were consistently misspelled, which also is now fixed ;-)

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

apply: file commited with CRLF should roundtrip diff and apply

When a file had been commited with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true), the following does not
roundtrip, `git apply` fails:

    printf "Added line\r\n" >>file &&
    git diff >patch &&
    git checkout -- . &&
    git apply patch

Before applying the patch, the file from working tree is converted
into the index format (clean filter, CRLF conversion, ...).  Here,
when commited with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cache` doesn't call
convert_to_git() because the source material is already in index
format.

Analyze the patch if there is a) any context line with CRLF, or b)
if any line with CRLF is to be removed.  In this case the patch file
`patch` has mixed line endings, for a) it looks like this:

    diff --git a/one b/one
    index 533790e..c30dea8 100644
    --- a/one
    +++ b/one
    @@ -1 +1,2 @@
     a\r
    +b\r

And for b) it looks like this:

    diff --git a/one b/one
    index 533790e..485540d 100644
    --- a/one
    +++ b/one
    @@ -1 +1 @@
    -a\r
    +b\r

If `git apply` detects that the patch itself has CRLF, (look at the
line " a\r" or "-a\r" above), the new flag crlf_in_old is set in
"struct patch" and two things will happen:

    - read_old_data() will not convert CRLF into LF by calling
      convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
    - The WS_CR_AT_EOL bit is set in the "white space rule",
      CRLF are no longer treated as white space.

While at there, make it clear that read_old_data() in apply.c knows
what it wants convert_to_git() to do with respect to CRLF.  In fact,
this codepath is about applying a patch to a file in the filesystem,
which may not exist in the index, or may exist but may not match
what is recorded in the index, or in the extreme case, we may not
even be in a Git repository.  If convert_to_git() peeked at the
index while doing its work, it *would* be a bug.

Pass NULL instead of &the_index to convert_to_git() to make sure we
catch future bugs to clarify this.

Update the test in t4124: split one test case into 3:

    - Detect the " a\r" line in the patch
    - Detect the "-a\r" line in the patch
    - Use LF in repo and CLRF in the worktree.

Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: remove unused inline function single_parent()

53b2c823f6 (revision walker: mini clean-up) added the function in 2007,
but it was never used, so we should be able to get rid of it now.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

archive: don't queue excluded directories

Reject directories with the attribute export-ignore already while
queuing them. This prevents read_tree_recursive() from descending into
them and this avoids write_archive_entry() rejecting them later on,
which queue_or_write_archive_entry() is not prepared for.

Borrow the existing strbuf to build the full path to avoid string
copies and extra allocations; just make sure we restore the original
value before moving on.

Keep checking any other attributes in write_archive_entry() as before,
but avoid checking them twice.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

archive: factor out helper functions for handling attributes

Add helpers for accessing attributes that encapsulate the details of how
to retrieve their values.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5001: add tests for export-ignore attributes and exclude pathspecs

Demonstrate mishandling of the attribute export-ignore by git archive
when used together with pathspecs. Wildcard pathspecs can even cause it
to abort. And a directory excluded without a wildcard is still included
as an empty folder in the archive.

Test-case-by: David Adam <zanchey@ucc.gu.uwa.edu.au>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: rewrite read_graft_line

Old implementation determined number of hashes by dividing length of
line by length of hash, which works only if all hash representations
have same length.

New graft line parser works in two phases:

  1. In first phase line is scanned to verify correctness and compute
     number of hashes, then graft struct is allocated.

  2. In second phase line is scanned again to fill up already allocated
     graft struct.

This way graft parsing code can support different sizes of hashes
without any further code adaptations.

A number of alternative implementations were considered and discarded:

  - Modifying graft structure to store oid_array instead of FLEXI_ARRAY
    indicates undesirable usage of struct to readers.

  - Parsing into temporary string_list or oid_array complicates code
    by adding more return paths, as these structures needs to be
    cleared before returning from function.

  - Determining number of hashes by counting separators might cause
    maintenance issues, if this function needs to be modified in future
    again.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: allocate array using object_id size

struct commit_graft aggregates an array of object_id's, which have
size >= GIT_MAX_RAWSZ bytes. This change prevents memory allocation
error when size of object_id is larger than GIT_SHA1_RAWSZ.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: replace the raw buffer with strbuf in read_graft_line

This simplifies function declaration and allows for use of strbuf_rtrim
instead of modifying buffer directly.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/git-for-each-ref: clarify peeling of tags for --format

`*` in format strings means peeling of tag objects so that object field
names refer to the object that the tag object points at, instead of the
tag object itself.

Currently, this is documented using grammar that is clearly inspired by
classical latin, though missing more than an article in order to be
classical english.

Try and straighten that explanation out a bit.

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: use proper wording for ref format strings

Various commands list refs and allow to use a format string for the
output that interpolates from the ref as well as the object it points
at (for-each-ref; branch and tag in list mode).

Currently, the documentation talks about interpolating from the object.
This is confusing because a ref points to an object but not vice versa,
so the object cannot possible know %(refname), for example. Thus, this is
wrong independent of refs being objects (one day, maybe) or not.

Change the wording to make this clearer (and distinguish it from formats
for the log family).

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1_file: fix definition of null_sha1

The array is declared in cache.h as:

extern const unsigned char null_sha1[GIT_MAX_RAWSZ];

Definition in sha1_file.c must match.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-grep: correct exit code with --quiet and -L

The handling of `status_only` no longer interferes with the handling of
`unmatch_name_only`. `--quiet` no longer affects the exit code when using
`-L`/`--files-without-match`.

Signed-off-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: retire sane_truncate_fn

Long time ago, 23707811 ("diff: do not chomp hunk-header in the
middle of a character", 2008-01-02) introduced sane_truncate_line()
helper function to trim the "function header" line that is shown at
the end of the hunk header line, in order to avoid chomping it in
the middle of a single UTF-8 character. It also added a facility to
define a custom callback function to make it possible to extend it
to non UTF-8 encodings.

During the following 8 1/2 years, nobody found need for this custom
callback facility.

A custom callback function is a wrong design to use here anyway---if
your contents need support for non UTF-8 encoding, you shouldn't
have to write a custom function and recompile Git to plumb it in. A
better approach would be to extend sane_truncate_line() function and
have a new member in emit_callback to conditionally trigger it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

files-backend: cheapen refname_available check when locking refs

When locking references in preparation for updating them, we need to
check that none of the newly added references D/F conflict with
existing references (e.g., we don't allow `refs/foo` to be added if
`refs/foo/bar` already exists, or vice versa).

Prior to 524a9fdb51 (refs_verify_refname_available(): use function in
more places, 2017-04-16), conflicts with existing loose references
were checked by looking directly in the filesystem, and then conflicts
with existing packed references were checked by running
`verify_refname_available_dir()` against the packed-refs cache.

But that commit changed the final check to call
`refs_verify_refname_available()` against the *whole* files ref-store,
including both loose and packed references, with the following
comment:

> This means that those callsites now check for conflicts with all
> references rather than just packed refs, but the performance cost
> shouldn't be significant (and will be regained later).

That comment turned out to be too sanguine. User s@kazlauskas.me
reported that fetches involving a very large number of references in
neighboring directories were slowed down by that change.

The problem is that when fetching, each reference is updated
individually, within its own reference transaction. This is done
because some reference updates might succeed even though others fail.
But every time a reference update transaction is finished,
`clear_loose_ref_cache()` is called. So when it is time to update the
next reference, part of the loose ref cache has to be repopulated for
the `refs_verify_refname_available()` call. If the references are all
in neighboring directories, then the cost of repopulating the
reference cache increases with the number of references, resulting in
O(N²) effort.

The comment above also claims that the performance cost "will be
regained later". The idea was that once the packed-refs were finished
being split out into a separate ref-store, we could limit the
`refs_verify_refname_available()` call to the packed references again.
That is what we do now.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5526: fix some broken && chains

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

branch: quote branch/ref names to improve readability

Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/branch: stop supporting the "--set-upstream" option

The '--set-upstream' option of branch was deprecated in b347d06b
("branch: deprecate --set-upstream and show help if we detect
possible mistaken use", 2012-08-30) and has been planned for removal
ever since.

In order to prevent "--set-upstream" on a command line from being taken as
an abbreviated form of "--set-upstream-to", explicitly catch "--set-upstream"
option and die, instead of just removing it from the list of options.

Before this change, an attempt to use "--set-upstream" resulted in:

    $ git branch
    * master

    $ git branch --set-upstream origin/master
    The --set-upstream flag is deprecated and will be removed. Consider using --track or --set-upstream-to
    Branch origin/master set up to track local branch master.

    $ echo $?
    0

    $ git branch
    * master
      origin/master

With this change, the behaviour becomes like this:

    $ git branch
    * master

    $ git branch --set-upstream origin/master
    fatal: the '--set-upstream' option is no longer supported. Please use '--track' or '--set-upstream-to' instead.

    $ echo $?
    128

    $ git branch
    * master

Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3200: cleanup cruft of a test

Avoiding the clean up step of tests may help in some cases but in other
cases they cause the other unrelated tests to fail for unobvious reasons.
It's better to cleanup a few things to keep other tests from failing
as a result of it.

So, cleanup a cruft left behind by an old test in order for the changes that
are to be introduced to be independent of it.

Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule.sh: remove unused variable

This could have been part of 48308681b0 (git submodule update: have a
dedicated helper for cloning, 2016-02-29).

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: define block by number of alphanumeric chars

The existing behavior of diff --color-moved=zebra does not define the
minimum size of a block at all, instead relying on a heuristic applied
later to filter out sets of adjacent moved lines that are shorter than 3
lines long. This can be confusing, because a block could thus be colored
as moved at the source but not at the destination (or vice versa),
depending on its neighbors.

Instead, teach diff that the minimum size of a block is 20 alphanumeric
characters, the same heuristic used by "git blame". This allows diff to
still exclude uninteresting lines appearing on their own (such as those
solely consisting of one or a few closing braces), as was the intention
of the adjacent-moved-line heuristic.

This requires a change in some tests in that some of their lines are no
longer considered to be part of a block, because they are too short.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: respect MIN_BLOCK_LENGTH for last block

Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
that does not belong to the current block. In particular, this means
that MIN_BLOCK_LENGTH is not checked after all lines are encountered.

Perform that check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

convert: add SAFE_CRLF_KEEP_CRLF

When convert_to_git() is called, the caller may want to keep CRLF to
be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files
that have CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter, skip
the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: skip discarding the index if there is no pre-commit hook

If there is not a pre-commit hook, there is no reason to discard
the index and reread it.

This change checks to presence of a pre-commit hook and then only
discards the index if there was one.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sub-process: print the cmd when a capability is unsupported

In handshake_capabilities() we use warning() when a capability
is not supported, so the exit code of the function is 0 and no
further error is shown. This is a problem because the warning
message doesn't tell us which subprocess cmd failed.

On the contrary if we cannot write a packet from this function,
we use error() and then subprocess_start() outputs:

initialization for subprocess '<cmd>' failed

so we can know which subprocess cmd failed.

Let's improve the warning() message, so that we can know which
subprocess cmd failed.

Helped-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1_file: make read_info_alternates static

read_info_alternates is not used from outside, so let's make it static.

We have to declare the function before link_alt_odb_entry instead of
moving the code around, link_alt_odb_entry calls read_info_alternates,
which in turn calls link_alt_odb_entry.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t1002: stop using sum(1)

sum(1) is a command for calculating checksums of the contents of files.
It was part of early editions of Unix ("Research Unix", 1972/1973, [1]).
cksum(1) appeared in 4.4BSD (1993) as a replacement [2], and became part
of POSIX.1-2008 [3].  OpenBSD 5.6 (2014) removed sum(1).

We only use sum(1) in t1002 to check for changes in three files.  On
MinGW we use md5sum(1) instead.  We could switch to the standard command
cksum(1) for all platforms; MinGW comes with GNU coreutils now, which
provides sum(1), cksum(1) and md5sum(1).  Use our standard method for
checking for file changes instead: test_cmp.

It's more convenient because it shows differences nicely, it's faster on
MinGW because we have a special implementation there based only on
shell-internal commands, it's simpler as it allows us to avoid stripping
out unnecessary entries from the checksum file using grep(1), and it's
more consistent with the rest of the test suite.

We already compare changed files with their expected new contents using
diff(1), so we don't need to check with "test_must_fail test_cmp" if
they differ from their original state.  A later patch could convert the
direct diff(1) calls to test_cmp as well.

With all sum(1) calls gone, remove the MinGW-specific implementation
from test-lib.sh as well.

[1] http://minnie.tuhs.org/cgi-bin/utree.pl?file=V3/man/man1/sum.1
[2] http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.4BSD/usr/share/man/cat1/cksum.0
[3] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cksum.html

Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pretty: support normalization options for %(trailers)

The interpret-trailers command recently learned some options
to make its output easier to parse (for a caller whose only
interested in picking out the trailer values). But it's not
very efficient for asking for the trailers of many commits
in a single invocation.

We already have "%(trailers)" to do that, but it doesn't
know about unfolding or omitting non-trailers. Let's plumb
those options through, so you can have the best of both.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4205: refactor %(trailers) tests

We currently have one test for %(trailers). In preparation
for more, let's refactor a few bits:

  - move the commit creation to its own setup step so it can
    be reused by multiple tests

  - add a trailer with whitespace continuation (to confirm
    that it is left untouched)

  - fix the sample text which claims the placeholder is %bT.
    This was switched long ago to %(trailers)

  - replace one "cat" with an "echo" when generating the
    expected output. This saves a process (and sets a better
    pattern for future tests to follow).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pretty: move trailer formatting to trailer.c

The next commit will add many features to the %(trailer)
placeholder in pretty.c. We'll need to access some internal
functions of trailer.c for that, so our options are either:

1. expose those functions publicly

or

2. make an entry point into trailer.c to do the formatting

Doing (2) ends up exposing less surface area, though do note
that caveats in the docstring of the new function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interpret-trailers: add --parse convenience option

The last few commits have added command line options that
can turn interpret-trailers into a parsing tool. Since
they'd most often be used together, let's provide a
convenient single option for callers to invoke this mode.

This is implemented as a callback rather than a boolean so
that its effect is applied immediately, as if those options
had been specified. Later options can then override them.
E.g.:

git interpret-trailers --parse --no-unfold

would work.

Let's also update the documentation to make clear that this
parsing mode behaves quite differently than the normal
"add trailers to the input" mode.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interpret-trailers: add an option to unfold values

The point of "--only-trailers" is to give a caller an output
that's easy for them to parse. Getting rid of the
non-trailer material helps, but we still may see more
complicated syntax like whitespace continuation. Let's add
an option to unfold any continuation, giving the output as a
single "key: value" line per trailer.

As a bonus, this could be used even without --only-trailers
to clean up unusual formatting in the incoming data.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interpret-trailers: add an option to show only existing trailers

It can be useful to invoke interpret-trailers for the
primary purpose of parsing existing trailers. But in that
case, we don't want to apply existing ifMissing or ifExists
rules from the config. Let's add a special mode where we
avoid applying those rules. Coupled with --only-trailers,
this gives us a reasonable parsing tool.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interpret-trailers: add an option to show only the trailers

In theory it's easy for any reader who wants to parse
trailers to do so. But there are a lot of subtle corner
cases around what counts as a trailer, when the trailer
block begins and ends, etc. Since interpret-trailers already
has our parsing logic, let's let callers ask it to just
output the trailers.

They still have to parse the "key: value" lines, but at
least they can ignore all of the other corner cases.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>