Derrick Stolee [Thu, 15 Oct 2020 17:22:02 +0000 (17:22 +0000)]
maintenance: create maintenance.strategy config
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:
* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily
These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.
The 'maintenance.strategy' is intended as a baseline that can be
customzied further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.
This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 11 Sep 2020 17:49:18 +0000 (17:49 +0000)]
maintenance: add start/stop subcommands
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.
The schedule is laid out as follows:
0 1-23 * * * $cmd maintenance run --schedule=hourly
0 0 * * 1-6 $cmd maintenance run --schedule=daily
0 0 * * 0 $cmd maintenance run --schedule=weekly
where $cmd is a properly-qualified 'git for-each-repo' execution:
$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo
where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.
This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.
The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 11 Sep 2020 17:49:17 +0000 (17:49 +0000)]
maintenance: add [un]register subcommands
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.
These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.
For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.
The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 11 Sep 2020 17:49:16 +0000 (17:49 +0000)]
for-each-repo: run subcommands on configured repos
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.
The test is very simple, but does highlight that the "--" argument is
optional.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 11 Sep 2020 17:49:15 +0000 (17:49 +0000)]
maintenance: add --schedule option and config
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.
The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:
'hourly' > 'daily' > 'weekly'
Use new 'enum schedule_priority' to track these values numerically.
The following cron table would run the scheduled tasks with the correct
frequencies:
0 1-23 * * * git -C <repo> maintenance run --schedule=hourly
0 0 * * 1-6 git -C <repo> maintenance run --schedule=daily
0 0 * * 0 git -C <repo> maintenance run --schedule=weekly
This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 28 Aug 2020 15:45:12 +0000 (15:45 +0000)]
maintenance: optionally skip --auto process
Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.
Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:38 +0000 (12:33 +0000)]
maintenance: add incremental-repack auto condition
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.
The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.
Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:37 +0000 (12:33 +0000)]
maintenance: auto-size incremental-repack batch
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.
Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.
The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.
Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.
Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.
We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.
Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.
Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:36 +0000 (12:33 +0000)]
maintenance: add incremental-repack task
The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.
One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.
Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in
ce1e4a1
(midx: implement midx_repack(), 2019-06-10).
The 'incremental-repack' task runs the following steps:
1. 'git multi-pack-index write' creates a multi-pack-index file if
one did not exist, and otherwise will update the multi-pack-index
with any new pack-files that appeared since the last write. This
is particularly relevant with the background fetch job.
When the multi-pack-index sees two copies of the same object, it
stores the offset data into the newer pack-file. This means that
some old pack-files could become "unreferenced" which I will use
to mean "a pack-file that is in the pack-file list of the
multi-pack-index but none of the objects in the multi-pack-index
reference a location inside that pack-file."
2. 'git multi-pack-index expire' deletes any unreferenced pack-files
and updaes the multi-pack-index to drop those pack-files from the
list. This is safe to do as concurrent Git processes will see the
multi-pack-index and not open those packs when looking for object
contents. (Similar to the 'loose-objects' job, there are some Git
commands that open pack-files regardless of the multi-pack-index,
but they are rarely used. Further, a user that self-selects to
use background operations would likely refrain from using those
commands.)
3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
of pack-files that are listed in the multi-pack-index and creates
a new pack-file containing the objects whose offsets are listed
by the multi-pack-index to be in those objects. The set of pack-
files is selected greedily by sorting the pack-files by modified
time and adding a pack-file to the set if its "expected size" is
smaller than the batch size until the total expected size of the
selected pack-files is at least the batch size. The "expected
size" is calculated by taking the size of the pack-file divided
by the number of objects in the pack-file and multiplied by the
number of objects from the multi-pack-index with offset in that
pack-file. The expected size approximates how much data from that
pack-file will contribute to the resulting pack-file size. The
intention is that the resulting pack-file will be close in size
to the provided batch size.
The next run of the incremental-repack task will delete these
repacked pack-files during the 'expire' step.
In this version, the batch size is set to "0" which ignores the
size restrictions when selecting the pack-files. It instead
selects all pack-files and repacks all packed objects into a
single pack-file. This will be updated in the next change, but
it requires doing some calculations that are better isolated to
a separate change.
These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by
af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).
These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.
By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:35 +0000 (12:33 +0000)]
midx: use start_delayed_progress()
Now that the multi-pack-index may be written as part of auto maintenance
at the end of a command, reduce the progress output when the operations
are quick. Use start_delayed_progress() instead of start_progress().
Update t5319-multi-pack-index.sh to use GIT_PROGRESS_DELAY=0 now that
the progress indicators are conditional.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:34 +0000 (12:33 +0000)]
midx: enable core.multiPackIndex by default
The core.multiPackIndex setting has been around since
c4d25228ebb
(config: create core.multiPackIndex setting, 2018-07-12), but has been
disabled by default. If a user wishes to use the multi-pack-index
feature, then they must enable this config and run 'git multi-pack-index
write'.
The multi-pack-index feature is relatively stable now, so make the
config option true by default. For users that do not use a
multi-pack-index, the only extra cost will be a file lookup to see if a
multi-pack-index file exists (once per process, per object directory).
Also, this config option will be referenced by an upcoming
"incremental-repack" task in the maintenance builtin, so move the config
option into the repository settings struct. Note that if
GIT_TEST_MULTI_PACK_INDEX=1, then we want to ignore the config option
and treat core.multiPackIndex as enabled.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:33 +0000 (12:33 +0000)]
maintenance: create auto condition for loose-objects
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.
The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:32 +0000 (12:33 +0000)]
maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.
Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:
1. Run 'git prune-packed' to delete any loose objects that exist
in a pack-file. Concurrent commands will prefer the packed
version of the object to the loose version. (Of course, there
are exceptions for commands that specifically care about the
location of an object. These are rare for a user to run on
purpose, and we hope a user that has selected background
maintenance will not be trying to do foreground maintenance.)
2. Run 'git pack-objects' on a batch of loose objects. These
objects are grouped by scanning the loose object directories in
lexicographic order until listing all loose objects -or-
reaching 50,000 objects. This is more than enough if the loose
objects are created only by a user doing normal development.
We noticed users with _millions_ of loose objects because VFS
for Git downloads blobs on-demand when a file read operation
requires populating a virtual file.
This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 25 Sep 2020 12:33:31 +0000 (12:33 +0000)]
maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.
Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.
The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.
However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.
When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:
1. --no-tags prevents getting a new tag when a user wants to see
the new tags appear in their foreground fetches.
2. --refmap= removes the configured refspec which usually updates
refs/remotes/<remote>/* with the refs advertised by the remote.
While this looks confusing, this was documented and tested by
b40a50264ac (fetch: document and test --refmap="", 2020-01-21),
including this sentence in the documentation:
Providing an empty `<refspec>` to the `--refmap` option
causes Git to ignore the configured refspecs and rely
entirely on the refspecs supplied as command-line arguments.
3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
we can ensure that we actually load the new values somewhere in
our refspace while not updating refs/heads or refs/remotes. By
storing these refs here, the commit-graph job will update the
commit-graph with the commits from these hidden refs.
4. --prune will delete the refs/prefetch/<remote> refs that no
longer appear on the remote.
5. --no-write-fetch-head prevents updating FETCH_HEAD.
We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:52 +0000 (18:11 +0000)]
maintenance: add trace2 regions for task execution
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:51 +0000 (18:11 +0000)]
maintenance: add auto condition for commit-graph task
Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.
This count is controlled by the maintenance.commit-graph.auto config
option.
To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:50 +0000 (18:11 +0000)]
maintenance: use pointers to check --auto
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.
Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.
First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.
Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.
Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.
We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:49 +0000 (18:11 +0000)]
maintenance: create maintenance.<task>.enabled config
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.
Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:48 +0000 (18:11 +0000)]
maintenance: take a lock on the objects directory
Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.
If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:47 +0000 (18:11 +0000)]
maintenance: add --task option
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.
Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.
Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:46 +0000 (18:11 +0000)]
maintenance: add commit-graph task
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command
git commit-graph write --reachable --split
By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.
Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)
If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:45 +0000 (18:11 +0000)]
maintenance: initialize task array
In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.
The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.
A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.
The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:44 +0000 (18:11 +0000)]
maintenance: replace run_auto_gc()
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.
To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.
Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.
Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:43 +0000 (18:11 +0000)]
maintenance: add --quiet option
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.
Pipe the option to the 'git gc' child process.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 17 Sep 2020 18:11:42 +0000 (18:11 +0000)]
maintenance: create basic maintenance runner
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.
Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.
For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.
Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.
Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 18 Aug 2020 14:25:22 +0000 (14:25 +0000)]
fetch: optionally allow disabling FETCH_HEAD update
If you run fetch but record the result in remote-tracking branches,
and either if you do nothing with the fetched refs (e.g. you are
merely mirroring) or if you always work from the remote-tracking
refs (e.g. you fetch and then merge origin/branchname separately),
you can get away with having no FETCH_HEAD at all.
Teach "git fetch" a command line option "--[no-]write-fetch-head".
The default is to write FETCH_HEAD, and the option is primarily
meant to be used with the "--no-" prefix to override this default,
because there is no matching fetch.writeFetchHEAD configuration
variable to flip the default to off (in which case, the positive
form may become necessary to defeat it).
Note that under "--dry-run" mode, FETCH_HEAD is never written;
otherwise you'd see list of objects in the file that you do not
actually have. Passing `--write-fetch-head` does not force `git
fetch` to write the file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 18 Aug 2020 00:01:32 +0000 (17:01 -0700)]
Eighth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 18 Aug 2020 00:02:49 +0000 (17:02 -0700)]
Merge branch 'so/log-diff-merges-opt'
Earlier, to countermand the implicit "-m" option when the
"--first-parent" option is used with "git log", we added the
"--[no-]diff-merges" option in the jk/log-fp-implies-m topic. To
leave the door open to allow the "--diff-merges" option to take
values that instructs how patches for merge commits should be
computed (e.g. "cc"? "-p against first parent?"), redefine
"--diff-merges" to take non-optional value, and implement "off"
that means the same thing as "--no-diff-merges".
* so/log-diff-merges-opt:
t/t4013: add test for --diff-merges=off
doc/git-log: describe --diff-merges=off
revision: change "--diff-merges" option to require parameter
Junio C Hamano [Tue, 18 Aug 2020 00:02:49 +0000 (17:02 -0700)]
Merge branch 'jk/log-fp-implies-m'
"git log --first-parent -p" showed patches only for single-parent
commits on the first-parent chain; the "--first-parent" option has
been made to imply "-m". Use "--no-diff-merges" to restore the
previous behaviour to omit patches for merge commits.
* jk/log-fp-implies-m:
doc/git-log: clarify handling of merge commit diffs
doc/git-log: move "-t" into diff-options list
doc/git-log: drop "-r" diff option
doc/git-log: move "Diff Formatting" from rev-list-options
log: enable "-m" automatically with "--first-parent"
revision: add "--no-diff-merges" option to counteract "-m"
log: drop "--cc implies -m" logic
Junio C Hamano [Tue, 18 Aug 2020 00:02:48 +0000 (17:02 -0700)]
Merge branch 'ma/stop-progress-null-fix'
NULL dereference fix.
* ma/stop-progress-null-fix:
progress: don't dereference before checking for NULL
Junio C Hamano [Tue, 18 Aug 2020 00:02:47 +0000 (17:02 -0700)]
Merge branch 'es/test-cmp-typocatcher'
Test framework update.
* es/test-cmp-typocatcher:
test_cmp: diagnose incorrect arguments
Junio C Hamano [Tue, 18 Aug 2020 00:02:46 +0000 (17:02 -0700)]
Merge branch 'rp/apply-cached-with-i-t-a'
Recent versions of "git diff-files" shows a diff between the index
and the working tree for "intent-to-add" paths as a "new file"
patch; "git apply --cached" should be able to take "git diff-files"
and should act as an equivalent to "git add" for the path, but the
command failed to do so for such a path.
* rp/apply-cached-with-i-t-a:
t4140: test apply with i-t-a paths
apply: make i-t-a entries never match worktree
apply: allow "new file" patches on i-t-a entries
Junio C Hamano [Tue, 18 Aug 2020 00:02:45 +0000 (17:02 -0700)]
Merge branch 'al/bisect-first-parent'
"git bisect" learns the "--first-parent" option to find the first
breakage along the first-parent chain.
* al/bisect-first-parent:
bisect: combine args passed to find_bisection()
bisect: introduce first-parent flag
cmd_bisect__helper: defer parsing no-checkout flag
rev-list: allow bisect and first-parent flags
t6030: modernize "git bisect run" tests
Junio C Hamano [Tue, 18 Aug 2020 00:02:45 +0000 (17:02 -0700)]
Merge branch 'jk/sideband-error-l10n'
Mark error message for i18n.
* jk/sideband-error-l10n:
sideband: mark "remote error:" prefix for translation
Junio C Hamano [Tue, 18 Aug 2020 00:02:44 +0000 (17:02 -0700)]
Merge branch 'jc/noop-with-static-inline'
A no-op replacement function implemented as a C preprocessor macro
does not perform as good a job as one implemented as a "static
inline" function in catching errors in parameters; replace the
former with the latter in <git-compat-util.h> header.
* jc/noop-with-static-inline:
compat-util: type-check parameters of no-op replacement functions
Junio C Hamano [Tue, 18 Aug 2020 00:02:43 +0000 (17:02 -0700)]
Merge branch 'pd/mergetool-nvimdiff'
The existing backends for "git mergetool" based on variants of vim
have been refactored and then support for "nvim" has been added.
* pd/mergetool-nvimdiff:
mergetools: add support for nvimdiff (neovim) family
mergetool--lib: improve support for vimdiff-style tool variants
Junio C Hamano [Tue, 18 Aug 2020 00:02:42 +0000 (17:02 -0700)]
Merge branch 'hn/reftable-prep-part-2'
Further preliminary change to refs API.
* hn/reftable-prep-part-2:
Make HEAD a PSEUDOREF rather than PER_WORKTREE.
Modify pseudo refs through ref backend storage
t1400: use git rev-parse for testing PSEUDOREF existence
Junio C Hamano [Tue, 18 Aug 2020 00:02:41 +0000 (17:02 -0700)]
Merge branch 'dd/send-email-config'
Stop when "sendmail.*" configuration variables are defined, which
could be a mistaken attempt to define "sendemail.*" variables.
* dd/send-email-config:
git-send-email: die if sendmail.* config is set
Junio C Hamano [Tue, 18 Aug 2020 00:02:40 +0000 (17:02 -0700)]
Merge branch 'ps/ref-transaction-hook'
The logic to find the ref transaction hook script attempted to
cache the path to the found hook without realizing that it needed
to keep a copied value, as the API it used returned a transitory
buffer space. This has been corrected.
* ps/ref-transaction-hook:
t1416: avoid hard-coded sha1 ids
refs: fix interleaving hook calls with reference-transaction hook
Junio C Hamano [Thu, 13 Aug 2020 21:13:59 +0000 (14:13 -0700)]
Seventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 13 Aug 2020 21:13:40 +0000 (14:13 -0700)]
Merge branch 'rp/blame-first-parent-doc'
The "git blame --first-parent" option was not documented, but now
it is.
* rp/blame-first-parent-doc:
blame-options.txt: document --first-parent option
Junio C Hamano [Thu, 13 Aug 2020 21:13:39 +0000 (14:13 -0700)]
Merge branch 'ma/test-quote-cleanup'
Test cleanup.
* ma/test-quote-cleanup:
t4104: modernize and simplify quoting
t: don't spuriously close and reopen quotes
Junio C Hamano [Thu, 13 Aug 2020 21:13:39 +0000 (14:13 -0700)]
Merge branch 'jt/has_object'
A new helper function has_object() has been introduced to make it
easier to mark object existence checks that do and don't want to
trigger lazy fetches, and a few such checks are converted using it.
* jt/has_object:
fsck: do not lazy fetch known non-promisor object
pack-objects: no fetch when allow-{any,promisor}
apply: do not lazy fetch when applying binary
sha1-file: introduce no-lazy-fetch has_object()
Junio C Hamano [Thu, 13 Aug 2020 21:13:39 +0000 (14:13 -0700)]
Merge branch 'bc/sha-256-cvs-svn-updates'
Portability fix.
* bc/sha-256-cvs-svn-updates:
git-cvsexportcommit: support Perl before 5.10.1
Junio C Hamano [Wed, 12 Aug 2020 01:03:56 +0000 (18:03 -0700)]
Sixth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 12 Aug 2020 01:04:13 +0000 (18:04 -0700)]
Merge branch 'ss/cmake-build'
CMake support to build with MSVC for Windows bypassing the Makefile.
* ss/cmake-build:
ci: modification of main.yml to use cmake for vs-build job
cmake: support for building git on windows with msvc and clang.
cmake: support for building git on windows with mingw
cmake: support for testing git when building out of the source tree
cmake: support for testing git with ctest
cmake: installation support for git
cmake: generate the shell/perl/python scripts and templates, translations
Introduce CMake support for configuring Git
Junio C Hamano [Wed, 12 Aug 2020 01:04:12 +0000 (18:04 -0700)]
Merge branch 'tb/upload-pack-filters'
The component to respond to "git fetch" request is made more
configurable to selectively allow or reject object filtering
specification used for partial cloning.
* tb/upload-pack-filters:
t5616: use test_i18ngrep for upload-pack errors
upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
upload-pack.c: allow banning certain object filter(s)
list_objects_filter_options: introduce 'list_object_filter_config_name'
Junio C Hamano [Wed, 12 Aug 2020 01:04:12 +0000 (18:04 -0700)]
Merge branch 'es/worktree-doc-cleanups'
Doc cleanup around "worktree".
* es/worktree-doc-cleanups:
git-worktree.txt: link to man pages when citing other Git commands
git-worktree.txt: make start of new sentence more obvious
git-worktree.txt: fix minor grammatical issues
git-worktree.txt: consistently use term "working tree"
git-worktree.txt: employ fixed-width typeface consistently
Junio C Hamano [Wed, 12 Aug 2020 01:04:11 +0000 (18:04 -0700)]
Merge branch 'bc/sha-256-part-3'
The final leg of SHA-256 transition.
* bc/sha-256-part-3: (39 commits)
t: remove test_oid_init in tests
docs: add documentation for extensions.objectFormat
ci: run tests with SHA-256
t: make SHA1 prerequisite depend on default hash
t: allow testing different hash algorithms via environment
t: add test_oid option to select hash algorithm
repository: enable SHA-256 support by default
setup: add support for reading extensions.objectformat
bundle: add new version for use with SHA-256
builtin/verify-pack: implement an --object-format option
http-fetch: set up git directory before parsing pack hashes
t0410: mark test with SHA1 prerequisite
t5308: make test work with SHA-256
t9700: make hash size independent
t9500: ensure that algorithm info is preserved in config
t9350: make hash size independent
t9301: make hash size independent
t9300: use $ZERO_OID instead of hard-coded object ID
t9300: abstract away SHA-1-specific constants
t8011: make hash size independent
...
Sergey Organov [Wed, 5 Aug 2020 22:08:32 +0000 (01:08 +0300)]
t/t4013: add test for --diff-merges=off
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov [Wed, 5 Aug 2020 22:08:31 +0000 (01:08 +0300)]
doc/git-log: describe --diff-merges=off
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov [Wed, 5 Aug 2020 22:08:30 +0000 (01:08 +0300)]
revision: change "--diff-merges" option to require parameter
--diff-merges=off is the only accepted form for now, a synonym for
--no-diff-merges.
This patch is a preparation for adding more values, as well as supporting
--diff-merges=<parent>, where <parent> is single parent number to output diff
against.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 11 Aug 2020 06:53:47 +0000 (02:53 -0400)]
t1416: avoid hard-coded sha1 ids
The test added by
e5256c82e5 (refs: fix interleaving hook calls with
reference-transaction hook, 2020-08-07) uses hard-coded sha1 object ids
in its expected output. This causes it to fail when run with
GIT_TEST_DEFAULT_HASH=sha256.
Let's make use of the oid variables we define earlier, as the rest of
the nearby tests do.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Martin Ågren [Mon, 10 Aug 2020 19:47:48 +0000 (21:47 +0200)]
progress: don't dereference before checking for NULL
In `stop_progress()`, we're careful to check that `p_progress` is
non-NULL before we dereference it, but by then we have already
dereferenced it when calling `finish_if_sparse(*p_progress)`. And, for
what it's worth, we'll go on to blindly dereference it again inside
`stop_progress_msg()`.
We could return early if we get a NULL-pointer, but let's go one step
further and BUG instead. The progress API handles NULL just fine, but
that's the NULL-ness of `*p_progress`, e.g., when running with
`--no-progress`. If `p_progress` is NULL, chances are that's a mistake.
For symmetry, let's do the same check in `stop_progress_msg()`, too.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 10 Aug 2020 17:11:52 +0000 (10:11 -0700)]
Fifth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 10 Aug 2020 17:24:04 +0000 (10:24 -0700)]
Merge branch 'pb/guide-docs'
Update "git help guides" documentation organization.
* pb/guide-docs:
git.txt: add list of guides
Documentation: don't hardcode command categories twice
help: drop usage of 'common' and 'useful' for guides
command-list.txt: add missing 'gitcredentials' and 'gitremote-helpers'
Junio C Hamano [Mon, 10 Aug 2020 17:24:03 +0000 (10:24 -0700)]
Merge branch 'so/rev-parser-errormessage-fix'
Error message fix.
* so/rev-parser-errormessage-fix:
revision: fix die() message for "--unpacked="
Junio C Hamano [Mon, 10 Aug 2020 17:24:02 +0000 (10:24 -0700)]
Merge branch 'en/eol-attrs-gotchas'
All "mergy" operations that internally use the merge-recursive
machinery should honor the merge.renormalize configuration, but
many of them didn't.
* en/eol-attrs-gotchas:
checkout: support renormalization with checkout -m <paths>
merge: make merge.renormalize work for all uses of merge machinery
t6038: remove problematic test
t6038: make tests fail for the right reason
Junio C Hamano [Mon, 10 Aug 2020 17:24:02 +0000 (10:24 -0700)]
Merge branch 'jk/compiler-fixes-and-workarounds'
Small fixes and workarounds.
* jk/compiler-fixes-and-workarounds:
revision: avoid leak when preparing bloom filter for "/"
revision: avoid out-of-bounds read/write on empty pathspec
config: work around gcc-10 -Wstringop-overflow warning
Junio C Hamano [Mon, 10 Aug 2020 17:24:02 +0000 (10:24 -0700)]
Merge branch 'ny/notes-doc-sample-update'
Doc updates.
* ny/notes-doc-sample-update:
docs: improve the example that illustrates git-notes path names
Junio C Hamano [Mon, 10 Aug 2020 17:24:01 +0000 (10:24 -0700)]
Merge branch 'es/adjust-subtree-test-for-merge-msg-update'
Adjust tests in contrib/ to the recent change to fmt-merge-msg.
* es/adjust-subtree-test-for-merge-msg-update:
Revert "contrib: subtree: adjust test to change in fmt-merge-msg"
Junio C Hamano [Mon, 10 Aug 2020 17:24:01 +0000 (10:24 -0700)]
Merge branch 'rs/bisect-oid-to-hex-fix'
Code cleanup.
* rs/bisect-oid-to-hex-fix:
bisect: use oid_to_hex_r() instead of memcpy()+oid_to_hex()
Junio C Hamano [Mon, 10 Aug 2020 17:24:00 +0000 (10:24 -0700)]
Merge branch 'en/merge-recursive-comment-fixes'
Comment fix.
* en/merge-recursive-comment-fixes:
merge-recursive: fix unclear and outright wrong comments
Junio C Hamano [Mon, 10 Aug 2020 17:23:59 +0000 (10:23 -0700)]
Merge branch 'ma/t1450-quotefix'
Test fix.
* ma/t1450-quotefix:
t1450: fix quoting of NUL byte when corrupting pack
Junio C Hamano [Mon, 10 Aug 2020 17:23:58 +0000 (10:23 -0700)]
Merge branch 'es/worktree-cleanup'
Code cleanup around "worktree" API implementation.
* es/worktree-cleanup:
worktree: retire special-case normalization of main worktree path
worktree: drop bogus and unnecessary path munging
worktree: drop unused code from get_linked_worktree()
worktree: drop pointless strbuf_release()
Junio C Hamano [Mon, 10 Aug 2020 17:23:57 +0000 (10:23 -0700)]
Merge branch 'jk/strvec'
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree. It has been renamed to "strvec" to reduce the
barrier to adoption.
* jk/strvec:
strvec: rename struct fields
strvec: drop argv_array compatibility layer
strvec: update documention to avoid argv_array
strvec: fix indentation in renamed calls
strvec: convert remaining callers away from argv_array name
strvec: convert more callers away from argv_array name
strvec: convert builtin/ callers away from argv_array name
quote: rename sq_dequote_to_argv_array to mention strvec
strvec: rename files from argv-array to strvec
argv-array: rename to strvec
argv-array: use size_t for count and alloc
Eric Sunshine [Sun, 9 Aug 2020 17:42:09 +0000 (13:42 -0400)]
test_cmp: diagnose incorrect arguments
Under normal circumstances, if a test author misspells a filename passed
to test_cmp(), the error is quickly discovered when the test fails
unexpectedly due to test_cmp() being unable to find the file. However,
if the test is expected to fail, as with test_expect_failure(), a
misspelled filename as argument to test_cmp() will go unnoticed since
the test will indeed fail, but for the wrong reason. Make it easier for
test authors to discover such problems early by sanity-checking the
arguments to test_cmp(). To avoid penalizing all clients of test_cmp()
in the general case, only check for missing files if the comparison
fails.
While at it, make test_cmp_bin() sanity-check its arguments, as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Raymond E. Pasco [Sat, 8 Aug 2020 07:49:59 +0000 (03:49 -0400)]
t4140: test apply with i-t-a paths
apply --cached (as used by add -p) should accept creation and deletion
patches to intent-to-add paths in the index. apply --index, however,
should always fail because an intent-to-add path never matches the
worktree (by definition).
Based-on-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Raymond E. Pasco [Sat, 8 Aug 2020 07:49:58 +0000 (03:49 -0400)]
apply: make i-t-a entries never match worktree
By definition, an intent-to-add index entry can never match the
worktree, because worktrees have no concept of intent-to-add entries.
Therefore, "apply --index" should always fail on intent-to-add paths.
Because check_preimage() calls verify_index_match(), it already fails
for patches other than creation patches, which check_preimage() ignores.
This patch adds a check to check_preimage()'s rough equivalent for
creation patches, check_to_create().
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aaron Lipman [Fri, 7 Aug 2020 21:58:38 +0000 (17:58 -0400)]
bisect: combine args passed to find_bisection()
Now that find_bisection() accepts multiple boolean arguments, these may
be combined into a single unsigned integer in order to declutter some of
the code in bisect.c
Also, rename the existing "flags" bitfield to "commit_flags", to
explicitly differentiate it from the new "bisect_flags" bitfield.
Based-on-patch-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aaron Lipman [Fri, 7 Aug 2020 21:58:37 +0000 (17:58 -0400)]
bisect: introduce first-parent flag
Upon seeing a merge commit when bisecting, this option may be used to
follow only the first parent.
In detecting regressions introduced through the merging of a branch, the
merge commit will be identified as introduction of the bug and its
ancestors will be ignored.
This option is particularly useful in avoiding false positives when a
merged branch contained broken or non-buildable commits, but the merge
itself was OK.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aaron Lipman [Fri, 7 Aug 2020 21:58:36 +0000 (17:58 -0400)]
cmd_bisect__helper: defer parsing no-checkout flag
cmd_bisect__helper() is intended as a temporary shim layer serving as an
interface for git-bisect.sh. This function and git-bisect.sh should
eventually be replaced by a C implementation, cmd_bisect(), serving as
an entrypoint for all "git bisect ..." shell commands: cmd_bisect() will
only parse the first token following "git bisect", and dispatch the
remaining args to the appropriate function ["bisect_start()",
"bisect_next()", etc.].
Thus, cmd_bisect__helper() should not be responsible for parsing flags
like --no-checkout. Instead, let the --no-checkout flag remain in the
argv array, so it may be evaluated alongside the other options already
parsed by bisect_start().
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aaron Lipman [Fri, 7 Aug 2020 21:58:35 +0000 (17:58 -0400)]
rev-list: allow bisect and first-parent flags
Add first_parent_only parameter to find_bisection(), removing the
barrier that prevented combining the --bisect and --first-parent flags
when using git rev-list
Based-on-patch-by: Tiago Botelho <tiagonbotelho@hotmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aaron Lipman [Fri, 7 Aug 2020 21:58:34 +0000 (17:58 -0400)]
t6030: modernize "git bisect run" tests
Enforce consistent styling for tests on "git bisect run":
- Use "write_script" to abstract away platform-specific details.
- Favor current whitespace conventions.
- While at it, change "introduced" to "added" in the comments to make
them read better.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patrick Steinhardt [Fri, 7 Aug 2020 07:05:58 +0000 (09:05 +0200)]
refs: fix interleaving hook calls with reference-transaction hook
In order to not repeatedly search for the reference-transaction hook in
case it's getting called multiple times, we use a caching mechanism to
only call `find_hook()` once. What was missed though is that the return
value of `find_hook()` actually comes from a static strbuf, which means
it will get overwritten when calling `find_hook()` again. As a result,
we may call the wrong hook with parameters of the reference-transaction
hook.
This scenario was spotted in the wild when executing a git-push(1) with
multiple references, where there are interleaving calls to both the
update and the reference-transaction hook. While initial calls to the
reference-transaction hook work as expected, it will stop working after
the next invocation of the update hook. The result is that we now start
calling the update hook with parameters and stdin of the
reference-transaction hook.
This commit fixes the issue by storing a copy of `find_hook()`'s return
value in the cache.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 7 Aug 2020 08:56:49 +0000 (04:56 -0400)]
sideband: mark "remote error:" prefix for translation
A Git client may produce a "remote error:" message (along with whatever
error the other side sent us) in two places:
- when we see an ERR packet
- when we're using a sideband and see sideband 3
We can't reliably translate the message the other side sent us, but we
can do so for our own prefix. However, we translate only the ERR-packet
case but not the sideband-3 case. Let's make them consistent (by marking
both for translation).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 7 Aug 2020 00:25:37 +0000 (17:25 -0700)]
compat-util: type-check parameters of no-op replacement functions
When there is no need to run a specific function on certain platforms,
we often #define an empty function to swallow its parameters and
make it into a no-op, e.g.
#define precompose_argv(c,v) /* no-op */
While this guarantees that no unneeded code is generated, it also
discards type and other checks on these parameters, e.g. a new code
written with the argv-array API (diff_args is of type "struct
argv_array" that has .argc and .argv members):
precompose_argv(diff_args.argc, diff_args.argv);
must be updated to use "struct strvec diff_args" with .nr and .v
members, like so:
precompose_argv(diff_args.nr, diff_args.v);
after the argv-array API has been updated to the strvec API.
However, the "no oop" C preprocessor macro is too aggressive to
discard what is unused, and did not catch such a call that was left
unconverted.
Using a "static inline" function whose body is a no-op should still
result in the same binary with decent compilers yet catch such a
reference to a missing field or passing a value of a wrong type.
While at it, I notice that precompute_str() has never been used
anywhere in the code, since it was introduced at
76759c7d (git on
Mac OS and precomposed unicode, 2012-07-08). Instead of turning it
into a static inline, just remove it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Martin Ågren [Thu, 6 Aug 2020 20:08:54 +0000 (22:08 +0200)]
t4104: modernize and simplify quoting
Drop whitespace in the value of `$test_description` and in a test body
and use `test_write_lines`.
Stop defining `$u` with a trailing space just so that we can tuck it in
like `git foo $u$more...` and get minimal whitespace in the command:
`git foo $u $more...` is more readable at the "cost" of an empty `$u`
yielding `git foo something...`.
Finally, avoid using single quotes within the test scripts to repeatedly
close and reopen the quotes that wrap the test scripts (see the previous
commit). This "unnecessary" quoting does mean that the verbose test
output shows the interpolated values, i.e., the shell code we're
running. But the downside is that the source of the script does *not*
show the shell code we're eventually executing, leaving the reader to
reason about what we really do and whether there are any quoting issues.
(There aren't.)
Where we run through loops to generate several "identical but different"
tests, the test message contains the interpolated variables we're
looping on, meaning one can always identify exactly which instance has
failed, even if the verbose test output shows the exact same test body
several times.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Martin Ågren [Thu, 6 Aug 2020 20:08:53 +0000 (22:08 +0200)]
t: don't spuriously close and reopen quotes
In the test scripts, the recommended style is, e.g.:
test_expect_success 'name' '
do-something somehow &&
do-some-more testing
'
When using this style, any single quote in the multi-line test section
is actually closing the lone single quotes that surround it.
It can be a non-issue in practice:
test_expect_success 'sed a little' '
sed -e 's/hi/lo/' in >out # "ok": no whitespace in s/hi/lo/
'
Or it can be a bug in the test, e.g., because variable interpolation
happens before the test even begins executing:
v=abc
test_expect_success 'variable interpolation' '
v=def &&
echo '"$v"' # abc
'
Change several such in-test single quotes to use double quotes instead
or, in a few cases, drop them altogether. These were identified using
some crude grepping. We're not fixing any test bugs here, but we're
hopefully making these tests slightly easier to grok and to maintain.
There are legitimate use cases for closing a quote and opening a new
one, e.g., both '\'' and '"'"' can be used to produce a literal single
quote. I'm not touching any of those here.
In t9401, tuck the redirecting ">" to the filename while we're touching
those lines.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Raymond E. Pasco [Thu, 6 Aug 2020 18:52:43 +0000 (14:52 -0400)]
blame-options.txt: document --first-parent option
blame/annotate have supported --first-parent since commit
95a4fb0eac
("blame: handle --first-parent"). This adds a blurb on that option to
the documentation.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Raymond E. Pasco [Thu, 6 Aug 2020 06:01:17 +0000 (02:01 -0400)]
apply: allow "new file" patches on i-t-a entries
diff-files recently changed to treat changes to paths marked "intent to
add" in the index as new file diffs rather than diffs from the empty
blob. However, apply refuses to apply new file diffs on top of existing
index entries, except in the case of renames. This causes "git add -p",
which uses apply, to fail when attempting to stage hunks from a file
when intent to add has been recorded.
This changes the logic in check_to_create() which checks if an entry
already exists in an index in two ways: first, we only search for an
index entry at all if ok_if_exists is false; second, we check for the
CE_INTENT_TO_ADD flag on any index entries we find and allow the apply
to proceed if it is set.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 5 Aug 2020 23:06:52 +0000 (16:06 -0700)]
fsck: do not lazy fetch known non-promisor object
There is a call to has_object_file(), which lazily fetches missing
objects in a partial clone, when the object is known to not be
a promisor object. Change that call to has_object(), which does not do
any lazy fetching.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 5 Aug 2020 23:06:51 +0000 (16:06 -0700)]
pack-objects: no fetch when allow-{any,promisor}
The options --missing=allow-{any,promisor} were introduced in
caf3827e2f
("rev-list: add list-objects filtering support", 2017-11-22) with the
following note in the commit message:
This patch introduces handling of missing objects to help
debugging and development of the "partial clone" mechanism,
and once the mechanism is implemented, for a power user to
perform operations that are missing-object aware without
incurring the cost of checking if a missing link is expected.
The idea that these options are missing-object aware (and thus do not
need to lazily fetch objects, unlike unaware commands that assume that
all objects are present) are assumed in later commits such as
07ef3c6604
("fetch test: use more robust test for filtered objects", 2020-01-15).
However, the current implementations of these options use
has_object_file(), which indeed lazily fetches missing objects. Teach
these implementations not to do so. Also, update the documentation of
these options to be clearer.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 5 Aug 2020 23:06:50 +0000 (16:06 -0700)]
apply: do not lazy fetch when applying binary
When applying a binary patch, as an optimization, "apply" checks if the
postimage is already present. During this fetch, it is perfectly
expected for the postimage not to be present, so there is no need to
lazy-fetch missing objects. Teach "apply" not to lazy-fetch in this
case.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 5 Aug 2020 23:06:49 +0000 (16:06 -0700)]
sha1-file: introduce no-lazy-fetch has_object()
There have been a few bugs wherein Git fetches missing objects whenever
the existence of an object is checked, even though it does not need to
perform such a fetch. To resolve these bugs, we could look at all the
places that has_object_file() (or a similar function) is used. As a
first step, introduce a new function has_object() that checks for the
existence of an object, with a default behavior of not fetching if the
object is missing and the repository is a partial clone. As we verify
each has_object_file() (or similar) usage, we can replace it with
has_object(), and we will know that we are done when we can delete
has_object_file() (and the other similar functions).
Also, the new function has_object() has more appropriate defaults:
besides not fetching, it also does not recheck packed storage.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Thu, 6 Aug 2020 00:16:50 +0000 (00:16 +0000)]
git-cvsexportcommit: support Perl before 5.10.1
The change in
6e9c4d408d ("git-cvsexportcommit: port to SHA-256",
2020-06-22) added the use of a temporary directory for the index.
However, the form we used doesn't work in versions of Perl before
5.10.1. For example, version 5.10.0 contains a version of File::Temp
from 2007 that doesn't contain "newdir".
In order to make the code work with 5.8.8, which we support, let's
change to use the static method "tempdir" with the argument "CLEANUP",
which provides the same behavior.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 5 Aug 2020 08:42:40 +0000 (04:42 -0400)]
t5616: use test_i18ngrep for upload-pack errors
The tests added to t5616 in
6dd3456a8c (upload-pack.c: allow banning
certain object filter(s), 2020-08-03) can fail racily, but only with
GETTEXT_POISON enabled.
The tests in question look something like this:
test_must_fail ok=sigpipe git clone --filter=blob:none ... 2>err &&
grep "filter blob:none not supported' err
The remote upload-pack process writes that error message both as an ERR
packet, but also via a die() message. In theory we should see the
message twice in the "err" file. The client relays the message from the
packet to its stderr (with a "remote error:" prefix), and because this
is a local-system clone, upload-pack's stderr goes to the same place.
But because clone may be writing to the pipe when upload-pack calls
die(), it may get SIGPIPE and fail to relay the message. That's why we
need our "ok=sigpipe" trick. But our grep should still work reliably in
that case. Either:
- we got SIGPIPE on the client, which means upload-pack completed its
die(), and we'll see that version of the message.
- the client didn't get SIGPIPE, and so it successfully relays the
message.
In theory we'd see both copies of the message in the second case. But
now always! As soon as the client sees ERR, it exits and we run grep.
But we have no guarantee that the upload-pack process has exited at this
point, or even written its die() message. We might only see the client
version of the message.
Normally that's OK. We only need to see one or the other to pass the
test. But now consider GETTEXT_POISON. upload-pack doesn't translate the
die() message nor the ERR packet. But once the client receives it, it
calls:
die(_("remote error: %s"), buffer + 4);
That message _is_ marked for translation. Normally we'd just replace the
"remote error:" portion of it, but in GETTEXT_POISON mode, we replace
the whole thing with "# GETTEXT POISON #" and don't include the "%s"
part at all. So the whole text from the ERR packet is dropped, and so we
may racily see a test failure if upload-pack's die() call wasn't yet
written.
We can fix it by using test_i18ngrep, which just makes this grep a noop
in the poison mode.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Wed, 5 Aug 2020 01:19:07 +0000 (01:19 +0000)]
git.txt: add list of guides
Not all man5/man7 guides are mentioned in the 'git(1)' documentation,
which makes the missing ones somewhat hard to find.
Add a list of the guides to git(1) by leveraging the existing
`Documentation/cmd-list.perl` script to generate a file `cmds-guide.txt`
which gets included in git.txt.
Also, do not hard-code the manual section '1'. Instead, use a regex so
that the manual section is discovered from the first line of each
`git*.txt` file.
This addition was hinted at in
1b81d8cb19 (help: use command-list.txt
for the source of guides, 2018-05-20).
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 5 Aug 2020 01:19:06 +0000 (01:19 +0000)]
Documentation: don't hardcode command categories twice
Instead of hard-coding the list of command categories in both
`Documentation/Makefile` and `Documentation/cmd-list.perl`, make the
Makefile the authoritative source and tweak `cmd-list.perl` so that it
receives the list of command categories as argument.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Wed, 5 Aug 2020 01:19:05 +0000 (01:19 +0000)]
help: drop usage of 'common' and 'useful' for guides
Since
1b81d8cb19 (help: use command-list.txt for the source of guides,
2018-05-20), all man5/man7 guides listed in command-list.txt appear in
the output of 'git help -g'.
However, 'git help -g' still prefixes this list with "The common Git
guides are:", which makes one wonder if there are others!
In the same spirit, the man page for 'git help' describes the '--guides'
option as listing 'useful' guides, which is not false per se but can
also be taken to mean that there are other guides that exist but are not
useful.
Instead of 'common' and 'useful', use 'Git concept guides' in both
places. To keep the code in line with this change, rename
help.c::list_common_guides_help to list_guides_help.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Wed, 5 Aug 2020 01:19:04 +0000 (01:19 +0000)]
command-list.txt: add missing 'gitcredentials' and 'gitremote-helpers'
The guides 'gitcredentials' and 'gitremote-helpers' do not currently
appear in command-list.txt.
'gitcredentials' was forgotten back when guides were added to
command-list.txt in
1b81d8cb19 (help: use command-list.txt for the
source of guides, 2018-05-20).
'gitremote-helpers' was moved to section 7 in
439cc74632 (docs: move
gitremote-helpers into section 7, 2019-03-25), but command-list.txt was
not updated at the time.
Add these two guides to the list of guides in 'command-list.txt', so
that they appear in the output of 'git help --guides', and capitalize
the first word of the description of 'gitcredentials', as was done in
1b81d8c (help: use command-list.txt for the source of guides,
2018-05-20) for the other guides.
While at it, add a comment in Documentation/Makefile to remind developers
to update command-list.txt if they add a new guide.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov [Tue, 4 Aug 2020 22:26:30 +0000 (01:26 +0300)]
revision: fix die() message for "--unpacked="
Get rid of the trailing dot and mark for translation.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 4 Aug 2020 20:53:36 +0000 (13:53 -0700)]
Fourth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 4 Aug 2020 20:53:58 +0000 (13:53 -0700)]
Merge branch 'jt/pretend-object-never-come-from-elsewhere'
The pretend-object mechanism checks if the given object already
exists in the object store before deciding to keep the data
in-core, but the check would have triggered lazy fetching of such
an object from a promissor remote.
* jt/pretend-object-never-come-from-elsewhere:
sha1-file: make pretend_object_file() not prefetch
Junio C Hamano [Tue, 4 Aug 2020 20:53:57 +0000 (13:53 -0700)]
Merge branch 'jt/pack-objects-prefetch-in-batch'
While packing many objects in a repository with a promissor remote,
lazily fetching missing objects from the promissor remote one by
one may be inefficient---the code now attempts to fetch all the
missing objects in batch (obviously this won't work for a lazy
clone that lazily fetches tree objects as you cannot even enumerate
what blobs are missing until you learn which trees are missing).
* jt/pack-objects-prefetch-in-batch:
pack-objects: prefetch objects to be packed
pack-objects: refactor to oid_object_info_extended
Junio C Hamano [Tue, 4 Aug 2020 20:53:56 +0000 (13:53 -0700)]
Merge branch 'mp/complete-show-color-moved'
Command line completion (in contrib/) update.
* mp/complete-show-color-moved:
completion: add show --color-moved[-ws]
Jeff King [Tue, 4 Aug 2020 07:50:17 +0000 (03:50 -0400)]
revision: avoid leak when preparing bloom filter for "/"
If we're given an empty pathspec, we refuse to set up bloom filters, as
described in
f3c2a36810 (revision: empty pathspecs should not use Bloom
filters, 2020-07-01).
But before the empty string check, we drop any trailing slash by
allocating a new string without it. So a pathspec consisting only of "/"
will allocate that string, but then still cause us to bail, leaking the
new string. Let's make sure to free it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Aug 2020 07:46:52 +0000 (03:46 -0400)]
revision: avoid out-of-bounds read/write on empty pathspec
Running t4216 with ASan results in it complaining of an out-of-bounds
read in prepare_to_use_bloom_filter(). The issue is this code to strip a
trailing slash:
last_index = pi->len - 1;
if (pi->match[last_index] == '/') {
because we have no guarantee that pi->len isn't zero. This can happen if
the pathspec is ".", as we translate that to an empty string. And if
that read of random memory does trigger the conditional, we'd then do an
out-of-bounds write:
path_alloc = xstrdup(pi->match);
path_alloc[last_index] = '\0';
Let's make sure to check the length before subtracting. Note that for an
empty pathspec, we'd end up bailing from the function a few lines later,
which makes it tempting to just:
if (!pi->len)
return;
early here. But our code here is stripping a trailing slash, and we need
to check for emptiness after stripping that slash, too. So we'd have two
blocks, which would require repeating some cleanup code.
Instead, just skip the trailing-slash for an empty string. Setting
last_index at all in the case is awkward since it will have a nonsense
value (and it uses an "int", which is a too-small type for a string
anyway). So while we're here, let's:
- drop last_index entirely; it's only used in two spots right next to
each other and writing out "pi->len - 1" in both is actually easier
to follow
- use xmemdupz() to duplicate the string. This is slightly more
efficient, but more importantly makes the intent more clear by
allocating the correct-sized substring in the first place. It also
eliminates any question of whether path_alloc is as long as
pi->match (which it would not be if pi->match has any embedded NULs,
though in practice this is probably impossible).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Aug 2020 07:43:53 +0000 (03:43 -0400)]
config: work around gcc-10 -Wstringop-overflow warning
Compiling with gcc-10, -O2, and -fsanitize=undefined results in a
compiler warning:
config.c: In function ‘git_config_copy_or_rename_section_in_file’:
config.c:3170:17: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
3170 | output[0] = '\t';
| ~~~~~~~~~~^~~~~~
config.c:3076:7: note: at offset -1 to object ‘buf’ with size 1024 declared here
3076 | char buf[1024];
| ^~~
This is a false positive. The interesting lines of code are:
int i;
char *output = buf;
...
for (i = 0; buf[i] && isspace(buf[i]); i++)
; /* do nothing */
...
int offset;
offset = section_name_match(&buf[i], old_name);
if (offset > 0) {
...
output += offset + i;
if (strlen(output) > 0) {
/*
* More content means there's
* a declaration to put on the
* next line; indent with a
* tab
*/
output -= 1;
output[0] = '\t';
}
}
So we do assign output to buf initially. Later we increment it based on
"offset" and "i" and then subtract "1" from it. That latter step is what
the compiler is complaining about; it could lead to going off the left
side of the array if "output == buf" at the moment of the subtraction.
For that to be the case, then "offset + i" would have to be 0. But that
can't happen:
- we know that "offset" is at least 1, since we're in a conditional
block that checks that
- we know that "i" is not negative, since it started at 0 and only
incremented over whitespace
So the sum must be at least 1, and therefore it's OK to subtract one
from "output".
But that's not quite the whole story. Since "i" is an int, it could in
theory be possible to overflow to negative (when counting whitespace on
a very large string). But we know that's impossible because we're
counting the 1024-byte buffer we just fed to fgets(), so it can never be
larger than that.
Switching the type of "i" to "unsigned" makes the warning go away, so
let's do that.
Arguably size_t is an even better type (for this and for the other
length fields), but switching to it produces a similar but distinct
warning:
config.c: In function ‘git_config_copy_or_rename_section_in_file’:
config.c:3170:13: error: array subscript -1 is outside array bounds of ‘char[1024]’ [-Werror=array-bounds]
3170 | output[0] = '\t';
| ~~~~~~^~~
config.c:3076:7: note: while referencing ‘buf’
3076 | char buf[1024];
| ^~~
If we were to ever switch off of fgets() to strbuf_getline() or similar,
we'd probably need to use size_t to avoid other overflow problems. But
for now we know we're safe because of the small fixed size of our
buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Tue, 4 Aug 2020 00:55:35 +0000 (20:55 -0400)]
git-worktree.txt: link to man pages when citing other Git commands
When citing other Git commands, rather than merely formatting them with
a fixed-width typeface, improve the reader experience by linking to them
directly via `linkgit:`.
Suggested-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>