builtin/pack-objects.c: avoid iterating all refs
author Jacob Vosmaer <jacob@gitlab.com>
Wed, 20 Jan 2021 12:45:14 +0000 (13:45 +0100)
committer Junio C Hamano <gitster@pobox.com>
Sat, 23 Jan 2021 01:27:42 +0000 (17:27 -0800)
commit be18153b975844f8792b03e337f1a4c86fe87531
tree a3afd5d4480bac8efd6065542328bf126dd0e9b4
parent 66e871b6647ffea61a77a0f82c7ef3415f1ee79c

In git-pack-objects, we iterate over all the tags if the --include-tag
option is passed on the command line. For some reason this uses
for_each_ref, which visits every ref and is expensive if the repo has
many refs. We should use for_each_tag_ref instead.

Because the add_ref_tag callback will now only visit tags, we can
simplify it a bit.
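
As a rough illustration of why the narrower iterator helps, here is a
minimal, self-contained sketch. It is not the code from
builtin/pack-objects.c: every toy_* name, the callback signature, and
the ref list are hypothetical stand-ins, and git's real ref iteration
API takes different arguments. The sketch only shows that a callback
driven by an "all refs" walk must filter for tags itself, while a
callback driven by a "tags only" walk runs fewer times and can drop
that check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type; git's real callback takes more arguments. */
typedef int (*toy_ref_fn)(const char *refname, void *cb_data);

struct toy_counts {
	int visited; /* how many times the callback ran */
	int tags;    /* how many refs the callback treated as tags */
};

/* A toy ref store standing in for a repository with few tags. */
static const char *toy_refs[] = {
	"refs/heads/main",
	"refs/heads/topic",
	"refs/merge-requests/1/head",
	"refs/merge-requests/2/head",
	"refs/tags/v1.0",
	"refs/tags/v1.1",
};
#define TOY_NR (sizeof(toy_refs) / sizeof(toy_refs[0]))

static int toy_is_tag(const char *refname)
{
	return !strncmp(refname, "refs/tags/", strlen("refs/tags/"));
}

/* Invoke fn on every ref, stopping early on a non-zero return. */
static int toy_for_each_ref(toy_ref_fn fn, void *cb_data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < TOY_NR; i++) {
		int ret = fn(toy_refs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Invoke fn on tag refs only. This toy filters a flat array; a real
 * ref backend may be able to skip non-tag refs without scanning them.
 */
static int toy_for_each_tag_ref(toy_ref_fn fn, void *cb_data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < TOY_NR; i++) {
		int ret;
		if (!toy_is_tag(toy_refs[i]))
			continue;
		ret = fn(toy_refs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback for the all-refs walk: it has to check the prefix itself. */
static int toy_add_ref_tag_old(const char *refname, void *cb_data)
{
	struct toy_counts *c = cb_data;
	c->visited++;
	if (toy_is_tag(refname))
		c->tags++;
	return 0;
}

/* Callback for the tags-only walk: the prefix check is redundant. */
static int toy_add_ref_tag_new(const char *refname, void *cb_data)
{
	struct toy_counts *c = cb_data;
	(void)refname; /* the iterator already guarantees this is a tag */
	c->visited++;
	c->tags++;
	return 0;
}
```

Running toy_for_each_ref with toy_add_ref_tag_old and
toy_for_each_tag_ref with toy_add_ref_tag_new over the same list finds
the same tags, but the second pairing calls the callback only once per
tag instead of once per ref.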

The motivation for this change is that we observed performance issues
with a repository on gitlab.com that has 500,000 refs but only 2,000
tags. The fetch traffic on that repo is dominated by CI, and when we
changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
change in the CPU profile of git-pack-objects. This led us to this
particular ref walk. More details in:
https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598

Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/pack-objects.c