Rebase performance

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Rebase performance

Christian Couder-2
Hi,

Using GIT_TRACE_PERFORMANCE it looks like a lot of time in a regular
rebase is spent in run_apply() in builtin/am.c. This function first
sets up a 'struct child_process cp' to launch "git apply" on a patch
and then uses run_command(&cp) to actually launch the "git apply".
Then this function calls discard_cache() and read_cache_from() to get
the index created by the "git apply".

On a Linux server with many CPUs and many cores on each CPU, it is
strange because the same rebase of 13 commits on a big repo is
significantly slower than on a laptop (typically around 9 seconds
versus 6 seconds). Both the server and the laptop have that has SSD
storage.

It appears that the server is trying to run the "git apply" processes
on different cores or cpus perhaps to try to spread the load on many
of its cores. Anyway adding something like "taskset -c 7" in front of
the "git rebase..." command, when launching it on the server, speeds
it up, so that it takes around the same amount of time as it does on
the laptop (6 seconds). "taskset -c 7" is just asking Linux to run a
process and its children on core number 7, and it appears that doing
that results in a much better cpu (or core) cache usage which explains
the speed up.

If there was a config option called maybe "rebase.taskset" or
"rebase.setcpuaffinity" that could be set to ask the OS for all the
rebase child processes to be run on the same core, people who run many
rebases on big repos on big servers as we do at Booking.com could
easily benefit from a nice speed up.

Technically the option may make git-rebase--am.sh call "git am" using
"taskset" (if taskset is available on the current OS).

Another possibility would be to libify the "git apply" functionality
and then to use the libified "git apply" in run_apply() instead of
launching a separate "git apply" process. One benefit from this is
that we could probably get rid of the read_cache_from() call at the
end of run_apply() and this would likely further speed up things. Also
avoiding to launch separate processes might be a win especially on
Windows.

Suggestions?

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Jacob Keller
On Wed, Feb 24, 2016 at 2:09 PM, Christian Couder
<[hidden email]> wrote:
> Hi,
> Another possibility would be to libify the "git apply" functionality
> and then to use the libified "git apply" in run_apply() instead of
> launching a separate "git apply" process. One benefit from this is
> that we could probably get rid of the read_cache_from() call at the
> end of run_apply() and this would likely further speed up things. Also
> avoiding to launch separate processes might be a win especially on
> Windows.
>

This is the route I would go, since the addition of a taskset option
seems pretty difficult to get correct, and may not be portable. This
above solution likely improves more in general, and is more portable.
Not exactly sure how easy it would be to "libify" the required bits of
apply, however.. it may be that this is already available as well but
we just didn't go that route during the port of git-am into C code.

Regards,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Stefan Beller-4
On Wed, Feb 24, 2016 at 4:15 PM, Jacob Keller <[hidden email]> wrote:

> On Wed, Feb 24, 2016 at 2:09 PM, Christian Couder
> <[hidden email]> wrote:
>> Hi,
>> Another possibility would be to libify the "git apply" functionality
>> and then to use the libified "git apply" in run_apply() instead of
>> launching a separate "git apply" process. One benefit from this is
>> that we could probably get rid of the read_cache_from() call at the
>> end of run_apply() and this would likely further speed up things. Also
>> avoiding to launch separate processes might be a win especially on
>> Windows.
>>
>
> This is the route I would go, since the addition of a taskset option
> seems pretty difficult to get correct, and may not be portable. This
> above solution likely improves more in general, and is more portable.
> Not exactly sure how easy it would be to "libify" the required bits of
> apply, however.. it may be that this is already available as well but
> we just didn't go that route during the port of git-am into C code.
>
> Regards,
> Jake

IIRC Junio started libifying am after Paul Tan rewrote it in C,
See origin/jc/am-mailinfo-direct (tip at 4b98bae2cbc6b).

That part however only libifyies the mailinfo which is used
by apply (one layer below), so I am not sure if the run_apply
has been attempted to libify.

I would also encourage to rather try to not call out to a child
(libifying) instead of adding the taskset hack for servers.

Thanks,
Stefan

> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Duy Nguyen
In reply to this post by Christian Couder-2
On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder
<[hidden email]> wrote:
> Another possibility would be to libify the "git apply" functionality
> and then to use the libified "git apply" in run_apply() instead of
> launching a separate "git apply" process. One benefit from this is
> that we could probably get rid of the read_cache_from() call at the
> end of run_apply() and this would likely further speed up things. Also
> avoiding to launch separate processes might be a win especially on
> Windows.

The amount of global variables in apply.c is just scary. Libification
will need some cleanup first (i'm not against it though). Out of
curiosity, how long does it take to do "git update-index <one modified
path>" on this repo? That would cover read_cache_from() and
write_cache(). While you're measuring, maybe sprinkle some
trace_performance() in apply.c:apply_patch() to get an idea where time
is most spent in? This is a big guess, but if we spend more time
parsing/validating the patch than applying, then that part could be
parallelized, we only need to apply patches in sequence.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Junio C Hamano
Duy Nguyen <[hidden email]> writes:

> On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder
> <[hidden email]> wrote:
>> Another possibility would be to libify the "git apply" functionality
>> and then to use the libified "git apply" in run_apply() instead of
>> launching a separate "git apply" process. One benefit from this is
>> that we could probably get rid of the read_cache_from() call at the
>> end of run_apply() and this would likely further speed up things. Also
>> avoiding to launch separate processes might be a win especially on
>> Windows.
>
> The amount of global variables in apply.c is just scary. Libification
> will need some cleanup first (i'm not against it though).

The global variables do not scare me.  You can just throw them into
a single "apply_state" structure and pass it around the callchain
just fine--the helper functions in the file all take only a small
number of parameters, and a new parameter that is a pointer to the
state structure that consistently comes at the beginning would not
make things harder to read or maintain.

What does scare me (and what you should be scared) more is the way
the current code handles erroneous input.  It was one of the
programs written in early days with "just do one thing well and
exit, the larger program structure will be scripted around us"
mentality and liberally die() without cleaning things up.  I do
agree with others that libification of it is a very good thing, but
the second step (the first step is the easier "global to state
structure" one, which should be more or less mechanical) towards it
is to design how the errors are handled and resources are released.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Duy Nguyen
On Thu, Feb 25, 2016 at 10:02 AM, Junio C Hamano <[hidden email]> wrote:

> Duy Nguyen <[hidden email]> writes:
>
>> On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder
>> <[hidden email]> wrote:
>>> Another possibility would be to libify the "git apply" functionality
>>> and then to use the libified "git apply" in run_apply() instead of
>>> launching a separate "git apply" process. One benefit from this is
>>> that we could probably get rid of the read_cache_from() call at the
>>> end of run_apply() and this would likely further speed up things. Also
>>> avoiding to launch separate processes might be a win especially on
>>> Windows.
>>
>> The amount of global variables in apply.c is just scary. Libification
>> will need some cleanup first (i'm not against it though).
>
> The global variables do not scare me.  You can just throw them into
> a single "apply_state" structure and pass it around the callchain
> just fine--the helper functions in the file all take only a small
> number of parameters, and a new parameter that is a pointer to the
> state structure that consistently comes at the beginning would not
> make things harder to read or maintain.

There are a bunch of shadow variables in this file. If you are not
careful, it's easy to mistake local var "x" with global "x" and
replace it with global_state->x.

> What does scare me (and what you should be scared) more is the way
> the current code handles erroneous input.  It was one of the
> programs written in early days with "just do one thing well and
> exit, the larger program structure will be scripted around us"
> mentality and liberally die() without cleaning things up.  I do
> agree with others that libification of it is a very good thing, but
> the second step (the first step is the easier "global to state
> structure" one, which should be more or less mechanical) towards it
> is to design how the errors are handled and resources are released.

Yeah.. thorough study of apply code may be needed before anybody does
any surgery in this code
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Duy Nguyen
In reply to this post by Duy Nguyen
On Thu, Feb 25, 2016 at 7:50 AM, Duy Nguyen <[hidden email]> wrote:

> On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder
> <[hidden email]> wrote:
>> Another possibility would be to libify the "git apply" functionality
>> and then to use the libified "git apply" in run_apply() instead of
>> launching a separate "git apply" process. One benefit from this is
>> that we could probably get rid of the read_cache_from() call at the
>> end of run_apply() and this would likely further speed up things. Also
>> avoiding to launch separate processes might be a win especially on
>> Windows.
>
> The amount of global variables in apply.c is just scary. Libification
> will need some cleanup first (i'm not against it though). Out of
> curiosity, how long does it take to do "git update-index <one modified
> path>" on this repo? That would cover read_cache_from() and
> write_cache(). While you're measuring, maybe sprinkle some
> trace_performance() in apply.c:apply_patch() to get an idea where time
> is most spent in?

I did some experiment. The calls from am are basically

for i in $PATCHES; do git apply --cached $i; git commit; done

and we can approximate the libification version of that with

git apply --cached $PATCHES

(I hacked git-apply to do write-tree after each patch, close enough to
git-commit).

I tried it on my shallow-deepen series, 26 patches. The "for; do
git-apply;done" command took 0m0.482s (real's time), taskset does not
affect much for me, while the "libification" took just  0m0.105s.
That's a very impressive number to aim for, and git.git is a small
repo.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Ævar Arnfjörð Bjarmason
In reply to this post by Christian Couder-2
On Wed, Feb 24, 2016 at 11:09 PM, Christian Couder
<[hidden email]> wrote:

[Resent because I was accidentally in GMail's HTML mode and the ML rejected it]

> If there was a config option called maybe "rebase.taskset" or
> "rebase.setcpuaffinity" that could be set to ask the OS for all the
> rebase child processes to be run on the same core, people who run many
> rebases on big repos on big servers as we do at Booking.com could
> easily benefit from a nice speed up.
>
> Technically the option may make git-rebase--am.sh call "git am" using
> "taskset" (if taskset is available on the current OS).

I think aside from issues with git-apply this would be an interesting
feature to have in git. I.e. some general facility to intercept
commands and inject a prefix command in front of them, whether that's
taskset, nice/ionice, strace etc.

> Another possibility would be to libify the "git apply" functionality
> and then to use the libified "git apply" in run_apply() instead of
> launching a separate "git apply" process. One benefit from this is
> that we could probably get rid of the read_cache_from() call at the
> end of run_apply() and this would likely further speed up things. Also
> avoiding to launch separate processes might be a win especially on
> Windows.

Yeah that should help in this particular case and make the taskset
redundant since the whole sequence of operations would all be on one
core, right?

At the risk of derailing this thread, a thing that would make rebase
even faster I think would be to change it so that instead of applying
a patch at a time to the working tree the whole operation takes place
on temporary trees & commits and then we'll eventually move the branch
pointer to that once it's finished.

I.e. there's no reason for why a sequence of 1000 patches where a
FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably
slower than applying the same changes with git-fast-import.

Of course this would require a lot of nuances, e.g. if there's a
conflict we'd need to change the working tree & index as we do now
before continuing.

Has anyone looked into some advanced refactoring of the rebase process
that would work like this, or has some feedback on why this would be
dumb or that there's a better way to do it?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Matthieu Moy-2
Ævar Arnfjörð Bjarmason <[hidden email]> writes:

> At the risk of derailing this thread, a thing that would make rebase
> even faster I think would be to change it so that instead of applying
> a patch at a time to the working tree the whole operation takes place
> on temporary trees & commits and then we'll eventually move the branch
> pointer to that once it's finished.
>
> I.e. there's no reason for why a sequence of 1000 patches where a
> FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably
> slower than applying the same changes with git-fast-import.

Also, not touching the worktree during rebase would have the advantage
that if the final result doesn't change a file, we wouldn't need to
touch this file at all, hence the next "make" (or whatever
timestamp-using build system the user runs) would consider this file
unchanged.

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Johannes Schindelin
Hi Matthieu,

On Thu, 25 Feb 2016, Matthieu Moy wrote:

> Ævar Arnfjörð Bjarmason <[hidden email]> writes:
>
> > At the risk of derailing this thread, a thing that would make rebase
> > even faster I think would be to change it so that instead of applying
> > a patch at a time to the working tree the whole operation takes place
> > on temporary trees & commits and then we'll eventually move the branch
> > pointer to that once it's finished.
> >
> > I.e. there's no reason for why a sequence of 1000 patches where a
> > FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably
> > slower than applying the same changes with git-fast-import.
>
> Also, not touching the worktree during rebase would have the advantage
> that if the final result doesn't change a file, we wouldn't need to
> touch this file at all, hence the next "make" (or whatever
> timestamp-using build system the user runs) would consider this file
> unchanged.
We still have to write all blobs. So I would still expect this to be I/O
bound.

Ciao,
Dscho
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Stefan Beller-4
On Fri, Feb 26, 2016 at 7:45 AM, Johannes Schindelin
<[hidden email]> wrote:

> Hi Matthieu,
>
> On Thu, 25 Feb 2016, Matthieu Moy wrote:
>
>> Ævar Arnfjörð Bjarmason <[hidden email]> writes:
>>
>> > At the risk of derailing this thread, a thing that would make rebase
>> > even faster I think would be to change it so that instead of applying
>> > a patch at a time to the working tree the whole operation takes place
>> > on temporary trees & commits and then we'll eventually move the branch
>> > pointer to that once it's finished.
>> >
>> > I.e. there's no reason for why a sequence of 1000 patches where a
>> > FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably
>> > slower than applying the same changes with git-fast-import.
>>
>> Also, not touching the worktree during rebase would have the advantage
>> that if the final result doesn't change a file, we wouldn't need to
>> touch this file at all, hence the next "make" (or whatever
>> timestamp-using build system the user runs) would consider this file
>> unchanged.
>
> We still have to write all blobs. So I would still expect this to be I/O
> bound.

But if there is an IO bound process, the only way to make it faster
is to reduce its IO, which was the proposal here? I agree that it probably
is not enough to shift it from being IO bound to say CPU bounded.

Thanks,
Stefan

>
> Ciao,
> Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Christian Couder-2
In reply to this post by Duy Nguyen
On Thu, Feb 25, 2016 at 10:42 AM, Duy Nguyen <[hidden email]> wrote:

> On Thu, Feb 25, 2016 at 7:50 AM, Duy Nguyen <[hidden email]> wrote:
>> On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder
>> <[hidden email]> wrote:
>>> Another possibility would be to libify the "git apply" functionality
>>> and then to use the libified "git apply" in run_apply() instead of
>>> launching a separate "git apply" process. One benefit from this is
>>> that we could probably get rid of the read_cache_from() call at the
>>> end of run_apply() and this would likely further speed up things. Also
>>> avoiding to launch separate processes might be a win especially on
>>> Windows.
>>
>> The amount of global variables in apply.c is just scary. Libification
>> will need some cleanup first (i'm not against it though). Out of
>> curiosity, how long does it take to do "git update-index <one modified
>> path>" on this repo? That would cover read_cache_from() and
>> write_cache().

Sure I get:

$ time git update-index <README

real    0m0.079s
user    0m0.057s
sys     0m0.022s

>> While you're measuring, maybe sprinkle some
>> trace_performance() in apply.c:apply_patch() to get an idea where time
>> is most spent in?

Each call to read_cache() takes around 0.07 seconds and each call to
check_patch_list() takes around 0.045 seconds, the rest of
apply_patch() is not significant.

> I did some experiment.

Thanks for those experiments. They are very interesting!

> The calls from am are basically
>
> for i in $PATCHES; do git apply --cached $i; git commit; done
>
> and we can approximate the libification version of that with
>
> git apply --cached $PATCHES
>
> (I hacked git-apply to do write-tree after each patch, close enough to
> git-commit).
>
> I tried it on my shallow-deepen series, 26 patches. The "for; do
> git-apply;done" command took 0m0.482s (real's time), taskset does not
> affect much for me, while the "libification" took just  0m0.105s.
> That's a very impressive number to aim for, and git.git is a small
> repo.

Yeah, from my tests it also looks like there is a lot that could be
gained from the libification.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Rebase performance

Christian Couder-2
In reply to this post by Ævar Arnfjörð Bjarmason
On Thu, Feb 25, 2016 at 5:31 PM, Ævar Arnfjörð Bjarmason
<[hidden email]> wrote:

> On Wed, Feb 24, 2016 at 11:09 PM, Christian Couder
> <[hidden email]> wrote:
>
> [Resent because I was accidentally in GMail's HTML mode and the ML rejected it]
>
>> If there was a config option called maybe "rebase.taskset" or
>> "rebase.setcpuaffinity" that could be set to ask the OS for all the
>> rebase child processes to be run on the same core, people who run many
>> rebases on big repos on big servers as we do at Booking.com could
>> easily benefit from a nice speed up.
>>
>> Technically the option may make git-rebase--am.sh call "git am" using
>> "taskset" (if taskset is available on the current OS).
>
> I think aside from issues with git-apply this would be an interesting
> feature to have in git. I.e. some general facility to intercept
> commands and inject a prefix command in front of them, whether that's
> taskset, nice/ionice, strace etc.

I started to take a look at that. It could be done in git.c but would
be a bit involved as we would have to check the config option or the
env variable that describe the prefix command early and exec the
prefixed command while disabling the config option or the env
variable. I wonder if people would prefer an env variable or a config
option by the way.

>> Another possibility would be to libify the "git apply" functionality
>> and then to use the libified "git apply" in run_apply() instead of
>> launching a separate "git apply" process. One benefit from this is
>> that we could probably get rid of the read_cache_from() call at the
>> end of run_apply() and this would likely further speed up things. Also
>> avoiding to launch separate processes might be a win especially on
>> Windows.
>
> Yeah that should help in this particular case and make the taskset
> redundant since the whole sequence of operations would all be on one
> core, right?

Yeah, all the applying would happen in one process, so hopefully it
would not be moved often to another core or cpu.

> At the risk of derailing this thread, a thing that would make rebase
> even faster I think would be to change it so that instead of applying
> a patch at a time to the working tree the whole operation takes place
> on temporary trees & commits and then we'll eventually move the branch
> pointer to that once it's finished.
>
> I.e. there's no reason for why a sequence of 1000 patches where a
> FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably
> slower than applying the same changes with git-fast-import.
>
> Of course this would require a lot of nuances, e.g. if there's a
> conflict we'd need to change the working tree & index as we do now
> before continuing.
>
> Has anyone looked into some advanced refactoring of the rebase process
> that would work like this, or has some feedback on why this would be
> dumb or that there's a better way to do it?

My opinion is that such advanced refactoring can happen on top of
libifying the "git apply" functionality, so I started to work on that.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html