Dividing up a large merge.

Dividing up a large merge.

davidb-3
I'm trying to figure out a better way of dividing up the effort
involved in a merge amongst a group of people.  Right now, I
basically describe the merge to each of them, and ask them to
merge their part, and then 'git checkout HEAD' the other parts.
They tell me about the commits, along with the files that they've
merged correctly.  When everybody is done, I make a real merge
commit, and pull in all of their files.  It's a lot for me to
track, and confusing for each person.

I'd like to create a branch we can all push to that we gradually
work to become the result of a resolved merge.  Not only does git
not want to help me do the merge, but seems to actively be
fighting against me doing this.

What I thought of was something like telling people to do:

  $ git merge v2.6.30
  resolve some files
  $ git checkout HEAD ...rest of files...
  $ git commit; git push

but that 'rest of files' is fairly large and complicated.  I can
think of two ideas:

  - Something that basically does a partial 'git reset --hard
    HEAD' to put many of the files back.

  - The ability to specify subpaths on the 'git merge' to do the
    merge work but limited to a directory or set of files.

Obviously, either case will require someone to still track the
overall effort and make sure the final state of the tree really
represents the total merge.
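
The first idea can be approximated in later Git releases with an
exclude pathspec.  A minimal sketch, assuming Git 2.23 or newer (for
'git restore'); the function name restore_outside is made up for
illustration and is not an existing git command:

```shell
# restore_outside PATH...
# Run while a merge is in progress, after resolving and 'git add'-ing the
# files under PATH.  Every tracked path outside PATH is put back to its
# HEAD state, and files the merge added outside PATH are removed again.
restore_outside() {
    [ "$#" -gt 0 ] || { echo "usage: restore_outside PATH..." >&2; return 1; }
    # Turn each owned path into an ':(exclude)' pathspec, in place.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" ":(exclude)$1"
        shift
        n=$((n - 1))
    done
    # 'git restore' removes tracked files that are missing from the source
    # tree, so paths introduced by the merge disappear as well.
    git restore --source=HEAD --staged --worktree -- . "$@"
}
```

Someone responsible for one directory would start the merge, resolve
and 'git add' the files under that directory, run restore_outside on
it, and then commit.  Unmerged paths that do not exist in HEAD at all
(deleted on one side, changed on the other) are not handled by this
sketch and still need a manual 'git rm'.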

Is there anything that can parse the output of 'git merge-tree'?
Even just splitting this up and then applying parts of it would
be helpful.  Would it be useful to write something that can apply
the results output of 'git merge-tree'?
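
Short of parsing 'git merge-tree' output, a trial merge already shows
where the conflicts fall, which helps when deciding who takes what.  A
rough sketch (this is not a merge-tree parser; the function name
conflicts_by_dir is invented here), assuming a clean working tree:

```shell
# conflicts_by_dir REV
# Trial-merge REV into the current branch, print the number of conflicted
# files under each top-level directory, then abort the trial merge.
# Nothing is committed.
conflicts_by_dir() {
    git rev-parse -q --verify "$1^{commit}" >/dev/null || return 1
    if ! git diff --quiet HEAD; then
        echo "working tree is not clean" >&2
        return 1
    fi
    # A non-zero exit status here just means the merge had conflicts.
    git merge --no-commit --no-ff "$1" >/dev/null 2>&1 || :
    # Unmerged paths, counted by their first path component.
    git diff --name-only --diff-filter=U |
        awk -F/ '{ n[$1]++ } END { for (d in n) print n[d], d }' |
        sort -k2
    git merge --abort 2>/dev/null || :
}
```

Running it with the upstream tag as REV gives one "count directory"
line per top-level directory that has conflicts.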

Thanks,
David
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Dividing up a large merge.

Bryan Donlan
On Tue, Jul 14, 2009 at 7:32 PM, <[hidden email]> wrote:
> I'm trying to figure out a better way of dividing up the effort
> involved in a merge amongst a group of people.  Right now, I
> basically describe the merge to each of them, and ask them to
> merge their part, and then 'git checkout HEAD' the other parts.
> They tell me about the commits, along with the files that they've
> merged correctly.  When everybody is done, I make a real merge
> commit, and pull in all of their files.  It's a lot for me to
> track, and confusing for each person.

What do you mean by describing a merge? git is designed to have all
the information needed for a merge inherent in the repository history.

> I'd like to create a branch we can all push to that we gradually
> work to become the result of a resolved merge.  Not only does git
> not want to help me do the merge, but seems to actively be
> fighting against me doing this.
>
> What I thought of was something like telling people to do:
>
>  $ git merge v2.6.30
>  resolve some files
>  $ git checkout HEAD ...rest of files...
>  $ git commit; git push
>
> but that 'rest of files' is fairly large and complicated.  I can
> think of two ideas:
>
>  - Something that basically does a partial 'git reset --hard
>    HEAD' to put many of the files back.
>
>  - The ability to specify subpaths on the 'git merge' to do the
>    merge work but limited to a directory or set of files.
>
> Obviously, either case will require someone to still track the
> overall effort and make sure the final state of the tree really
> represents the total merge.
>
> Is there anything that can parse the output of 'git merge-tree'?
> Even just splitting this up and then applying parts of it would
> be helpful.  Would it be useful to write something that can apply
> the results output of 'git merge-tree'?

I'm having a hard time understanding the situation here - why can't you just:
$ git checkout -b mergebranch v2.6.30
$ git merge developer1/topic
# Fix conflicts
$ git merge developer2/topic
# Fix conflicts
# etc

Why are there so many conflicts to make this an issue?

If the commits are isolated to small changes, rebasing the developer
topic branches instead of merging may help, by allowing you to take
conflicts one commit at a time. For example, if your problems are
primarily conflicts between developer branches and upstream:

$ git checkout -b mergebranch-dev1 developer1/topic
$ git rebase v2.6.30
# Fix conflicts on a commit-by-commit basis
$ git checkout -b mergebranch-dev2 developer2/topic
$ git rebase v2.6.30
# Fix conflicts on a commit-by-commit basis
$ git checkout -b mergebranch
$ git merge mergebranch-dev1
# Fix any remaining conflicts

If your problems are because of conflicts between developer branches
and each other:
$ git checkout -b mergebranch-dev1 developer1/topic
$ git rebase v2.6.30
# Fix conflicts on a commit-by-commit basis
$ git checkout -b mergebranch-dev2 developer2/topic
$ git rebase mergebranch-dev1
# Fix conflicts on a commit-by-commit basis

These rebasing approaches will change the commit IDs, so your
developers will need to rebase any further work onto the new commit
IDs, but if things are as bad as you say, the commit-by-commit
conflict resolution that rebase allows may be much simpler.

Re: Dividing up a large merge.

davidb-3
On Tue, Jul 14, 2009 at 05:16:54PM -0700, Bryan Donlan wrote:

> What do you mean by describing a merge? git is designed to have all
> the information needed for a merge inherent in the repository history.

Yes, provided you can actually do the merge all at once.

> Why are there so many conflicts to make this an issue?

Because I have to work in the "real world".

> If the commits are isolated to small changes, rebasing the developer
> topic branches instead of merging may help, by allowing you to take
> conflicts one commit at a time. For example, if your problems are
> primarily conflicts between developer branches and upstream:

No real developer branches with conflicts (I make those be
fixed), but several upstreams.  We have many developers busily
doing work, and one or more other companies are also working on
the same code.  Meanwhile, the mainline kernel advances at its
own astounding rate.

Unfortunately, paying customers will always get priority of work,
even when that position is actually somewhat shortsighted and it
makes for a lot of merge effort later.

The real issue is that there isn't any single individual who
understands all of the code that conflicts.  It has to be divided
up somehow, I'm just trying to figure out a better way of doing
it.

Thanks,
David

Re: Dividing up a large merge.

Avery Pennarun
On Tue, Jul 14, 2009 at 8:29 PM, <[hidden email]> wrote:
> The real issue is that there isn't any single individual who
> understands all of the code that conflicts.  It has to be divided
> up somehow, I'm just trying to figure out a better way of doing
> it.

How about having one person do the merge, then commit it (including
conflict markers), then have other people just make a series of
commits removing the conflict markers?
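
Tracking what is left under this scheme amounts to finding the files
that still carry markers.  A small sketch, assuming the default marker
style; files_with_markers is a made-up helper name:

```shell
# files_with_markers [PATH...]
# List tracked files that still contain "<<<<<<<" or ">>>>>>>" conflict
# markers at the start of a line, optionally limited to PATH.
files_with_markers() {
    # 'git grep' exits with status 1 when nothing matches; treat that
    # as an empty result rather than a failure.
    git grep -l -E -e '^(<<<<<<<|>>>>>>>)( |$)' -- "$@" || true
}
```

Each person can pass their own directory as PATH to see what they
still have to clean up.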

Avery

Re: Dividing up a large merge.

davidb-3
On Tue, Jul 14, 2009 at 05:34:26PM -0700, Avery Pennarun wrote:

> How about having one person do the merge, then commit it (including
> conflict markers), then have other people just make a series of
> commits removing the conflict markers?

I guess this helps in some sense, but the intermediate result
isn't going to build, and things like mergetool aren't going to
work.  It's helpful for the individuals to have the full merge
conflict available, or at least the stages of the files in
question.

David

Re: Dividing up a large merge.

Douglas Campos
Wouldn't merging the peer branches first help?

On Tue, Jul 14, 2009 at 10:19 PM, <[hidden email]> wrote:

> On Tue, Jul 14, 2009 at 05:34:26PM -0700, Avery Pennarun wrote:
>
>> How about having one person do the merge, then commit it (including
>> conflict markers), then have other people just make a series of
>> commits removing the conflict markers?
>
> I guess this helps in some sense, but the intermediate result
> isn't going to build, and things like mergetool aren't going to
> work.  It's helpful for the individuals to have the full merge
> conflict available, or at least the stages of the files in
> question.
>
> David

--
Douglas Campos
Theros Consulting
+55 11 7626 5959
+55 11 3020 8168

Re: Dividing up a large merge.

Avery Pennarun
In reply to this post by davidb-3
On Tue, Jul 14, 2009 at 9:19 PM, <[hidden email]> wrote:

> On Tue, Jul 14, 2009 at 05:34:26PM -0700, Avery Pennarun wrote:
>> How about having one person do the merge, then commit it (including
>> conflict markers), then have other people just make a series of
>> commits removing the conflict markers?
>
> I guess this helps in some sense, but the intermediate result
> isn't going to build, and things like mergetool aren't going to
> work.  It's helpful for the individuals to have the full merge
> conflict available, or at least the stages of the files in
> question.

It sounds like you're going in circles a bit here.  You want the full
merge conflict available - but you want it to be able to build.

It sounds like the "git reset the unwanted subdirs" solution suggested
earlier is the only option that will really work.  You could simplify
life for your co-workers by writing a script to automate the steps, I
suppose.

You probably want all the individuals to use merge --squash, so that
you don't mark the history as merged until you're done.  Then you
combine all their work at the end and mark the commit as done using
'git merge -s ours'.
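
Spelled out, that might look roughly like the following.  This is a
sketch under assumptions: the shared branch is checked out, v2.6.30 is
the upstream being merged, and finish_merge is a hypothetical helper,
not an existing command:

```shell
# Each contributor, on the shared branch:
#   git merge --squash v2.6.30    # stage the merge result, record no merge
#   ...resolve own files, put everything else back to HEAD...
#   git commit                    # an ordinary single-parent commit
#
# finish_merge REV
# Integrator: once the shared branch holds the agreed result, record REV
# as merged without changing any file.
finish_merge() {
    before=$(git rev-parse 'HEAD^{tree}') || return 1
    git merge -s ours --no-ff -m "Merge $1 (resolved piecewise)" "$1" ||
        return 1
    # The 'ours' strategy must leave the tree exactly as it was.
    [ "$(git rev-parse 'HEAD^{tree}')" = "$before" ]
}
```

Because the squash commits record no merge, each later
'git merge --squash' starts again from the old merge base, so
contributors may see some already-resolved conflicts a second time.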

Avery

Re: Dividing up a large merge.

Ted Ts'o
In reply to this post by davidb-3
On Tue, Jul 14, 2009 at 05:29:26PM -0700, [hidden email] wrote:
> No real developer branches with conflicts (I make those be
> fixed), but several upstreams.  We have many developers busily
> doing work, and one or more other companies are also working on
> the same code.  Meanwhile, the mainline kernel advances at its
> own astounding rate.

If you are maintaining a large number of changes over the long term
(which in the case of the kernel can be measured in a month or two),
it's often much easier to maintain things as a series of patches.

That way you can merge each patch one at a time.

If you already have everything in a git tree, I'd suggest pulling it
apart into separate patches, by using "git format-patch".  Note that
if you have multiple merges in your tree, this will go much more
smoothly if you can separate things into a single linear stream.
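
A minimal sketch of that export step; export_series is a hypothetical
helper, and BASE stands for whatever commit the local work started
from:

```shell
# export_series BASE OUTDIR
# Write one numbered patch file per non-merge commit in BASE..HEAD into
# OUTDIR, oldest first.  Merge commits are skipped by format-patch, so
# warn when the range contains any.
export_series() {
    merges=$(git rev-list --merges --count "$1"..HEAD) || return 1
    if [ "$merges" -ne 0 ]; then
        echo "warning: $merges merge commit(s) in range will be skipped" >&2
    fi
    git format-patch -q -o "$2" "$1"..HEAD
}
```

The warning is only a hint that the history is not yet the single
linear stream described above.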

This is also a good reason why if you have partial work that is
complete enough to be merged into mainline, it is ***much*** better to
try pushing patches to mainline earlier rather than later.  Waiting
until you are 100% done and the work is completely certified involves
a large number of risks; for example, what if people complain about
work that was done early on?  Or if the design was fundamentally
flawed from the get-go?  At the minimum, you will save a huge amount
of effort if you post a request-for-comment version of the patches up
front.

And, if you believe your release cycle is going to run for more than,
say, 2-3 months, I suggest that you keep things in a single linear
patch stream.  You can keep the patch series under git control, and
then rebase periodically; I'd suggest rebasing once a mainline release
happens (i.e., when 2.6.X is released), and then again after most of
the major changes have been merged in and the tree has settled down
(i.e., after 2.6.X-rc2 or 2.6.X-rc3).

> The real issue is that there isn't any single individual who
> understands all of the code that conflicts.  It has to be divided
> up somehow, I'm just trying to figure out a better way of doing
> it.

Yeah, that's another prime argument for maintaining your changes as a
patch queue.  I use a combination of quilt plus git.  So the rebasing
methodology becomes:

# pop all patches
guilt pop -a
# update the base of the patches
git pull origin
# start trying to apply each of the patches, one at a time
# next_patch:
guilt push -a
# when you get a failure, the push will stop and tell you it can't
# apply a patch; so force apply the patch:
guilt push -f
#
# this will leave some patch .rej files; resolve the patch failures
# for all of the files.  Use "git add" once the rejects have been resolved;
# also make sure that any files that were added by the patch that was
# force applied are manually marked as added using "git add".
# Once you are sure the patch is properly merged, do this:
guilt refresh --diffstat
# Check the changes made to the patch; I normally create a symlink from
# .git/patches/<work-branch-for-guilt> to patches in the top level, i.e.
# "ln -s .git/patches/master patches"; if you can't remember the name of the
# patch, you can get it via the command "guilt applied | tail -1"
(cd patches; git diff name-of-patch)
# now repeat with the next set of patches by going back to next_patch, above

I normally keep an indication of the version that the patch series is
based upon via a comment in the first line of the series file, like
this: "# BASE v2.6.30-rc3" or sometimes like this "# BASE 6ab2792".
This can be useful when creating automated scripts to test the patch
series, since they know what version to apply the patches against.
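
For example, a test script could pick the base up like this (a sketch;
series_base is a made-up name, and the "# BASE" line is assumed to be
the very first line of the file):

```shell
# series_base FILE
# Print the revision named by a leading "# BASE <rev>" comment in a guilt
# series file.  Returns non-zero if the first line is not such a comment.
series_base() {
    base=$(sed -n '1s/^# BASE //p' "$1")
    [ -n "$base" ] && printf '%s\n' "$base"
}
```

Restricting the match to line 1 keeps a stray "# BASE" comment further
down the series file from being picked up by accident.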

In your case, the first person to start the rebase should change the
"# BASE" comment, and then apply those patches which he/she is most
familiar with.  When you hit a point where you need someone else's
expertise, you can do a "(cd patches; git commit -a)" to commit all of
the changes in the patch queue so far, and then let someone else take
over.  

They would then do:

# Pop all of the patches off the next developer's work directory
guilt pop -a
# Update the patch queue
(cd patches; git pull)
# Now we need to make sure we have the latest kernel patches from mainline
git fetch
# Now update the work directory to the version specified by the patch
# series file
git merge $(head -n 1 patches/series | sed -e 's/^# BASE //')
# Now resume trying to apply patches, one at a time...
# next_patch
guilt push -a
# if there is a failed patch, force apply it and resolve patch rejects
guilt push -f
# refresh the patch
guilt refresh --diffstat
# .... and so on

My biggest suggestion, though, is to try to merge partial work earlier
rather than later.  I'd try getting a partially functioning device
driver merged first, and then try to get the optimizations applied
later.  If you don't want people using it in production, that's what
the EXPERIMENTAL tag is for...


                                                - Ted

Re: Dividing up a large merge.

Jakub Narębski
Theodore Tso <[hidden email]> writes:

> Yeah, that's another prime argument for maintaining your changes as a
> patch queue.  I use a combination of quilt plus git.

Why not StGit, or Guilt, or TopGit?

--
Jakub Narebski
Poland
ShadeHawk on #git

Re: Dividing up a large merge.

Larry D'Anna
In reply to this post by Ted Ts'o
* Theodore Tso ([hidden email]) [090715 08:28]:
> And, if you believe your release cycle is going to run for more than,
> say, 2-3 months, I suggest that you keep things in a single linear
> patch stream.  You can keep the patch series under git control, and
> then rebase periodically; I'd suggest rebasing once a mainline release
> happens (i.e., when 2.6.X is released), and then again after most of
> the major changes have been merged in and the tree has settled down
> (i.e., after 2.6.X-rc2 or 2.6.X-rc3).

or use TopGit

http://repo.or.cz/w/topgit.git

        --larry



Re: Dividing up a large merge.

Ted Ts'o
In reply to this post by Jakub Narębski
On Wed, Jul 15, 2009 at 06:39:46AM -0700, Jakub Narebski wrote:
> Theodore Tso <[hidden email]> writes:
>
> > Yeah, that's another prime argument for maintaining your changes as a
> > patch queue.  I use a combination of quilt plus git.
>
> Why not StGit, or Guilt, or TopGit?

Sorry, typo; that should have read "guilt".  The example workflow I
included used guilt commands.

                                        - Ted

Re: Dividing up a large merge.

Daniel Barkalow
In reply to this post by davidb-3
On Tue, 14 Jul 2009, [hidden email] wrote:

> On Tue, Jul 14, 2009 at 05:16:54PM -0700, Bryan Donlan wrote:
>
> > What do you mean by describing a merge? git is designed to have all
> > the information needed for a merge inherent in the repository history.
>
> Yes, provided you can actually do the merge all at once.
>
> > Why are there so many conflicts to make this an issue?
>
> Because I have to work in the "real world".
>
> > If the commits are isolated to small changes, rebasing the developer
> > topic branches instead of merging may help, by allowing you to take
> > conflicts one commit at a time. For example, if your problems are
> > primarily conflicts between developer branches and upstream:
>
> No real developer branches with conflicts (I make those be
> fixed), but several upstreams.  We have many developers busily
> doing work, and one or more other companies are also working on
> the same code.  Meanwhile, the mainline kernel advances at its
> own astounding rate.
>
> Unfortunately, paying customers will always get priority of work,
> even when that position is actually somewhat shortsighted and it
> makes for a lot of merge effort later.
>
> The real issue is that there isn't any single individual who
> understands all of the code that conflicts.  It has to be divided
> up somehow, I'm just trying to figure out a better way of doing
> it.

It sounds to me like you're maintaining an internal version that everybody
merges their stuff into, and you periodically merge that with the mainline
kernel (generating a lot of conflicts which have to be resolved at the
same time). Instead of merging the branch that contains a lot of merges,
it would probably be easier to merge into a clone of mainline each of the
things that was merged before. That is, instead of merging less than all
of two trees, you'd merge commits which are not the newest commit on the
branch, choosing ones that individuals can resolve.
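
One way to enumerate those earlier merges, sketched under the
assumption that topics went into the internal branch through real
(non-fast-forward) merge commits; merged_topics and the branch names
are hypothetical:

```shell
# merged_topics MAINLINE INTERNAL
# For each merge commit on INTERNAL's first-parent chain that is not yet
# in MAINLINE, print "<tip of the merged branch> <merge subject>",
# oldest merge first.
merged_topics() {
    git rev-list --reverse --first-parent --merges "$1".."$2" |
    while read -r merge; do
        # The second parent of a merge is the branch that was merged in.
        printf '%s %s\n' "$(git rev-parse "$merge^2")" \
            "$(git log -1 --format=%s "$merge")"
    done
}
```

Each listed tip can then be merged, one at a time, into a branch
started from mainline by whoever knows that piece of work.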

This also has the advantage that, if two of the changes affect an API
that's used in various different places, one person will get the
responsibility of resolving each of those conflicts, despite them being in
the middle of code they don't really understand, because they understand
what happened with the API and therefore what has to be done in that
little spot. Dividing the merge up by parts of the content would split
this work among people who aren't looking at the conflict in the
definition of the API.

        -Daniel
*This .sig left intentionally blank*

Re: Dividing up a large merge.

davidb-3
On Wed, Jul 15, 2009 at 11:57:59AM -0700, Daniel Barkalow wrote:

> It sounds to me like you're maintaining an internal version that everybody
> merges their stuff into, and you periodically merge that with the mainline
> kernel (generating a lot of conflicts which have to be resolved at the
> same time). Instead of merging the branch that contains a lot of merges,
> it would probably be easier to merge into a clone of mainline each of the
> things that was merged before. That is, instead of merging less than all
> of two trees, you'd merge commits which are not the newest commit on the
> branch, choosing ones that individuals can resolve.

That's part of it, although I have a pretty good handle on that
part.

The place where this comes up is that people in company X are
working on an internal version and company Y are working on a
similar internal version.  We need to share back and forth
between these more frequently than stuff gets into the mainline.

We do rebase at various points, but it takes quite a bit of work,
and it's fairly different work than the conflicts I'm concerned
with here.

Thanks,
David