Git commit generation numbers

Re: Git commit generation numbers

David Lang
On Fri, 22 Jul 2011, Jakub Narebski wrote:

>> Yes, the cache validity/invalidation criteria are the tricky bit.
>> Honestly, this is where the code gets ugly, not computing and storing
>> the generation numbers.
>
> BTW. with storing generation number in commit header there is a problem
> what would old version of git, one which does not understand said header,
> do during rebase.  Would it strip unknown headers, or would it copy
> generation number verbatim - which means that it can be incorrect?

Linus has already pointed out that this is safe.

old versions won't create generation numbers, but they will ignore them if
they exist. Since commits are not modified after they are created, the old
versions don't copy or modify them.
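To make the "old versions ignore them" point concrete, here is a minimal sketch of a reader that only acts on the headers it knows. It is not git's actual parser. The `ParsedCommit` type, the `KNOWN_HEADERS` set, and the `generation <n>` header line are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Headers this hypothetical "old" reader understands; anything else is skipped.
KNOWN_HEADERS = {"tree", "author", "committer", "encoding"}

@dataclass
class ParsedCommit:
    tree: str = ""
    parents: list = field(default_factory=list)
    author: str = ""
    committer: str = ""
    encoding: str = ""
    unknown: dict = field(default_factory=dict)  # headers we did not understand
    message: str = ""

def parse_commit(raw: str) -> ParsedCommit:
    """Parse raw commit text: header lines, a blank line, then the message."""
    header_text, _, message = raw.partition("\n\n")
    commit = ParsedCommit(message=message)
    for line in header_text.split("\n"):
        if line.startswith(" "):
            continue  # continuation of a multi-line header; ignored in this sketch
        key, _, value = line.partition(" ")
        if key == "parent":
            commit.parents.append(value)
        elif key in KNOWN_HEADERS:
            setattr(commit, key, value)
        else:
            # e.g. a hypothetical "generation 1" line: recorded, never acted on
            commit.unknown[key] = value
    return commit

raw = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1311350000 +0200\n"
    "committer A U Thor <author@example.com> 1311350000 +0200\n"
    "generation 1\n"
    "\n"
    "Initial commit\n"
)
commit = parse_commit(raw)
print(commit.parents, commit.unknown)  # [] {'generation': '1'}
```

Since a reader like this never writes the object back, the unknown line cannot be propagated by reading alone; only code that constructs new commits is relevant.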

David Lang

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Git commit generation numbers

David Lang
In reply to this post by Nicolas Pitre-2
On Fri, 22 Jul 2011, Nicolas Pitre wrote:

> On Fri, 22 Jul 2011, Jakub Narebski wrote:
>
>> BTW. with storing generation number in commit header there is a problem
>> what would old version of git, one which does not understand said header,
>> do during rebase.  Would it strip unknown headers, or would it copy
>> generation number verbatim - which means that it can be incorrect?
>
> They would indeed be copied verbatim and become incorrect.

how would they become incorrect?

David Lang

Re: Git commit generation numbers

Jakub Narębski
On Fri, 22 Jul 2011, David Lang <[hidden email]> wrote:

> On Fri, 22 Jul 2011, Nicolas Pitre wrote:
> > On Fri, 22 Jul 2011, Jakub Narebski wrote:
> >
> > > BTW. with storing generation number in commit header there is a problem
> > > what would old version of git, one which does not understand said header,
> > > do during rebase.  Would it strip unknown headers, or would it copy
> > > generation number verbatim - which means that it can be incorrect?
> >
> > They would indeed be copied verbatim and become incorrect.
>
> how would they become incorrect?

Let's assume that the following history was created with new git, one
that correctly adds generation number header to commits:


  A(1)---B(2)---C(3)---D(4)---E(5)       <-- master
          \
           \----x(3)---y(4)---z(5)       <-- foo

The numbers are generation numbers in commit object.

Let's assume that this repository is fetched into a repository instance
that is managed by an older git, one that doesn't understand the generation
header.

Then, if we do

  [old]$ git rebase master foo

and if old git _copies_ generation number header _verbatim_, we would
get:

  A(1)---B(2)---C(3)---D(4)---E(5)                         <-- master
                               \
                                \---x'(3)--y'(4)--z'(5)    <-- foo

Those generation numbers are *incorrect*; they should be:

  A(1)---B(2)---C(3)---D(4)---E(5)                         <-- master
                               \
                                \---x'(6)--y'(7)--z'(8)    <-- foo


That is IF unknown headers are copied verbatim during rebase.  For
"encoding" header this is a good thing, for "generation" it isn't.

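The numbers in the diagrams follow from the usual definition: a root commit has generation 1, and any other commit has 1 plus the largest generation among its parents. A minimal Python sketch, with the history encoded as a hypothetical `parents` dict and a toy `rebase` helper that only swaps parents, reproduces both the verbatim-copied numbers and the correct ones:

```python
def generation_numbers(parents):
    """gen(c) = 1 for a root commit, else 1 + max(gen(p) for p in parents[c])."""
    gen = {}

    def visit(c):
        if c not in gen:
            gen[c] = 1 + max((visit(p) for p in parents[c]), default=0)
        return gen[c]

    for c in parents:
        visit(c)  # recursive: fine for a toy graph, too deep for real histories
    return gen

def rebase(parents, branch, onto):
    """Replay `branch` (oldest first) on top of `onto`; new commits get a prime."""
    result = dict(parents)
    base = onto
    for c in branch:
        result[c + "'"] = [base]  # only the parent changes, as in a rebase
        base = c + "'"
    return result

history = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],  # master
    "x": ["B"], "y": ["x"], "z": ["y"],                       # foo
}
gen = generation_numbers(history)
rebased = rebase(history, ["x", "y", "z"], "E")

verbatim = {c + "'": gen[c] for c in ["x", "y", "z"]}  # header copied as-is
correct = {c: n for c, n in generation_numbers(rebased).items() if c.endswith("'")}
print(verbatim)  # {"x'": 3, "y'": 4, "z'": 5}
print(correct)   # {"x'": 6, "y'": 7, "z'": 8}
```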
--
Jakub Narebski
Poland

Re: Git commit generation numbers

Linus Torvalds-3
On Fri, Jul 22, 2011 at 11:34 AM, Jakub Narebski <[hidden email]> wrote:
>
> That is IF unknown headers are copied verbatim during rebase.  For
> "encoding" header this is a good thing, for "generation" it isn't.

Afaik, they aren't copied verbatim, and never have been. Afaik, the
only thing that has *ever* written commits is "commit_tree()"
(originally "main()" in commit-tree.c). Why is this red herring even
being discussed?

Of course you can always generate bogus commits by writing them by
hand. But that's irrelevant.

                     Linus

Re: Git commit generation numbers

David Lang
In reply to this post by Jakub Narębski
On Fri, 22 Jul 2011, Jakub Narebski wrote:

> On Fri, 22 Jul 2011, David Lang <[hidden email]> wrote:
>> On Fri, 22 Jul 2011, Nicolas Pitre wrote:
>>> On Fri, 22 Jul 2011, Jakub Narebski wrote:
>>>
>>>> BTW. with storing generation number in commit header there is a problem
>>>> what would old version of git, one which does not understand said header,
>>>> do during rebase.  Would it strip unknown headers, or would it copy
>>>> generation number verbatim - which means that it can be incorrect?
>>>
>>> They would indeed be copied verbatim and become incorrect.
>>
>> how would they become incorrect?
>
> Let's assume that the following history was created with new git, one
> that correctly adds generation number header to commits:
>
>
>  A(1)---B(2)---C(3)---D(4)---E(5)       <-- master
>          \
>           \----x(3)---y(4)---z(5)       <-- foo
>
> The numbers are generation numbers in commit object.
>
> Let's assume that this repository is fetched into a repository instance
> that is managed by an older git, one that doesn't understand the generation
> header.
>
> Then, if we do
>
>  [old]$ git rebase master foo
>
> and if old git _copies_ generation number header _verbatim_, we would
> get:
>
>  A(1)---B(2)---C(3)---D(4)---E(5)                         <-- master
>                               \
>                                \---x'(3)--y'(4)--z'(5)    <-- foo
>
> Those generation numbers are *incorrect*; they should be:
>
>  A(1)---B(2)---C(3)---D(4)---E(5)                         <-- master
>                               \
>                                \---x'(6)--y'(7)--z'(8)    <-- foo
>
>
> That is IF unknown headers are copied verbatim during rebase.  For
> "encoding" header this is a good thing, for "generation" it isn't.

commit headers are _not_ copied during rebase

a rebase is not the exact same commit, it's a "logically equivalent"
commit.

so when you do a rebase, you change the commit headers (you have to change
the parent headers in any case, and you would have to change the
generation numbers as well)

this was discussed earlier in this thread.
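Put differently, whatever builds the rewritten commit derives the generation from the new parents instead of carrying an old value forward. A minimal sketch, assuming a hypothetical `generation <n>` header written after `committer` and a `parent_gen` lookup of already-known generations (neither is part of git); the symbolic IDs and author strings are placeholders:

```python
def write_commit(tree, parents, parent_gen, author, committer, message):
    """Build raw commit text from scratch; the generation comes from the parents."""
    # Root commit -> 1; otherwise 1 + the largest parent generation.
    generation = 1 + max((parent_gen[p] for p in parents), default=0)
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [
        f"author {author}",
        f"committer {committer}",
        f"generation {generation}",  # hypothetical header, never copied from input
    ]
    return "\n".join(lines) + "\n\n" + message

# Replaying commit x from the example onto E, whose generation is 5.
new_x = write_commit("t1", ["E"], {"E": 5}, "A U Thor", "C O Mitter", "x\n")
print(new_x.splitlines()[-3])  # generation 6
```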

David Lang

Re: Git commit generation numbers

Nicolas Pitre-2
On Fri, 22 Jul 2011, [hidden email] wrote:

> On Fri, 22 Jul 2011, Jakub Narebski wrote:
>
> > That is IF unknown headers are copied verbatim during rebase.  For
> > "encoding" header this is a good thing, for "generation" it isn't.
>
> commit headers are _not_ copied during rebase

Yes, this turns out to be true as I forgot that rebase is constructed on
top of format-patch+am, and format-patch doesn't preserve the ancillary
headers such as the existing "encoding" header, or the hypothetical
"generation" header.


Nicolas

Re: Git commit generation numbers

Jeff King
In reply to this post by Linus Torvalds-3
On Fri, Jul 22, 2011 at 12:06:08PM -0700, Linus Torvalds wrote:

> On Fri, Jul 22, 2011 at 11:34 AM, Jakub Narebski <[hidden email]> wrote:
> >
> > That is IF unknown headers are copied verbatim during rebase.  For
> > "encoding" header this is a good thing, for "generation" it isn't.
>
> Afaik, they aren't copied verbatim, and never have been. Afaik, the
> only thing that has *ever* written commits is "commit_tree()"
> (originally "main()" in commit-tree.c). Why is this red herring even
> being discussed?

In git.git, that is the case. There are other programs that may write
git commits, though. Try:

  http://www.google.com/codesearch#search/&q=hash-object.*commit&type=cs

Many uses seem OK (they are generating a commit from scratch). This one
at least (the sixth result from the search above) would actually
generate buggy generation headers (it modifies parents but passes other
headers through):

  http://www.google.com/codesearch#XUVcT9DKB_U/replace&ct=rc&cd=7&q=hash-object.*commit

It may be worth saying that such code is stupid and ugly and wrong, or
that it is not deployed widely enough to care about.  But it's not
entirely a red herring.
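The failure mode is easy to reproduce in miniature. This sketch, assuming the same hypothetical `generation <n>` header, rewrites the `parent` lines of raw commit text. Called without `parent_gen`, it passes every other header through (the buggy pattern); called with it, it drops the stale value and recomputes it. The function name and parameters are made up for illustration.

```python
def rewrite_parents(raw, new_parents, parent_gen=None):
    """Replace the parent lines of raw commit text.

    parent_gen=None: pass all other headers through verbatim (buggy).
    parent_gen given: drop any old 'generation' line and recompute it.
    """
    header, _, message = raw.partition("\n\n")
    out = []
    for line in header.split("\n"):
        if line.startswith("parent "):
            continue  # old parents are replaced right after the tree line
        if parent_gen is not None and line.startswith("generation "):
            continue  # stale once the parents change
        out.append(line)
        if line.startswith("tree "):
            out += [f"parent {p}" for p in new_parents]
    if parent_gen is not None:
        gen = 1 + max((parent_gen[p] for p in new_parents), default=0)
        out.append(f"generation {gen}")
    return "\n".join(out) + "\n\n" + message

old_x = "tree t1\nparent B\nauthor a\ncommitter c\ngeneration 3\n\nx\n"
print(rewrite_parents(old_x, ["E"]).splitlines()[4])            # generation 3
print(rewrite_parents(old_x, ["E"], {"E": 5}).splitlines()[4])  # generation 6
```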

-Peff

Re: Git commit generation numbers

Felipe Contreras
In reply to this post by Linus Torvalds-3
On Fri, Jul 22, 2011 at 10:06 PM, Linus Torvalds
<[hidden email]> wrote:

> On Fri, Jul 22, 2011 at 11:34 AM, Jakub Narebski <[hidden email]> wrote:
>>
>> That is IF unknown headers are copied verbatim during rebase.  For
>> "encoding" header this is a good thing, for "generation" it isn't.
>
> Afaik, they aren't copied verbatim, and never have been. Afaik, the
> only thing that has *ever* written commits is "commit_tree()"
> (originally "main()" in commit-tree.c). Why is this red herring even
> being discussed?
>
> Of course you can always generate bogus commits by writing them by
> hand. But that's irrelevant.

Let's suppose for a moment that the commits do have these wrong
generation numbers; shouldn't a fetch on the newer client check these
and show an error? But what if they are pushed to a central server
that has an old version of git? It would be messy.
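One way a newer client could check received commits is to test each commit against its parents. A minimal sketch, assuming every commit's parents are available with their claimed (hypothetical) generation headers; this is not an existing git check:

```python
def find_bad_generations(parents, claimed):
    """Return commits whose claimed generation != 1 + max of parents' claims.

    If no commit is flagged, every claim equals the true generation (by
    induction from the roots), so an empty result means the graph is valid.
    """
    bad = []
    for c, ps in parents.items():
        expected = 1 + max((claimed[p] for p in ps), default=0)
        if claimed[c] != expected:
            bad.append(c)
    return bad

# The rebased history with generation headers copied verbatim.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "x'": ["E"], "y'": ["x'"], "z'": ["y'"],
}
claimed = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "x'": 3, "y'": 4, "z'": 5}
print(find_bad_generations(parents, claimed))  # ["x'"]
```

Only x' is flagged, because y' and z' are consistent with the wrong value on their parent; the check pinpoints where the error enters the graph.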

--
Felipe Contreras

Re: Git commit generation numbers

artagnon
Hi,

First, let me start out by saying that I'm a fairly new contributor to
Git, and I'm far less experienced than the other people on this
thread.  I've read through all the discussions time and again, and
thought about the problem for some time now - I can't say I understand
it as fully as many of you do, but I think I may have a slightly
different perspective to offer.

In what way is Git fundamentally different from Subversion?  It's the
simplicity of the data model.  From the simplest building block, a
key-value store, we have been able to compose and build things on top
of it.  The reason we built centralized version control systems
earlier is because it was *easier* to address the composition
problems.  We dumped all related repositories and problems into one
central server.  With so much information in one place, things are
tightly coupled and problems are easier to solve.  Still not
convinced?  What's the weakest component in Git today?  Undoubtedly
submodules.  Of course, a large part of the reason is that many people
don't use submodules, and hence it doesn't improve -- but it's
actually a circular problem.  People don't use submodules, because
it's so featureless and hard to develop.  Why is it so hard?  Back to
the fundamental problem of composition from simple building blocks.
In submodules, we have to take entire DAGs and build a composite DAG.
The key pieces of information are deep inside Git's fundamentals:
Gitlinks.  Other projects like Gitslave try to attack the problem
on a more superficial level, but they all hit a barrier when they
discover that they can't compose big blocks of data: you need simple
building blocks to compose.

It's the same story with C (and now, Haskell).  Why does everyone like
C so much?  Because it only provides fundamental building blocks and
gives people the freedom to compose the way they like.  It doesn't
provide big "template blocks" like Java, because they tend to be
restrictive in the long run.  Sure, Java is easier to start out with,
but people soon realize that big blocks can't compose.

More than arguing about backward compatibility, and about how older
versions of Git commits won't have generation numbers, I think this is
what we should be focusing on.  Sure, it'll additionally make sense to
put in a cache to speed things up now, but we need to think about what
Git will be 10~15 years from now.  The fundamental pieces of
information required for composition must be present in the
fundamental building blocks.

The real question we should be asking is: "Should Git have had commit
generation numbers in 2005?".  If the answer is "yes", we should put
them in now before it becomes even harder, bending over backwards for
backward compatibility if necessary.  Otherwise, we'll regret this
decision 10~15 years later, when we're faced with deeper issues.  If
you want a concrete example, think about how you'd compose DAGs
together (again, the submodules problem): where is the information
required to prune each DAG and compose?
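As for what generation numbers buy in a DAG walk, the usual argument is that an ancestor always has a smaller generation than its descendants, so a search for commit `a` can skip any commit whose generation is below `a`'s. A minimal sketch over the example history from earlier in the thread, with the generations supplied by hand; `is_ancestor` is a hypothetical helper, not git's code:

```python
def is_ancestor(a, b, parents, gen):
    """True if commit a is reachable from commit b (a == b counts)."""
    stack, seen = [b], {b}
    while stack:
        c = stack.pop()
        if c == a:
            return True
        for p in parents[c]:
            # Ancestors of p all have generation < gen[p]; if gen[p] < gen[a],
            # a cannot be p or below it, so that whole part of the DAG is pruned.
            if p not in seen and gen[p] >= gen[a]:
                seen.add(p)
                stack.append(p)
    return False

parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "x": ["B"], "y": ["x"], "z": ["y"],
}
gen = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "x": 3, "y": 4, "z": 5}
print(is_ancestor("B", "z", parents, gen))  # True
print(is_ancestor("C", "z", parents, gen))  # False: B (generation 2) is pruned
```

Without such a cutoff, answering "no" means visiting every ancestor of `b`.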

I wish I could write this in myself, but I'm afraid I don't have the
engineering skill yet.  I'll be happy to contribute whatever little I
can, and participate in the review process.

Thanks.

-- Ram