Git is not scalable with too many refs/*


NAKAMURA Takumi
Hello, Git. It is my 1st post here.

I have tried tagging each commit as "refs/tags/rXXXXXX" in a local git-svn
repo (over 100k refs/tags).
This made some operations extremely slow, even with packed-refs and
packed objects.
So I gave up on pushing the tags upstream. (That would be terrifying.) :p

I know it might be crazy by Git standards, but it would be convenient for me
(e.g. git log --oneline --decorate would show me each svn revision).
I would like to work on making Git cope with many tags.

* Issues I have found so far:

  - git show --decorate is always slow.
    In decorate.c, every commit is inspected.
  - git rev-list --quiet --objects $upstream --not --all takes a long time,
    even when it is expected to exit with 0.
    As you know, it is used in builtin/fetch.c.
  - git-upload-pack advertises all refs to me, even when upstream has a
    very large number of them.

I would like to work on the following, if it would be valuable:

  - Stop inspecting commits for packed-refs in the decorate code.
  - Implement packed-refs sorted by hash (not by name).
  - Implement more effective pruning for --not --all in revision.c.
  - Think about enhancing the protocol to transfer many refs more efficiently.

I would be happy to discuss these issues. Thank you.

...Takumi
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Git is not scalable with too many refs/*

Sverre Rabbelier-2
Heya,

[+shawn, who runs into something similar with Gerrit]

On Thu, Jun 9, 2011 at 05:44, NAKAMURA Takumi <[hidden email]> wrote:

> Hello, Git. It is my 1st post here.
>
> I have tried tagging each commit as "refs/tags/rXXXXXX" on git-svn
> repo locally. (over 100k refs/tags.)
> Indeed, it made something extremely slower, even with packed-refs and
> pack objects.
> I gave up, then, to push tags to upstream. (it must be terror) :p
>
> I know it might be crazy in the git way, but it would bring me conveniences.
> (eg. git log --oneline --decorate shows me each svn revision)
> I would like to work for Git to live with many tags.
>
> * Issues as far as I have investigated;
>
>  - git show --decorate is always slow.
>    in decorate.c, every commits are inspected.
>  - git rev-tree --quiet --objects $upstream --not --all spends so much time,
>    even if it is expected to return with 0.
>    As you know, it is used in builtin/fetch.c.
>  - git-upload-pack shows "all" refs to me if upstream has too many refs.
>
> I would like to work as below if they were valuable.
>
>  - Get rid of inspecting commits in packed-refs on decorate stuff.
>  - Implement sort-by-hash packed-refs, (not sort-by-name)
>  - Implement more effective pruning --not --all on revision.c.
>  - Think about enhancement of protocol to transfer many refs more effectively.
>
> I am happy to consider the issue, thank you.

--
Cheers,

Sverre Rabbelier

Re: Git is not scalable with too many refs/*

Jakub Narębski
In reply to this post by NAKAMURA Takumi
NAKAMURA Takumi <[hidden email]> writes:

> Hello, Git. It is my 1st post here.
>
> I have tried tagging each commit as "refs/tags/rXXXXXX" on git-svn
> repo locally. (over 100k refs/tags.)
[...]

That's insane.  You would do much better to mark each commit with a
note.  Notes are designed to be scalable.  See e.g. this thread:

  [RFD] Proposal for git-svn: storing SVN metadata (git-svn-id) in notes
  http://article.gmane.org/gmane.comp.version-control.git/174657

--
Jakub Narebski
Poland
ShadeHawk on #git

Re: Git is not scalable with too many refs/*

Shawn Pearce
In reply to this post by Sverre Rabbelier-2
On Wed, Jun 8, 2011 at 23:50, Sverre Rabbelier <[hidden email]> wrote:
> [+shawn, who runs into something similar with Gerrit]

> On Thu, Jun 9, 2011 at 05:44, NAKAMURA Takumi <[hidden email]> wrote:
>> Hello, Git. It is my 1st post here.
>>
>> I have tried tagging each commit as "refs/tags/rXXXXXX" on git-svn
>> repo locally. (over 100k refs/tags.)

As Jakub pointed out, use git notes for this. They were designed to
scale to >100,000 annotations.

>> Indeed, it made something extremely slower, even with packed-refs and
>> pack objects.

Having a reference to every commit in the repository is horrifically
slow. We run into this with Gerrit Code Review and I need to find
another solution. Git just wasn't meant to process repositories like
this.

--
Shawn.

Re: Git is not scalable with too many refs/*

Stephen Bash
In reply to this post by Jakub Narębski
----- Original Message -----

> From: "Jakub Narebski" <[hidden email]>
> To: "NAKAMURA Takumi" <[hidden email]>
> Cc: "git" <[hidden email]>
> Sent: Thursday, June 9, 2011 7:18:09 AM
> Subject: Re: Git is not scalable with too many refs/*
> NAKAMURA Takumi <[hidden email]> writes:
>
> > Hello, Git. It is my 1st post here.
> >
> > I have tried tagging each commit as "refs/tags/rXXXXXX" on git-svn
> > repo locally. (over 100k refs/tags.)
> [...]
>
> That's insane. You would do much better to mark each commit with
> note. Notes are designed to be scalable. See e.g. this thread
>
> [RFD] Proposal for git-svn: storing SVN metadata (git-svn-id) in notes
> http://article.gmane.org/gmane.comp.version-control.git/174657

As a reformed SVN user (i.e. not using it anymore ;]) I agree that 100k tags seems crazy, but I was contemplating doing the exact same thing as Takumi.  Skimming that thread, I didn't see the key point (IMO): notes can map from commits to a "name" (or other information), tags map from a "name" to commits.

I've seen two different workflows develop:
  1) Hacking on some code in Git the programmer finds something wrong.  Using Git tools he can pickaxe/bisect/etc. and find that the problem traces back to a commit imported from Subversion.
  2) The programmer finds something wrong, asks coworker, coworker says "see bug XYZ", bug XYZ says "Fixed in r20356".

I agree notes is the right answer for (1), but for (2) you really want a cross reference table from Subversion rev number to Git commit.

In our office we created the cross reference table once by walking the Git tree and storing it as a file (we had some degenerate cases where one SVN rev mapped to multiple Git commits, but I don't remember the details), but it's not really usable from Git.  Lightweight tags would be an awesome solution (if they worked).  Perhaps a custom subcommand is a reasonable middle ground.
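A cross-reference table of this kind can be sketched in a few lines. The sketch below is not the tool described above: it assumes the commits were imported with git-svn, so each message carries a "git-svn-id: <url>@<rev> <uuid>" trailer, and it takes the (commit id, message) pairs as plain Python data. The function name `build_rev_map` and the sample URL, ids and UUID are made up for illustration.

```python
import re
from collections import defaultdict

# git-svn appends a trailer of the form "git-svn-id: <url>@<rev> <uuid>".
SVN_ID = re.compile(r"^git-svn-id: \S+@(\d+) ", re.MULTILINE)


def build_rev_map(commits):
    """Map SVN revision number -> list of Git commit ids.

    `commits` is an iterable of (commit_id, message) pairs. A list is kept
    per revision because one SVN revision may map to several Git commits.
    """
    rev_map = defaultdict(list)
    for commit_id, message in commits:
        match = SVN_ID.search(message)
        if match:  # commits without the trailer are skipped
            rev_map[int(match.group(1))].append(commit_id)
    return dict(rev_map)


if __name__ == "__main__":
    uuid = "00000000-0000-0000-0000-000000000000"  # placeholder UUID
    sample = [
        ("1111111", f"Fix crash\n\ngit-svn-id: https://svn.example.org/repo/trunk@20356 {uuid}\n"),
        ("2222222", f"Fix crash\n\ngit-svn-id: https://svn.example.org/repo/branches/stable@20356 {uuid}\n"),
        ("3333333", "Native Git commit without a trailer\n"),
    ]
    print(build_rev_map(sample))
```

Keeping a list per revision covers the degenerate case where one SVN revision maps to more than one Git commit.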

Thanks,
Stephen

Re: Git is not scalable with too many refs/*

A Large Angry SCM
In reply to this post by Shawn Pearce
On 06/09/2011 11:23 AM, Shawn Pearce wrote:

> On Wed, Jun 8, 2011 at 23:50, Sverre Rabbelier<[hidden email]>  wrote:
>> [+shawn, who runs into something similar with Gerrit]
>
>> On Thu, Jun 9, 2011 at 05:44, NAKAMURA Takumi<[hidden email]>  wrote:
>>> Hello, Git. It is my 1st post here.
>>>
>>> I have tried tagging each commit as "refs/tags/rXXXXXX" on git-svn
>>> repo locally. (over 100k refs/tags.)
>
> As Jakub pointed out, use git notes for this. They were designed to
> scale to>100,000 annotations.
>
>>> Indeed, it made something extremely slower, even with packed-refs and
>>> pack objects.
>
> Having a reference to every commit in the repository is horrifically
> slow. We run into this with Gerrit Code Review and I need to find
> another solution. Git just wasn't meant to process repositories like
> this.

Assuming a very large number of refs, what is it that makes git so
horrifically slow? Is there a design or implementation lesson here?

Re: Git is not scalable with too many refs/*

Shawn Pearce
On Thu, Jun 9, 2011 at 08:52, A Large Angry SCM <[hidden email]> wrote:
> On 06/09/2011 11:23 AM, Shawn Pearce wrote:
>> Having a reference to every commit in the repository is horrifically
>> slow. We run into this with Gerrit Code Review and I need to find
>> another solution. Git just wasn't meant to process repositories like
>> this.
>
> Assuming a very large number of refs, what is it that makes git so
> horrifically slow? Is there a design or implementation lesson here?

A few things.

Git does a sequential scan of all references when it first needs to
access references for an operation. This requires reading the entire
packed-refs file, and the recursive scan of the "refs/" subdirectory
for any loose refs that might override the packed-refs file.

A lot of operations toss every commit that a reference points at into
the revision walker's LRU queue. If you have a tag pointing to every
commit, then the entire project history enters the LRU queue at once,
up front. That queue is managed with O(N^2) insertion time. And the
entire queue has to be filled before anything can be output.
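The quadratic cost can be illustrated with a small sketch. This is not Git's code: it assumes a plain Python list kept newest-first and a hypothetical helper `insert_by_date` that scans from the head to find the insertion point, and it simply counts comparisons.

```python
def insert_by_date(queue, timestamp):
    """Insert `timestamp` into a newest-first list by scanning from the head.

    Returns the number of comparisons made. Hypothetical helper for
    illustration only, not Git's API.
    """
    comparisons = 0
    pos = 0
    while pos < len(queue):
        comparisons += 1
        if queue[pos] < timestamp:  # found the first entry older than the new one
            break
        pos += 1
    queue.insert(pos, timestamp)
    return comparisons


def total_comparisons(timestamps):
    """Insert all timestamps one by one and return the total comparison count."""
    queue = []
    return sum(insert_by_date(queue, t) for t in timestamps)


if __name__ == "__main__":
    n = 2000
    # Worst case for a head scan: every new commit is older than all queued ones.
    print(n, total_comparisons(range(n, 0, -1)))
```

When every new entry is older than everything already queued, the k-th insertion walks all k-1 existing entries, so n insertions cost n(n-1)/2 comparisons in total. That is where the O(N^2) comes from when a whole history is queued at once.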

--
Shawn.

Re: Git is not scalable with too many refs/*

Jeff King
On Thu, Jun 09, 2011 at 08:56:50AM -0700, Shawn O. Pearce wrote:

> A lot of operations toss every commit that a reference points at into
> the revision walker's LRU queue. If you have a tag pointing to every
> commit, then the entire project history enters the LRU queue at once,
> up front. That queue is managed with O(N^2) insertion time. And the
> entire queue has to be filled before anything can be output.

We ran into this recently at github. Since our many-refs repos were
mostly forks, we had a lot of duplicate commits, and were able to solve
it with ea5f220 (fetch: avoid repeated commits in mark_complete,
2011-05-19).

However, I also worked up a faster priority queue implementation that
would work in the general case:

  http://thread.gmane.org/gmane.comp.version-control.git/174003/focus=174005

I suspect it would speed up the original poster's slow fetch. The
problem is that a fast priority queue doesn't have quite the same access
patterns as a linked list, so replacing all of the commit_lists in git
with the priority queue would be quite a painful undertaking. So we are
left with using the fast queue only in specific hot-spots.

-Peff

Re: Git is not scalable with too many refs/*

NAKAMURA Takumi
Good afternoon Git! Thank you all for your comments.

Jakub and Shawn,

Sure, notes should be used in this case, I agree.

> (eg. git log --oneline --decorate shows me each svn revision)

My example may have misled you. I meant that tags could give me
pretty abbreviations everywhere in Git. I would be happier if tags could
serve as a bi-directional alias, as Stephen mentions.

It would also be better if git-svn could record its metadata in notes, I think. :D

Stephen,

2011/6/10 Stephen Bash <[hidden email]>:
> I've seen two different workflows develop:
>  1) Hacking on some code in Git the programmer finds something wrong.  Using Git tools he can pickaxe/bisect/etc. and find that the problem traces back to a commit imported from Subversion.
>  2) The programmer finds something wrong, asks coworker, coworker says "see bug XYZ", bug XYZ says "Fixed in r20356".
>
> I agree notes is the right answer for (1), but for (2) you really want a cross reference table from Subversion rev number to Git commit.

That is the point I wanted to make, thank you! I work with svn people.
They often speak in svn revision numbers. (And then I have to give them
svn revs in return.)

> In our office we created the cross reference table once by walking the Git tree and storing it as a file (we had some degenerate cases where one SVN rev mapped to multiple Git commits, but I don't remember the details), but it's not really usable from Git.  Lightweight tags would be an awesome solution (if they worked).  Perhaps a custom subcommand is a reasonable middle ground.

The svn-rev-to-commit mapping can be reconstructed by git-svn itself.
Unfortunately, git-svn's .rev-map is sorted by revision number. I
think subcommands would be of little use unless they could be
plugged into Git as a "smart-tag resolver".

Peff,

First of all, thank you for your work at GitHub! Awesome!
I didn't know GitHub had issues with refs. (Yeah, I should not push 100k
tags to GitHub for now :p )

I work on Linux and Windows. A many-refs repo can make Git awfully
slow on Windows (slower than on Linux!). I hope I can also work on
improving the various performance issues on Windows.

FYI, I have tweaked git-rev-list so that commits are not sorted by date
with --quiet. It improves git-fetch (git-rev-list --not --all) performance
when objects are well packed.


...Takumi

Re: Git is not scalable with too many refs/*

Andreas Ericsson
In reply to this post by Shawn Pearce
On 06/09/2011 05:56 PM, Shawn Pearce wrote:

> On Thu, Jun 9, 2011 at 08:52, A Large Angry SCM<[hidden email]>  wrote:
>> On 06/09/2011 11:23 AM, Shawn Pearce wrote:
>>> Having a reference to every commit in the repository is horrifically
>>> slow. We run into this with Gerrit Code Review and I need to find
>>> another solution. Git just wasn't meant to process repositories like
>>> this.
>>
>> Assuming a very large number of refs, what is it that makes git so
>> horrifically slow? Is there a design or implementation lesson here?
>
> A few things.
>
> Git does a sequential scan of all references when it first needs to
> access references for an operation. This requires reading the entire
> packed-refs file, and the recursive scan of the "refs/" subdirectory
> for any loose refs that might override the packed-refs file.
>
> A lot of operations toss every commit that a reference points at into
> the revision walker's LRU queue. If you have a tag pointing to every
> commit, then the entire project history enters the LRU queue at once,
> up front. That queue is managed with O(N^2) insertion time. And the
> entire queue has to be filled before anything can be output.
>

Hmm. Since we're using pre-hashed data with an obvious lookup method
we should be able to do much, much better than O(n^2) for insertion
and better than O(n) for worst-case lookups. I'm thinking a 1-byte
trie, resulting in a depth, lookup and insertion complexity of 20. It
would waste some memory but it might be worth it for fixed asymptotic
complexity for both insertion and lookup.

--
Andreas Ericsson                   [hidden email]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: Git is not scalable with too many refs/*

Shawn Pearce
On Fri, Jun 10, 2011 at 00:41, Andreas Ericsson <[hidden email]> wrote:

> On 06/09/2011 05:56 PM, Shawn Pearce wrote:
>>
>> A lot of operations toss every commit that a reference points at into
>> the revision walker's LRU queue. If you have a tag pointing to every
>> commit, then the entire project history enters the LRU queue at once,
>> up front. That queue is managed with O(N^2) insertion time. And the
>> entire queue has to be filled before anything can be output.
>
> Hmm. Since we're using pre-hashed data with an obvious lookup method
> we should be able to do much, much better than O(n^2) for insertion
> and better than O(n) for worst-case lookups. I'm thinking a 1-byte
> trie, resulting in a depth, lookup and insertion complexity of 20. It
> would waste some memory but it might be worth it for fixed asymptotic
> complexity for both insertion and lookup.

Not really.

The queue isn't sorting by SHA-1. It's sorting by commit timestamp,
descending. Those aren't pre-hashed. The O(N^2) insertion is because
the code is trying to find where this commit belongs in the list of
commits as sorted by commit timestamp.

There are some priority queue data structures designed for this sort of
work, e.g. a calendar queue might help. But it's not as simple as a 1
byte trie.

--
Shawn.

Re: Git is not scalable with too many refs/*

Jakub Narębski
Shawn Pearce <[hidden email]> writes:

> On Fri, Jun 10, 2011 at 00:41, Andreas Ericsson <[hidden email]> wrote:
>> On 06/09/2011 05:56 PM, Shawn Pearce wrote:
>>>
>>> A lot of operations toss every commit that a reference points at into
>>> the revision walker's LRU queue. If you have a tag pointing to every
>>> commit, then the entire project history enters the LRU queue at once,
>>> up front. That queue is managed with O(N^2) insertion time. And the
>>> entire queue has to be filled before anything can be output.
>>
>> Hmm. Since we're using pre-hashed data with an obvious lookup method
>> we should be able to do much, much better than O(n^2) for insertion
>> and better than O(n) for worst-case lookups. I'm thinking a 1-byte
>> trie, resulting in a depth, lookup and insertion complexity of 20. It
>> would waste some memory but it might be worth it for fixed asymptotic
>> complexity for both insertion and lookup.
>
> Not really.
>
> The queue isn't sorting by SHA-1. Its sorting by commit timestamp,
> descending. Those aren't pre-hashed. The O(N^2) insertion is because
> the code is trying to find where this commit belongs in the list of
> commits as sorted by commit timestamp.
>
> There are some priority queue datastructures designed for this sort of
> work, e.g. a calendar queue might help. But its not as simple as a 1
> byte trie.

In the case of Subversion revision numbers (revision number to hash
mapping), sorted by name (in version order at least) means sorted by
date.  I wonder if there is a data structure for which this is the
optimal insertion order (just as almost-sorted data is the best case
for insertion sort).

--
Jakub Narebski
Poland
ShadeHawk on #git

Re: Git is not scalable with too many refs/*

Jeff King
In reply to this post by Shawn Pearce
On Fri, Jun 10, 2011 at 12:41:39PM -0700, Shawn O. Pearce wrote:

> Not really.
>
> The queue isn't sorting by SHA-1. Its sorting by commit timestamp,
> descending. Those aren't pre-hashed. The O(N^2) insertion is because
> the code is trying to find where this commit belongs in the list of
> commits as sorted by commit timestamp.
>
> There are some priority queue datastructures designed for this sort of
> work, e.g. a calendar queue might help. But its not as simple as a 1
> byte trie.

All you really need is a heap-based priority queue, which gives O(lg n)
insertion and popping (and O(1) peeking at the top). I even wrote one
and posted it recently (I won't dig up the reference, but I posted it
elsewhere in this thread, I think).
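For readers unfamiliar with the idea, here is a minimal sketch of a heap-based, newest-first queue. It is not the implementation posted to the list: it uses Python's standard `heapq` module, and the class name `CommitQueue` and the (timestamp, commit id) pairs are assumptions made for illustration.

```python
import heapq


class CommitQueue:
    """Newest-first priority queue of (timestamp, commit_id) pairs."""

    def __init__(self):
        self._heap = []

    def push(self, timestamp, commit_id):
        # heapq is a min-heap, so store the negated timestamp to pop the
        # newest commit first. Cost is O(log n) per push.
        heapq.heappush(self._heap, (-timestamp, commit_id))

    def peek(self):
        # The newest entry is always at index 0: O(1).
        neg_ts, commit_id = self._heap[0]
        return -neg_ts, commit_id

    def pop(self):
        # Remove and return the newest entry: O(log n).
        neg_ts, commit_id = heapq.heappop(self._heap)
        return -neg_ts, commit_id

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    q = CommitQueue()
    for ts, cid in [(100, "a"), (300, "c"), (200, "b")]:
        q.push(ts, cid)
    while q:
        print(q.pop())
```

The heap gives cheap push and pop, but its internal array is not in sorted order, so code that wants to walk or splice the queue as an ordered list cannot use it directly.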

The problem is that many parts of the code assume that commit_list is a
linked list and do fast iterations, or even splicing. It's nothing you
couldn't get around with some work, but it turns out to involve a lot
of code changes.

-Peff

Re: Git is not scalable with too many refs/*

Andreas Ericsson
In reply to this post by Shawn Pearce
On 06/10/2011 09:41 PM, Shawn Pearce wrote:

> On Fri, Jun 10, 2011 at 00:41, Andreas Ericsson<[hidden email]>  wrote:
>> On 06/09/2011 05:56 PM, Shawn Pearce wrote:
>>>
>>> A lot of operations toss every commit that a reference points at into
>>> the revision walker's LRU queue. If you have a tag pointing to every
>>> commit, then the entire project history enters the LRU queue at once,
>>> up front. That queue is managed with O(N^2) insertion time. And the
>>> entire queue has to be filled before anything can be output.
>>
>> Hmm. Since we're using pre-hashed data with an obvious lookup method
>> we should be able to do much, much better than O(n^2) for insertion
>> and better than O(n) for worst-case lookups. I'm thinking a 1-byte
>> trie, resulting in a depth, lookup and insertion complexity of 20. It
>> would waste some memory but it might be worth it for fixed asymptotic
>> complexity for both insertion and lookup.
>
> Not really.
>
> The queue isn't sorting by SHA-1. Its sorting by commit timestamp,
> descending. Those aren't pre-hashed. The O(N^2) insertion is because
> the code is trying to find where this commit belongs in the list of
> commits as sorted by commit timestamp.
>

Hmm. We should still be able to do better than that, and particularly
for the "tag-each-commit" workflow. Since it's most likely those tags
are generated using incrementing numbers, we could have a cut-off where
we first parse all the refs and make an optimistic assumption that an
alphabetical sort of the refs provides a map of insertion-points for
the commits. Since the best case behaviour is still O(1) for insertion
sort and it's unlikely that thousands of refs are in random order, that
should cause the vast majority of the refs we insert to follow the best
case scenario.

This will fall on its arse when people start doing hg-ref -> git-commit
tags of course, but that doesn't seem to be happening, or at least not to
the same extent as with svn-revision -> git-commit mapping.

We're still not improving the asymptotic complexity, but it's a pretty
safe bet that for the vast majority of cases we would improve wall-clock
runtime by a hefty amount with a relatively minor effort.
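The effect of that optimistic assumption can be seen in a toy model. The sketch below is not Git code: it assumes an ascending list and a hypothetical helper `insert_from_tail` that starts looking for the insertion point at the most recently added end, and it compares the comparison counts for already-sorted and shuffled input.

```python
import random


def insert_from_tail(items, value):
    """Insert `value` into the ascending list `items`, scanning from the end.

    Returns the number of comparisons made. Hypothetical helper for
    illustration only.
    """
    comparisons = 0
    pos = len(items)
    while pos > 0:
        comparisons += 1
        if items[pos - 1] <= value:  # the new value belongs right after this entry
            break
        pos -= 1
    items.insert(pos, value)
    return comparisons


def count_comparisons(values):
    """Insert all values one by one and return the total comparison count."""
    items = []
    return sum(insert_from_tail(items, v) for v in values)


if __name__ == "__main__":
    n = 2000
    ordered = list(range(n))
    shuffled = ordered[:]
    random.Random(0).shuffle(shuffled)  # fixed seed for repeatable runs
    print("sorted input:  ", count_comparisons(ordered))
    print("shuffled input:", count_comparisons(shuffled))
```

With input that is already in order, every insertion after the first needs exactly one comparison, so the total is n-1. The worst case is unchanged, which matches the point that the asymptotic complexity does not improve.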

--
Andreas Ericsson                   [hidden email]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: Git is not scalable with too many refs/*

Jeff King
In reply to this post by NAKAMURA Takumi
On Fri, Jun 10, 2011 at 12:59:47PM +0900, NAKAMURA Takumi wrote:

> 2011/6/10 Stephen Bash <[hidden email]>:
> > I've seen two different workflows develop:
> >  1) Hacking on some code in Git the programmer finds something wrong.  Using Git tools he can pickaxe/bisect/etc. and find that the problem traces back to a commit imported from Subversion.
> >  2) The programmer finds something wrong, asks coworker, coworker says "see bug XYZ", bug XYZ says "Fixed in r20356".
> >
> > I agree notes is the right answer for (1), but for (2) you really want a cross reference table from Subversion rev number to Git commit.
>
> It is the point I wanted to say, thank you! I am working with svn-men.
> They often speak svn revision number. (And I have to tell them svn
> revs then)

Yeah, there is no simple way to do the bi-directional mapping in git.
If all you want are decorations on commits, notes are definitely the way
to go. They are optimized for lookup of commit -> data. But if you
want data -> commit, the only mapping we have is refs, and they are not
well optimized for the many-refs use case.

Packed-refs are better than loose refs, but I think right now we just
load them all into an in-memory linked list. We could load them into a
more efficient in-memory data structure, or we could perhaps even mmap
the packed-refs file and binary search it in place.
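The binary-search idea can be sketched as follows. This is only an illustration under simplifying assumptions: the buffer holds nothing but "<id> <refname>" records, one per line, sorted by refname in byte order, with no header line and no peeled-tag lines. The names `find_ref` and `sample_buffer` are hypothetical.

```python
def find_ref(buf, refname):
    """Binary-search a bytes buffer of "<id> <refname>\\n" records in place.

    Records must be sorted by refname in byte order. Returns the id as a
    str, or None if the ref is absent.
    """
    lo, hi = 0, len(buf)  # byte offsets; both always sit on record boundaries
    while lo < hi:
        mid = (lo + hi) // 2
        nl = buf.rfind(b"\n", lo, mid)  # last newline before mid, inside the window
        start = lo if nl < 0 else nl + 1
        end = buf.find(b"\n", mid)  # newline that terminates this record
        if end < 0:
            end = len(buf)
        obj_id, name = buf[start:end].split(b" ", 1)
        if name == refname:
            return obj_id.decode("ascii")
        if name < refname:
            lo = end + 1  # continue in the records after this one
        else:
            hi = start  # continue in the records before this one
    return None


def sample_buffer(count):
    """Build hypothetical sorted records with fake 40-hex ids."""
    records = sorted((b"refs/tags/r%06d" % i, b"%040x" % i) for i in range(1, count + 1))
    return b"".join(obj_id + b" " + name + b"\n" for name, obj_id in records)


if __name__ == "__main__":
    buf = sample_buffer(100000)
    print(find_ref(buf, b"refs/tags/r020356"))
    print(find_ref(buf, b"refs/tags/missing"))
```

Each lookup touches only O(log n) records and never builds a per-ref structure in memory. The function relies only on `find`, `rfind` and slicing, so it should also work on an `mmap` object opened over a file in this simplified layout.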

But lookup is only part of the problem. There are algorithms that want
to look at all the refs (notably fetching and pushing), which are going
to be a bit slower. We don't have a way to tell those algorithms that
those refs are uninteresting for reachability analysis, because they are
just pointing to parts of the graph that are already reachable by
regular refs. Maybe there could be a part of the refs namespace that is
ignored by "--all". I dunno. That seems like a weird inconsistency.

> FYI, I have tweaked git-rev-list for commits not to sort by date with
> --quiet. It improves git-fetch (git-rev-list --not --all) performance
> when objects is well-packed.

I'm not sure that is a good solution. Even with --quiet, we will be
walking the commit graph to find merge bases to see if things are
connected. The walking code expects date-sorting; I'm not sure what
changing that assumption will do to the code.

-Peff

Re: Git is not scalable with too many refs/*

Andreas Ericsson
In reply to this post by NAKAMURA Takumi
On 06/10/2011 05:59 AM, NAKAMURA Takumi wrote:

> Good afternoon Git! Thank you guys to give me comments.
>
> Jakub and Shawn,
>
> Sure, Notes should be used at the case, I agree.
>
>> (eg. git log --oneline --decorate shows me each svn revision)
>
> My example might misunderstand you. I intended tags could show me
> pretty abbrev everywhere on Git. I would be happier if tags might be
> available bi-directional alias, as Stephen mentions.
>
> It would be better git-svn could record metadata into notes, I think, too. :D
>
> Stephen,
>
> 2011/6/10 Stephen Bash<[hidden email]>:
>> I've seen two different workflows develop:
>>   1) Hacking on some code in Git the programmer finds something wrong.  Using Git tools he can pickaxe/bisect/etc. and find that the problem traces back to a commit imported from Subversion.
>>   2) The programmer finds something wrong, asks coworker, coworker says "see bug XYZ", bug XYZ says "Fixed in r20356".
>>
>> I agree notes is the right answer for (1), but for (2) you really want a cross reference table from Subversion rev number to Git commit.
>

If you're using svn metadata in the commit text, you can always do
"git log -p --grep=@20356" to get the commits relevant to that one.
It's not as fast as "git show svn-20356", but it's not exactly
glacial either and would avoid the problems you're having now.

--
Andreas Ericsson                   [hidden email]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: Git is not scalable with too many refs/*

Jeff King
On Tue, Jun 14, 2011 at 02:17:58AM +0200, Andreas Ericsson wrote:

> If you're using svn metadata in the commit text, you can always do
> "git log -p --grep=@20356" to get the commits relevant to that one.
> It's not as fast as "git show svn-20356", but it's not exactly
> glacial either and would avoid the problems you're having now.

If we do end up putting this data into notes eventually (which I think
we _should_ do, because then you aren't locked into having this svn
cruft in your commit messages for all time, but can rather choose
whether or not to display it), it would be nice to have a --grep-notes
feature in git-log. Or maybe --grep should look in notes by default,
too, if we are showing them.

I suspect the feature would be really easy to implement, if somebody is
looking for a gentle introduction to git, or a fun way to spend an hour.
:)

-Peff

Re: Git is not scalable with too many refs/*

Junio C Hamano
Jeff King <[hidden email]> writes:

> I suspect the feature would be really easy to implement, if somebody is
> looking for a gentle introduction to git, or a fun way to spend an hour.

I would rather want to see if somebody can come up with a flexible reverse
mapping feature around notes. It does not have to be completely generic,
just being flexible enough is fine.

Re: Git is not scalable with too many refs/*

Sverre Rabbelier-2
Heya,

On Tue, Jun 14, 2011 at 06:41, Junio C Hamano <[hidden email]> wrote:
> I would rather want to see if somebody can come up with a flexible reverse
> mapping feature around notes. It does not have to be completely generic,
> just being flexible enough is fine.

Wouldn't it be enough to simply create a note on 'r651235' with the
git ref as its contents?

--
Cheers,

Sverre Rabbelier

Re: Git is not scalable with too many refs/*

Johan Herland
On Tuesday 14 June 2011, Sverre Rabbelier wrote:
> Heya,
>
> On Tue, Jun 14, 2011 at 06:41, Junio C Hamano <[hidden email]> wrote:
> > I would rather want to see if somebody can come up with a flexible
> > reverse mapping feature around notes. It does not have to be
> > completely generic, just being flexible enough is fine.
>
> Wouldn't it be enough to simply create a note on 'r651235' with as
> contents the git ref?

Not quite sure what you mean by "create a note on 'r651235'". You could
devise a scheme where you compute SHA1('r651235'), and then create a note on
the resulting hash.

Notes are named by the SHA1 of the object they annotate, but there is no
hard requirement (as long as you stay away from "git notes prune") that the
SHA1 annotated actually exists as a valid Git object in your repo.

Hence, you can use notes to annotate _anything_ that can be uniquely reduced
to a SHA1 hash.
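The key derivation in such a scheme is a one-liner. The sketch below only computes the hash and models the notes tree as an in-memory dict. It does not call Git, and the names `note_key`, `add_reverse_note` and `lookup_commit` are hypothetical, as is the choice to hash the UTF-8 bytes of the revision name.

```python
import hashlib


def note_key(name):
    """Return the 40-hex SHA-1 of an arbitrary name, usable as a notes key."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


# In-memory stand-in for a notes tree: key (SHA-1 hex) -> note contents.
notes = {}


def add_reverse_note(svn_rev, commit_id):
    """Record which Git commit a name such as 'r651235' refers to."""
    notes[note_key(svn_rev)] = commit_id


def lookup_commit(svn_rev):
    """Return the commit recorded for the name, or None if there is none."""
    return notes.get(note_key(svn_rev))


if __name__ == "__main__":
    add_reverse_note("r651235", "0123456789abcdef0123456789abcdef01234567")  # made-up commit id
    print(note_key("r651235"))
    print(lookup_commit("r651235"))
```

In a real repository the dict would be replaced by a notes ref, with the computed hash passed where an object name is expected.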


...Johan

--
Johan Herland, <[hidden email]>
www.herland.net