git fetch --depth=* broken?

classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

git fetch --depth=* broken?

Nicolas Pitre

try out:

        git clone --depth=1 git://git.kernel.org/pub/scm/git/git
        cd git
        git fetch --depth=2

It then silently fails, except for the return code of 1.

With -v this is the same result.  Only if I remove --depth= do I get:

From git://git.kernel.org/pub/scm/git/git
 = [up to date]      html       -> origin/html
 = [up to date]      maint      -> origin/maint
 = [up to date]      man        -> origin/man
 = [up to date]      master     -> origin/master
 = [up to date]      next       -> origin/next
 = [up to date]      pu         -> origin/pu
 = [up to date]      todo       -> origin/todo

and a return code of 0.

It seems that commit c6bc400585 is partly responsible for that
misbehavior.  At least reverting it makes the status list appear again
even with the presence of --depth=.

But still, actual result isn't any better.  Using --depth=2 or
--depth=1000 doesn't change anything, unless there is _also_ a ref that
was updated on the remote end.  Looks like the code is happy to conclude
that there is nothing to do if local refs match remote refs and never go
to talk further to the remote end ("no "shallow ..." nor "deepen ..."
are sent over the wire) despite the fact that --depth=1000 would
certainly have to trigger a pack transfer.

I'm also surprised that such thing as simple deepening of a repo is not
in the test suite.  We certainly document this operation in the
git-fetch man page though.

The code in builtin-fetch-pack.c still looks rather confusing to me, so
hopefully you are more familiar with it and can provide a fix.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH] fix simple deepening of a repo

Nicolas Pitre
If all refs sent by the remote repo during a fetch are reachable
locally, then no further conversation is performed with the remote. This
check is skipped when the --depth argument is provided to allow the
deepening of a shallow clone which corresponding remote repo has no
changed.

However, some additional filtering was added in commit c29727d5 to
remove those refs which are equal on both sides.  If the remote repo has
not changed, then the list of refs to give the remote process becomes
empty and simply attempting to deepen a shallow repo always fails.

Let's stop being smart in that case and simply send the whole list over
when that condition is met.  The remote will do the right thing anyways.

Test cases for this issue are also provided.

Signed-off-by: Nicolas Pitre <[hidden email]>
---

On Sat, 22 Aug 2009, Nicolas Pitre wrote:

> try out:
>
> git clone --depth=1 git://git.kernel.org/pub/scm/git/git
> cd git
> git fetch --depth=2
>
> It then silently fails, except for the return code of 1.

Here's the fix.  Problem wasn't in builtin-fetch-pack.c after all.

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index a8c2ca2..18376d6 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -139,6 +139,36 @@ test_expect_success 'fsck in shallow repo' '
  )
 '
 
+test_expect_success 'simple fetch in shallow repo' '
+ (
+ cd shallow &&
+ git fetch
+ )
+'
+
+test_expect_success 'no changes expected' '
+ (
+ cd shallow &&
+ git count-objects -v
+ ) > count.shallow.2 &&
+ cmp count.shallow count.shallow.2
+'
+
+test_expect_success 'fetch same depth in shallow repo' '
+ (
+ cd shallow &&
+ git fetch --depth=2
+ )
+'
+
+test_expect_success 'no changes expected' '
+ (
+ cd shallow &&
+ git count-objects -v
+ ) > count.shallow.3 &&
+ cmp count.shallow count.shallow.3
+'
+
 test_expect_success 'add two more' '
  add B66 $B65 &&
  add B67 $B66
@@ -201,4 +231,21 @@ test_expect_success 'pull in shallow repo with missing merge base' '
  )
 '
 
+test_expect_success 'additional simple shallow deepenings' '
+ (
+ cd shallow &&
+ git fetch --depth=8 &&
+ git fetch --depth=10 &&
+ git fetch --depth=11
+ )
+'
+
+test_expect_success 'clone shallow object count' '
+ (
+ cd shallow &&
+ git count-objects -v
+ ) > count.shallow &&
+ grep "^count: 52" count.shallow
+'
+
 test_done
diff --git a/transport.c b/transport.c
index a8df319..ce91387 100644
--- a/transport.c
+++ b/transport.c
@@ -925,11 +925,12 @@ const struct ref *transport_get_remote_refs(struct transport *transport)
 int transport_fetch_refs(struct transport *transport, const struct ref *refs)
 {
  int rc;
- int nr_heads = 0, nr_alloc = 0;
+ int nr_heads = 0, nr_alloc = 0, nr_refs = 0;
  const struct ref **heads = NULL;
  const struct ref *rm;
 
  for (rm = refs; rm; rm = rm->next) {
+ nr_refs++;
  if (rm->peer_ref &&
     !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
  continue;
@@ -937,6 +938,19 @@ int transport_fetch_refs(struct transport *transport, const struct ref *refs)
  heads[nr_heads++] = rm;
  }
 
+ if (!nr_heads) {
+ /*
+ * When deepening of a shallow repository is requested,
+ * then local and remote refs are likely to still be equal.
+ * Just feed them all to the fetch method in that case.
+ * This condition shouldn't be met in a non-deepening fetch
+ * (see builtin-fetch.c:quickfetch()).
+ */
+ heads = xmalloc(nr_refs * sizeof(*heads));
+ for (rm = refs; rm; rm = rm->next)
+ heads[nr_heads++] = rm;
+ }
+
  rc = transport->fetch(transport, nr_heads, heads);
  free(heads);
  return rc;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Junio C Hamano
Nicolas Pitre <[hidden email]> writes:

> If all refs sent by the remote repo during a fetch are reachable
> locally, then no further conversation is performed with the remote. This
> check is skipped when the --depth argument is provided to allow the
> deepening of a shallow clone which corresponding remote repo has no
> changed.
>
> However, some additional filtering was added in commit c29727d5 to
> remove those refs which are equal on both sides.  If the remote repo has
> not changed, then the list of refs to give the remote process becomes
> empty and simply attempting to deepen a shallow repo always fails.
>
> Let's stop being smart in that case and simply send the whole list over
> when that condition is met.  The remote will do the right thing anyways.
>
> Test cases for this issue are also provided.
>
> Signed-off-by: Nicolas Pitre <[hidden email]>
> ---

Thanks.  The fix looks correct (as usual with patches from you).

But it makes me wonder if this logic to filter refs is buying us anything.

>   for (rm = refs; rm; rm = rm->next) {
> + nr_refs++;
>   if (rm->peer_ref &&
>      !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
>   continue;
                ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
                heads[nr_heads++] = rm;
        }

What is the point of not asking for the refs that we know are the same?

In other words, what breaks (not necessarily in the correctness sense, but
also in the performance sense) if we get rid of this filtering altogether?

Over a native protocol, we will tell the other end what we have and the
remote end will notice that we are asking for the same thing, so it won't
include unnecessary objects in the pack anyway.  When calling out to a
walker, we will also notice that the ref matches what we have already and
will not fetch anything from the remote.  Because the commit at the tip of
the ref that is already up-to-date is marked as COMPLETE, we will not even
locally have to walk the object chain to make sure things are connected.

I think the only thing that this possibly gains is if _everything_ is up
to date.  Then we won't need to make a call to transport->fetch() and it
would save a new connection.

But that optimization is already done long ago by the caller's
quickfetch() in the normal case.

Which makes me suspect that the real fix should be something a lot
simpler, like this (untested) patch.

diff --git a/transport.c b/transport.c
index f231b35..14dab81 100644
--- a/transport.c
+++ b/transport.c
@@ -1053,18 +1053,16 @@ const struct ref *transport_get_remote_refs(struct transport *transport)
 int transport_fetch_refs(struct transport *transport, const struct ref *refs)
 {
  int rc;
- int nr_heads = 0, nr_alloc = 0;
+ int nr_heads = 0;
  const struct ref **heads = NULL;
  const struct ref *rm;
 
- for (rm = refs; rm; rm = rm->next) {
- if (rm->peer_ref &&
-    !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
- continue;
- ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
+ for (rm = refs; rm; rm = rm->next)
+ nr_heads++;
+ heads = xmalloc(nr_heads * sizeof(*heads));
+ nr_heads = 0;
+ for (rm = refs; rm; rm = rm->next)
  heads[nr_heads++] = rm;
- }
-
  rc = transport->fetch(transport, nr_heads, heads);
  free(heads);
  return rc;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Nicolas Pitre
On Sun, 23 Aug 2009, Junio C Hamano wrote:

> Nicolas Pitre <[hidden email]> writes:
>
> > If all refs sent by the remote repo during a fetch are reachable
> > locally, then no further conversation is performed with the remote. This
> > check is skipped when the --depth argument is provided to allow the
> > deepening of a shallow clone which corresponding remote repo has no
> > changed.
> >
> > However, some additional filtering was added in commit c29727d5 to
> > remove those refs which are equal on both sides.  If the remote repo has
> > not changed, then the list of refs to give the remote process becomes
> > empty and simply attempting to deepen a shallow repo always fails.
> >
> > Let's stop being smart in that case and simply send the whole list over
> > when that condition is met.  The remote will do the right thing anyways.
> >
> > Test cases for this issue are also provided.
> >
> > Signed-off-by: Nicolas Pitre <[hidden email]>
> > ---
>
> Thanks.  The fix looks correct (as usual with patches from you).
>
> But it makes me wonder if this logic to filter refs is buying us anything.
>
> >   for (rm = refs; rm; rm = rm->next) {
> > + nr_refs++;
> >   if (rm->peer_ref &&
> >      !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
> >   continue;
> ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
> heads[nr_heads++] = rm;
> }
>
> What is the point of not asking for the refs that we know are the same?

I could see the advantage if the number of refs is really huge.  Wasn't
some converted repositories producing a lot of branches and/or tags
significantly slowing down a fetch operation?  Granted that was long ago
when that issue got "fixed" so the details are fuzzy to me.

> In other words, what breaks (not necessarily in the correctness sense, but
> also in the performance sense) if we get rid of this filtering altogether?

If you really want to get rid of that filtering, I'd still do it in a
separate patch.  That way if any issue appears because of that then
bissection will point directly to that removal alone.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Johan Herland
On Monday 24 August 2009, Nicolas Pitre wrote:

> On Sun, 23 Aug 2009, Junio C Hamano wrote:
> > Nicolas Pitre <[hidden email]> writes:
> > > If all refs sent by the remote repo during a fetch are reachable
> > > locally, then no further conversation is performed with the
> > > remote. This check is skipped when the --depth argument is
> > > provided to allow the deepening of a shallow clone which
> > > corresponding remote repo has no changed.
> > >
> > > However, some additional filtering was added in commit c29727d5
> > > to remove those refs which are equal on both sides.  If the
> > > remote repo has not changed, then the list of refs to give the
> > > remote process becomes empty and simply attempting to deepen a
> > > shallow repo always fails.
> > >
> > > Let's stop being smart in that case and simply send the whole
> > > list over when that condition is met.  The remote will do the
> > > right thing anyways.
> > >
> > > Test cases for this issue are also provided.
> > >
> > > Signed-off-by: Nicolas Pitre <[hidden email]>
> > > ---
> >
> > Thanks.  The fix looks correct (as usual with patches from you).
> >
> > But it makes me wonder if this logic to filter refs is buying us
> > anything.
> >
> > >   for (rm = refs; rm; rm = rm->next) {
> > > + nr_refs++;
> > >   if (rm->peer_ref &&
> > >      !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
> > >   continue;
> >
> > ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
> > heads[nr_heads++] = rm;
> > }
> >
> > What is the point of not asking for the refs that we know are the
> > same?
>
> I could see the advantage if the number of refs is really huge.
> Wasn't some converted repositories producing a lot of branches and/or
> tags significantly slowing down a fetch operation?  Granted that was
> long ago when that issue got "fixed" so the details are fuzzy to me.

I'm converting several CVS repos to Git with ~50 000 refs, so I'm happy
with any change that can speed things up for repos with many refs.

Right now, my biggest gripe is that a 'git push --mirror' on such a repo
can easily take ~10 min. even though the actual pack generation and
transfer only takes a couple of seconds. It seems like it needs ~10
minutes to generate the list of changed/added/deleted refs...
Unfortunately I haven't had time to look properly into it, yet...


...Johan

--
Johan Herland, <[hidden email]>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Daniel Barkalow
In reply to this post by Junio C Hamano
On Sun, 23 Aug 2009, Junio C Hamano wrote:

> Nicolas Pitre <[hidden email]> writes:
>
> > If all refs sent by the remote repo during a fetch are reachable
> > locally, then no further conversation is performed with the remote. This
> > check is skipped when the --depth argument is provided to allow the
> > deepening of a shallow clone which corresponding remote repo has no
> > changed.
> >
> > However, some additional filtering was added in commit c29727d5 to
> > remove those refs which are equal on both sides.  If the remote repo has
> > not changed, then the list of refs to give the remote process becomes
> > empty and simply attempting to deepen a shallow repo always fails.
> >
> > Let's stop being smart in that case and simply send the whole list over
> > when that condition is met.  The remote will do the right thing anyways.
> >
> > Test cases for this issue are also provided.
> >
> > Signed-off-by: Nicolas Pitre <[hidden email]>
> > ---
>
> Thanks.  The fix looks correct (as usual with patches from you).
>
> But it makes me wonder if this logic to filter refs is buying us anything.
>
> >   for (rm = refs; rm; rm = rm->next) {
> > + nr_refs++;
> >   if (rm->peer_ref &&
> >      !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
> >   continue;
> ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
> heads[nr_heads++] = rm;
> }
>
> What is the point of not asking for the refs that we know are the same?

This code is part of the original C implementation of fetch; I suspect the
optimization was somehow in the shell version and made sense there,
perhaps because there wasn't a quickfetch in the shell version or that
there was some non-negligable per-ref cost in the code around there, since
it was calling helper programs and such.

Anyway, I think it makes sense to remove the filtering from
transport_fetch_refs(), like your patch does.

If it makes a difference for the actual protocol, fetch_refs_via_pack()
could filter them at that stage.

        -Daniel
*This .sig left intentionally blank*
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Junio C Hamano
In reply to this post by Nicolas Pitre
Nicolas Pitre <[hidden email]> writes:

> If you really want to get rid of that filtering, I'd still do it in a
> separate patch.  That way if any issue appears because of that then
> bissection will point directly to that removal alone.

Fair enough.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Julian Phillips
In reply to this post by Daniel Barkalow
On Mon, 24 Aug 2009, Daniel Barkalow wrote:

> On Sun, 23 Aug 2009, Junio C Hamano wrote:
>
>> But it makes me wonder if this logic to filter refs is buying us anything.
>>
>>>   for (rm = refs; rm; rm = rm->next) {
>>> + nr_refs++;
>>>   if (rm->peer_ref &&
>>>      !hashcmp(rm->peer_ref->old_sha1, rm->old_sha1))
>>>   continue;
>> ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
>> heads[nr_heads++] = rm;
>> }
>>
>> What is the point of not asking for the refs that we know are the same?
>
> This code is part of the original C implementation of fetch; I suspect the
> optimization was somehow in the shell version and made sense there,
> perhaps because there wasn't a quickfetch in the shell version or that
> there was some non-negligable per-ref cost in the code around there, since
> it was calling helper programs and such.

I don't remember copying it from the shell version but my memory is
terrible, so I could easily be wrong.  The relevant commit message was:

"git-fetch2: remove ref_maps that are not interesting

Once we have the full list of ref_maps, remove any where the local
and remote sha1s are the same - as we don't need to do anything for
them."

So that doesn't help.  I was very concerned about performance though
(which was why I wanted fetch in C in the first place), so may have added
it to speed up fetches that have only updated a few refs - and I assume
that quickfetch was something that came along later after you absorbed the
work into the transport series?

> Anyway, I think it makes sense to remove the filtering from
> transport_fetch_refs(), like your patch does.
>
> If it makes a difference for the actual protocol, fetch_refs_via_pack()
> could filter them at that stage.

I think it would certainly be worth investigating the performance aspects
... no time tonight, but maybe tomorrow.

--
Julian

  ---
Some circumstantial evidence is very strong, as when you find a trout in
the milk.
  -- Thoreau
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Nicolas Pitre
On Mon, 24 Aug 2009, Julian Phillips wrote:

> On Mon, 24 Aug 2009, Daniel Barkalow wrote:
>
> > On Sun, 23 Aug 2009, Junio C Hamano wrote:
> >
> > > What is the point of not asking for the refs that we know are the same?
> >
> > This code is part of the original C implementation of fetch; I suspect the
> > optimization was somehow in the shell version and made sense there,
> > perhaps because there wasn't a quickfetch in the shell version or that
> > there was some non-negligable per-ref cost in the code around there, since
> > it was calling helper programs and such.
>
> I don't remember copying it from the shell version but my memory is terrible,
> so I could easily be wrong.  The relevant commit message was:
>
> "git-fetch2: remove ref_maps that are not interesting
>
> Once we have the full list of ref_maps, remove any where the local
> and remote sha1s are the same - as we don't need to do anything for
> them."
>
> So that doesn't help.  I was very concerned about performance though (which
> was why I wanted fetch in C in the first place), so may have added it to speed
> up fetches that have only updated a few refs - and I assume that quickfetch
> was something that came along later after you absorbed the work into the
> transport series?

Well... Johan Herland says he has to deal with repositories containing
around 50000 refs.  So in that case it is certainly a good idea not to
send the whole 50000 refs back if only one or two (or a hundred) need to
be updated.  And quickfetch() won't help in that case since its purpose
is only to determine if there is anything at all to update.

> > Anyway, I think it makes sense to remove the filtering from
> > transport_fetch_refs(), like your patch does.
> >
> > If it makes a difference for the actual protocol, fetch_refs_via_pack()
> > could filter them at that stage.
>
> I think it would certainly be worth investigating the performance aspects ...
> no time tonight, but maybe tomorrow.

50000 refs * 45 bytes each = 2.25 MB.  That's all wasted bandwidth if
only one ref needs updating.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Shawn Pearce
Nicolas Pitre <[hidden email]> wrote:
> Well... Johan Herland says he has to deal with repositories containing
> around 50000 refs.  So in that case it is certainly a good idea not to
> send the whole 50000 refs back if only one or two (or a hundred) need to
> be updated.  And quickfetch() won't help in that case since its purpose
> is only to determine if there is anything at all to update.
...
> 50000 refs * 45 bytes each = 2.25 MB.  That's all wasted bandwidth if
> only one ref needs updating.

Not just Johan Herland.  Gerrit Code Review creates a new ref
for every patch proposed for review.  Imagine taking every email
message on git ML that has "[PATCH]" in the subject, and creating
a new ref for that in a git.git clone.

We aren't quite at the 50k ref stage yet, but we're starting to
consider that some of our repositories have a ton of refs, and
that the initial advertisement for either fetch or push is horrid.

Since the refs are immutable I could actually teach the JGit
daemon to hide them from JGit's receive-pack, thus cutting down the
advertisement on push, but the refs exist so you can literally say:

  git fetch URL refs/changes/88/4488/2
  git show FETCH_HEAD

to inspect the "v2" version of whatever 4488 is, and if 4488 was
the last commit in a patch series, you'd also be able to do:

  git log -p --reverse ..FETCH_HEAD

to see the complete series.

Given how infrequent it is to grab a given change is though, I'm
starting to consider either a protocol extension that allows the
client to probe for a ref which wasn't in the initial advertisement,
or take it on a command line flag, e.g.:

  git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2

Personally I'd prefer extending the protocol, because making the
end user supply information twice is stupid.

I don't know enough about Johan's case though to know whether or
not he can get away with hiding the bulk of the refs in the initial
advertisement.  In the case of Gerrit Code Review, the bulk of the
refs is under refs/changes/, only a handful of things are under the
refs/heads/ and ref/tags/ namespace, and most fetches actually are
for only refs/heads/ and refs/tags/.  So hiding the refs/changes/
namespace would make large improvement in the advertisement cost.

--
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Sverre Rabbelier-2
Heya,

On Mon, Aug 24, 2009 at 19:12, Shawn O. Pearce<[hidden email]> wrote:
>  git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2
>
> Personally I'd prefer extending the protocol, because making the
> end user supply information twice is stupid.

$ git fetch --unadvertised URL refs/changes/88/4488/2

And we expand that to the above behind the screens?

--
Cheers,

Sverre Rabbelier
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Junio C Hamano
In reply to this post by Shawn Pearce
"Shawn O. Pearce" <[hidden email]> writes:

> We aren't quite at the 50k ref stage yet, but we're starting to
> consider that some of our repositories have a ton of refs, and
> that the initial advertisement for either fetch or push is horrid.
>
> Since the refs are immutable I could actually teach the JGit
> daemon to hide them from JGit's receive-pack, thus cutting down the
> advertisement on push, but the refs exist so you can literally say:

What do you mean "refs are immutable"?

Do you mean "In the particular application, Gerrit, the server knows that
certain refs will never move nor get deleted, once they are created"?  If
so, then I would understand, but otherwise what you are describing is not
git anymore ;-)

And I think it is probably worth thinking things through to find a way to
take advantage of that knowledge.

Even though refs under refs/changes/ hierarchy may have that property, the
client won't know what's available unless you advertise it in some way.

You could assume some offline measure outside the git protocol exists for
clients to find out about them, and protocol extension could say "I do not
want to hear about refs that match these globs during this exchange,
because I have learnt about them offline", and the server could skip
advertisement.

>   git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2
>
> Personally I'd prefer extending the protocol, because making the
> end user supply information twice is stupid.

In the upload-pack protocol, the server talks first, so it is rather hard
to shoehorn a request from a client to ask "I know about refs/changes/*
hiearchy, so don't talk about them".

I however think it is entirely reasonable to have a server side
configuration that tells upload-pack not to advertise refs/changes/*
hierarchy but still remembers they are OUR_REF.  In send_ref() in
upload-pack.c, you'd do something like (I know, I know, you'd be doing
an equivalent of this in jgit):

        static const char *capabilities = "multi_ack ...";
        struct object *o = parse_object(sha1);
        int skip_advertisement = exclude_ref_from_advertisement(refname);

        if (!o)
                die("git upload-pack: cannot find object %s:", sha1_to_hex(sha1));

        if (!skip_advertisement) {
                if (capabilities)
                        packet_write(1, "%s %s%c%s\n", sha1_to_hex(sha1), refname,
                                0, capabilities);
                else
                        packet_write(1, "%s %s\n", sha1_to_hex(sha1), refname);
                capabilities = NULL;
        }

        if (!(o->flags & OUR_REF)) {
                o->flags |= OUR_REF;
                nr_our_refs++;
        }
        if (o->type == OBJ_TAG) {
                o = deref_tag(o, refname, 0);
                if (o && !skip_advertisement)
                        packet_write(1, "%s %s^{}\n", sha1_to_hex(o->sha1), refname);
        }
        return 0;

Doing it this way, receive_needs() will allow refs/changes/88/4488/2 to be
requested, because that is what send_ref() saw and marked as OUR_REF.  It
was just not sent to the client.  And get_common_commits() will behave the
same with or without this abbreviated advertisement,

Of course, the client side cannot grab everything with refs/*:refs/remotes/*
wildcard refspecs from such a server, but I think that can be considered a
feature.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Shawn Pearce
Junio C Hamano <[hidden email]> wrote:

> "Shawn O. Pearce" <[hidden email]> writes:
> > We aren't quite at the 50k ref stage yet, but we're starting to
> > consider that some of our repositories have a ton of refs, and
> > that the initial advertisement for either fetch or push is horrid.
> >
> > Since the refs are immutable I could actually teach the JGit
> > daemon to hide them from JGit's receive-pack, thus cutting down the
> > advertisement on push, but the refs exist so you can literally say:
>
> What do you mean "refs are immutable"?
>
> Do you mean "In the particular application, Gerrit, the server knows that
> certain refs will never move nor get deleted, once they are created"?  If
> so, then I would understand, but otherwise what you are describing is not
> git anymore ;-)

The former.  :-)

I mean that this particular server implementation will deny any
update made to refs/changes/, as if one had the following as the
update hook on that repository:

  #!/bin/sh
  case "$1" in
  refs/changes/*) exit 1;;
  *) exit 0;
  esac

This of course is completely legal, and since the server knows the
ref cannot be moved, there is no need to advertise it to the client.
But this is a very specialized thing, its rare that the thing that
formats the advertisement knows what the update hook will permit
to be modified.
 
> >   git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2
> >
> > Personally I'd prefer extending the protocol, because making the
> > end user supply information twice is stupid.
>
> In the upload-pack protocol, the server talks first, so it is rather hard
> to shoehorn a request from a client to ask "I know about refs/changes/*
> hiearchy, so don't talk about them".

Actually, that assumption is still a problem.

The client knows the *name* of the ref, but not the SHA-1 the ref is
currently valued at.  Thus when the client knows it wants a certain
ref by name, it needs to send a "want " line to the server that would
give it whatever that ref currently points at.  Unfortunately since we
have not obtained that value yet, we are stuck.

However, we do have one name we want to know about, but the server may
have 50k other names in the same namespace we do not know about.

I was thinking instead that we have a new protocol extension:

  S: ... HEAD\0side-band ... expand-refs
  S: ... refs/heads/master
  S: 0000

  C: ... expand refs/changes/88/4488/2
  C: 0000

  S: ... refs/changes/88/4488/2
  S: 0000

  C: ... want XXXXXX\0side-band-64k ...
 
> Of course, the client side cannot grab everything with refs/*:refs/remotes/*
> wildcard refspecs from such a server, but I think that can be considered a
> feature.

If expand accepted globs like fetch does, then fetch can ask for
expand of refs/changes/* if it does not see any refs/changes/*
on advertisement.  Or just expand a particular ref, or handful of
refs, that the user has asked for on the fetch line.

The problem with this is servers which are sending this expand-refs
tag have hidden certain namespaces from older clients.  Those names
can't be seen by older git clients, unless the user does an upgrade.

This might be OK for Gerrit Code Review's refs/changes/ namespace,
but it may not be good for others.


--
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Junio C Hamano
"Shawn O. Pearce" <[hidden email]> writes:

> The client knows the *name* of the ref, but not the SHA-1 the ref is
> currently valued at.  Thus when the client knows it wants a certain
> ref by name, it needs to send a "want " line to the server that would
> give it whatever that ref currently points at.  Unfortunately since we
> have not obtained that value yet, we are stuck.

That could be something you can fix in the out-of-band procedure Gerrit
uses (you let the client learn both name and value offline, and then the
client uses that value on "want" line).

However, even if we limit the discussion to Gerrit, you would need an
updated client that can be called with the out-of-band information
(i.e. "we know that changes/88/4488/2 points at X, so use X when
requesting") when talking with such an updated server.

So I think that expand-refs is a much nicer general solution than just
"server side is configured to hide but still allow certain refs", and
client updates cannot be avoided.

And again,

> The problem with this is servers which are sending this expand-refs
> tag have hidden certain namespaces from older clients.  Those names
> can't be seen by older git clients, unless the user does an upgrade.

I do not think "generally hidden, but if you need to know you are allowed
to peek" is much of a problem.  You do not do that for regular refs, only
for "on-demand-as-needed" type things.  If we are going to make extensive
use of notes on commits to give richer annotations, I suspect notes
hierarchy could be hidden by default in a similar way.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Shawn Pearce
Junio C Hamano <[hidden email]> wrote:

> "Shawn O. Pearce" <[hidden email]> writes:
> > The client knows the *name* of the ref, but not the SHA-1 the ref is
> > currently valued at.  Thus when the client knows it wants a certain
> > ref by name, it needs to send a "want " line to the server that would
> > give it whatever that ref currently points at.  Unfortunately since we
> > have not obtained that value yet, we are stuck.
>
> That could be something you can fix in the out-of-band procedure Gerrit
> uses (you let the client learn both name and value offline, and then the
> client uses that value on "want" line).

Well, we're trying to reduce out-of-band things with Gerrit.

Its bad enough that Gerrit doesn't use git am and git send-email
to slug around changes for discussion.  As it is we're an island
among the git world, *despite* the fact that Gerrit speaks the git
protocol natively and you can push directly to it, avoiding the
send-email SMTP nonsense many folks run into.
 
> However, even if we limit the discussion to Gerrit, you would need an
> updated client that can be called with the out-of-band information
> (i.e. "we know that changes/88/4488/2 points at X, so use X when
> requesting") when talking with such an updated server.

Yes, exactly.  Existing clients wouldn't send an arbitrary want
request, even if the server had a whitelist of objects it would
allow to be requested.

One reason why Gerrit publishes pending changes with ref names is to
make it easier for any user to obtain the proposed change locally.
Its hard to beat `git fetch URL blah`, that's even easier than
"save to mbox, git am mbox".
 
> So I think that expand-refs is a much nicer general solution than just
> "server side is configured to hide but still allow certain refs", and
> client updates cannot be avoided.

Yes, I agree.  Given 20/20 hindsight, its the way the protocol
should have been implemented:

  C: 0014expand refs/heads/*
  C: 0013expand refs/tags/*
  C: 0000

  S: ...refs/heads/master
  S: ...refs/heads/next
  S: ...refs/tags/v1.0
  S: 0000

This would have permitted clients doing `git pull URL for-linus` to say:

  C: 0011expand for-linus
  C: 0000

  S: ...refs/heads/for-linus
  S: ...refs/remotes/k26/for-linus
  S: 0000

and thus significantly narrow the scope of what they are shown when
they connect for a given ref.

> > The problem with this is servers which are sending this expand-refs
> > tag have hidden certain namespaces from older clients.  Those names
> > can't be seen by older git clients, unless the user does an upgrade.
>
> I do not think "generally hidden, but if you need to know you are allowed
> to peek" is much of a problem.  You do not do that for regular refs, only
> for "on-demand-as-needed" type things.  If we are going to make extensive
> use of notes on commits to give richer annotations, I suspect notes
> hierarchy could be hidden by default in a similar way.

After sleeping on it, I'm OK with hiding some refs from older clients.

Sometimes things evolve, and you should just update your software
to keep up with them.  If you really want the "hidden refs" that
Gerrit advertises, you should install a newer client.

We could consider supporting a legacy option through upload-pack,
such as:

  git fetch --upload-pack='git-upload-pack --expand refs/changes/' URL

which tells the remote side to additionally expand those refs during
the initial advertisement.  Then users have an escape hatch if:

* They know the remote is new enough to hide refs;
* They suspect the remote is hiding refs;
* They received an out-of-band notification telling this;
* They have an older client which doesn't support expanding refs;
* They cannot upgrade said client yet;

I'm thinking about writing an RFC patch for this today for git.git.
I think the expand refs feature neatly solves a number of problems
for me in Gerrit.  But I'm really hoping its not the only set of
repositories that would benefit from such a feature, because if so,
its not worth the headache of the protocol change.

--
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Shawn Pearce
"Shawn O. Pearce" <[hidden email]> wrote:
> Junio C Hamano <[hidden email]> wrote:
> > So I think that expand-refs is a much nicer general solution than just
> > "server side is configured to hide but still allow certain refs", and
> > client updates cannot be avoided.
>
> Yes, I agree.
...
> I'm thinking about writing an RFC patch for this today for git.git.

RFC patch below.  This is only the upload-pack side, and lacks
test, etc, so its posted *ONLY* for discussion.  I'll try to flesh
it out further tomorrow into something that could be considered
more seriously.

--8<--
[RFC] upload-pack: expand capability advertises additional refs

The expand capability and associated command permits the client
to ask for information about refs which were not in the initial
advertisement sent when the connection was first opened.

In the below exchange the server initially only advertises its
current HEAD, refs/heads and refs/tags namespaces.  However,
the client has been instructed to fetch anything which matches
refs/remotes/jc/*.

Since no matching refs appeared in the initial advertisement,
the client requests the server to expand the desired pattern,
and terminates its expand request list with a flush.

Upon receiving a flush from the client, the server displays any
local refs which match any of the expand patterns requested,
and then closes this secondary advertisement list with a flush.
If no refs matched, the server immediately returns a flush.

If multiple expand patterns match the same ref, the ref is returned
only once in the secondary advertisement, avoid confusing the client
with duplicate results.

  S: 008f... HEAD\0...include-tag expand
  S: 0043... refs/heads/build-next
  S: 0040... refs/tags/v1.6.4.1
  S: 0043... refs/tags/v1.6.4.1^{}
  S: 0000

  C: 001dexpand refs/remotes/jc/*
  C: 0000

  S: 0043... refs/remotes/jc/maint
  S: 0044... refs/remotes/jc/master
  S: 0000

  C: 0031want ...
  C: 0000

Signed-off-by: Shawn O. Pearce <[hidden email]>
Signed-off-by: Shawn O. Pearce <[hidden email]>
---
 upload-pack.c |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 4d8be83..e1ec608 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,6 +10,8 @@
 #include "revision.h"
 #include "list-objects.h"
 #include "run-command.h"
+#include "remote.h"
+#include "string-list.h"
 
 static const char upload_pack_usage[] = "git upload-pack [--strict] [--timeout=nn] <dir>";
 
@@ -30,6 +32,18 @@ static int multi_ack, nr_our_refs;
 static int use_thin_pack, use_ofs_delta, use_include_tag;
 static int no_progress, daemon_mode;
 static int shallow_nr;
+
+struct adv_ref {
+ struct adv_ref *next;
+ char *name;
+ unsigned pattern:1;
+};
+static struct adv_ref *to_advertise;
+static struct adv_ref **advertise_tail = &to_advertise;
+
+static struct ref *local_refs;
+static struct ref **refs_tail = &local_refs;
+
 static struct object_array have_obj;
 static struct object_array want_obj;
 static unsigned int timeout;
@@ -470,6 +484,17 @@ static int get_common_commits(void)
  }
 }
 
+static void push_advertise(const char *name)
+{
+ struct adv_ref *adv = xcalloc(1, sizeof(*adv));
+ adv->name = xstrdup(name);
+ adv->pattern = !!strchr(adv->name, '*');
+ *advertise_tail = adv;
+ advertise_tail = &adv->next;
+}
+
+static void send_refs(void);
+
 static void receive_needs(void)
 {
  struct object_array shallows = {0, 0, NULL};
@@ -484,11 +509,22 @@ static void receive_needs(void)
  unsigned char sha1_buf[20];
  len = packet_read_line(0, line, sizeof(line));
  reset_timeout();
- if (!len)
+ if (!len) {
+ if (to_advertise) {
+ send_refs();
+ continue;
+ }
  break;
+ }
  if (debug_fd)
  write_in_full(debug_fd, line, len);
 
+ if (!prefixcmp(line, "expand ")) {
+ if (line[len - 1] == '\n')
+ line[len - 1] = 0;
+ push_advertise(line + 7);
+ continue;
+ }
  if (!prefixcmp(line, "shallow ")) {
  unsigned char sha1[20];
  struct object *object;
@@ -603,11 +639,14 @@ static void receive_needs(void)
  free(shallows.objects);
 }
 
-static int send_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
+static int send_ref(struct string_list_item *item, void *cb_data)
 {
  static const char *capabilities = "multi_ack thin-pack side-band"
  " side-band-64k ofs-delta shallow no-progress"
- " include-tag";
+ " include-tag expand";
+ const struct ref *ref = item->util;
+ const char *refname = ref->name;
+ const unsigned char *sha1 = ref->new_sha1;
  struct object *o = parse_object(sha1);
 
  if (!o)
@@ -631,12 +670,58 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
  return 0;
 }
 
+static void send_refs(void)
+{
+ struct ref *to_send = NULL, **tail = &to_send;
+ struct ref *ref;
+ struct adv_ref *adv, *next_adv;
+ struct string_list sorted_names;
+
+ for (adv = to_advertise; adv; adv = next_adv) {
+ struct refspec spec;
+
+ memset(&spec, 0, sizeof(spec));
+ spec.pattern = adv->pattern;
+ spec.src = adv->name;
+ spec.dst = adv->name;
+ next_adv = adv->next;
+ get_fetch_map(local_refs, &spec, &tail, 1);
+
+ free(adv->name);
+ free(adv);
+ }
+ to_advertise = NULL;
+ advertise_tail = &to_advertise;
+
+ memset(&sorted_names, 0, sizeof(sorted_names));
+ for (ref = to_send; ref; ref = ref->next)
+ string_list_insert(ref->name, &sorted_names)->util = ref;
+ for_each_string_list(send_ref, &sorted_names, NULL);
+ string_list_clear(&sorted_names, 0);
+ free_refs(to_send);
+ packet_flush(1);
+}
+
+static int scan_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
+{
+ struct ref *r = alloc_ref(refname);
+ hashcpy(r->new_sha1, sha1);
+ *refs_tail = r;
+ refs_tail = &r->next;
+ return 0;
+}
+
 static void upload_pack(void)
 {
  reset_timeout();
- head_ref(send_ref, NULL);
- for_each_ref(send_ref, NULL);
- packet_flush(1);
+ head_ref(scan_ref, NULL);
+ for_each_ref(scan_ref, NULL);
+
+ push_advertise("HEAD");
+ push_advertise("refs/heads/*");
+ push_advertise("refs/tags/*");
+ send_refs();
+
  receive_needs();
  if (want_obj.nr) {
  get_common_commits();
--
1.6.4.1.331.gda1d56

--
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Johannes Sixt-2
Shawn O. Pearce schrieb:

>  static void upload_pack(void)
>  {
>   reset_timeout();
> - head_ref(send_ref, NULL);
> - for_each_ref(send_ref, NULL);
> - packet_flush(1);
> + head_ref(scan_ref, NULL);
> + for_each_ref(scan_ref, NULL);
> +
> + push_advertise("HEAD");
> + push_advertise("refs/heads/*");
> + push_advertise("refs/tags/*");
> + send_refs();
> +

How will this mesh with 'git clone --mirror'? Is the client expected to
ask with 'expand refs/*'?

-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Shawn Pearce
Johannes Sixt <[hidden email]> wrote:

> Shawn O. Pearce schrieb:
> >  static void upload_pack(void)
> >  {
> >   reset_timeout();
> > - head_ref(send_ref, NULL);
> > - for_each_ref(send_ref, NULL);
> > - packet_flush(1);
> > + head_ref(scan_ref, NULL);
> > + for_each_ref(scan_ref, NULL);
> > +
> > + push_advertise("HEAD");
> > + push_advertise("refs/heads/*");
> > + push_advertise("refs/tags/*");
> > + send_refs();
> > +
>
> How will this mesh with 'git clone --mirror'?

Not well.

> Is the client expected to
> ask with 'expand refs/*'?

If the client is new enough to understand the "expand" extension,
yes, it would ask for "expand refs/*" and get everthing, allowing
it to complete a full mirror.

If the client is older and doesn't know this extension, then it
is hopeless.  The server isn't advertising refs/*, doesn't know
that the client can't ask for refs/*, and the client doesn't know
it is missing refs when it makes the mirror.

This backwards incompatible breakage is something I have no real
solution for.  :-|

--
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Junio C Hamano
"Shawn O. Pearce" <[hidden email]> writes:

>> How will this mesh with 'git clone --mirror'?
>
> Not well.
>
>> Is the client expected to
>> ask with 'expand refs/*'?
>
> If the client is new enough to understand the "expand" extension,
> yes, it would ask for "expand refs/*" and get everthing, allowing
> it to complete a full mirror.
>
> If the client is older and doesn't know this extension, then it
> is hopeless.  The server isn't advertising refs/*, doesn't know
> that the client can't ask for refs/*, and the client doesn't know
> it is missing refs when it makes the mirror.
>
> This backwards incompatible breakage is something I have no real
> solution for.  :-|

But we at least can assume that the server operator is reasonable and
wouldn't go overboard, (ab)using this "abbreviated advertisement" feature
to hide heads and tags from the clients.  I would suspect that a server
operated by such a sick person who hides these normal refs will be shunned
by users so it won't either happen in practice, or even if it happens, it
won't be a problem---simply because nobody would want to interact with
such a server.  

Think about in what situation you would want to do a mirror clone.  The
most obvious is to have a back-up or secondary distribution point, and I
do not think of any other sane reason (other than "because I can", which
does not count).  If the original repository has so many refs to benefit
from the "abbreviated advertisement" feature (otherwise there is no point
using it in the first place), then its mirror repository would also want
to use the feature when talking to its clients, acting as a back-up
distribution point.  That means the version of git used to prime, update
and serve the mirror will know the expand extention.

I am hoping that we can finish 1.6.5 by mid September (let's tentatively
say we will shoot for 16th).  I expect the expand extention to be in
'next' by that time, cooking for 1.7.0.  How does that timetable sound?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fix simple deepening of a repo

Shawn Pearce
Junio C Hamano <[hidden email]> wrote:
> "Shawn O. Pearce" <[hidden email]> writes:
> >> How will this mesh with 'git clone --mirror'?
> >
> > Not well.
>
> But we at least can assume that the server operator is reasonable and
> wouldn't go overboard, (ab)using this "abbreviated advertisement" feature
> to hide heads and tags from the clients.

Yes.  My patch is hardcoded to show only heads and tags, and nothing else.

But I think we want to make this configurable, and show everything
by default, but if there is a configuration entry, show only what
the configuration entry patterns suggest to advertise.

Thus an admin could hide refs/heads/*, but maybe he wants to, and
show only refs/heads/master, refs/heads/maint, refs/heads/next by
default.  This is actually a rather clear indication to a client
that although there may be individual cooking topics scattered
through the expanded refs/heads/* space, any reasonable default
clone wouldn't take them.

> Think about in what situation you would want to do a mirror clone.
...
> That means the version of git used to prime, update
> and serve the mirror will know the expand extention.

Great point Junio.  The backwards compatibility may be a non-issue
then, especially if this is configurable and we advertise refs/*
by default like we do now, and any reasonable admin who does enable
the hiding still advertises the core namspaces that really matter
to the majority of clients.

> I am hoping that we can finish 1.6.5 by mid September (let's tentatively
> say we will shoot for 16th).  I expect the expand extention to be in
> 'next' by that time, cooking for 1.7.0.  How does that timetable sound?

Oh, if 1.6.5 is mid-September, this is certainly not 1.6.5 material.
I'm not in any rush, this should go in when its ready, but 1.7
might be reasonable.

--
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
12