Clone fails on a repo with too many heads/tags

Ivan Todoroski
I recently tried cloning a fresh copy of a large repo (converted from CVS,
nearly 10 years of history) and to my surprise "git clone" failed with the
following message:

    error: cannot spawn git: No such file or directory

The problem is only reproduced using the Smart HTTP transport.

I used msysGit on Windows so my first instinct was to contact them, but after
some poking around I discovered that the problem is present in the Linux
version too, although harder to trigger.

Try executing this script:

-------------------------------
git init too-many-refs
cd too-many-refs
echo bla > bla.txt
git add .
git commit -m test
sha=$(git rev-parse HEAD)
for ((i=0; i<100000; i++)); do
        echo $sha refs/tags/artificially-long-tag-name-to-more-easily-\
demonstrate-the-problem-$i >> .git/packed-refs
done
-------------------------------

Now share this repo using the Smart HTTP transport (git-http-backend) and then
try cloning it in a different directory. This is what you would get:

$ git clone http://localhost/.../too-many-refs/.git
Cloning into 'too-many-refs'...
fatal: cannot exec 'fetch-pack': Argument list too long

So we come to the real reason for the failure: somewhere inside Git a
subcommand is invoked with all the tags/heads on the command line, and if you
have enough of them it overflows the command line length limit of the OS.

Obviously the number of tags in the "too-many-refs" repo above is absurd (100k),
but that many are needed only because the cmdline length limit on Linux is much
more generous. On Windows the clone fails with as few as 500 tags in the above
loop! I am already hitting this problem with msysGit on real repos, not just
artificial test cases.
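
For reference, on POSIX systems you can check your own limit with (the exact
value varies between systems and kernel versions):

    getconf ARG_MAX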

I tracked down the problem to remote-curl.c:fetch_git(). That's where the
"fetch-pack" command line is being constructed with all the refs on one line:

git fetch-pack --stateless-rpc --lock-pack ...<all the refs>...

The solution is conceptually simple: if the list of refs results in a too long
command line, split the refs in batches and call fetch-pack multiple times such
that each call is under the cmdline limit:

git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
...
git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...
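
(Purely to illustrate the batching idea: this is the same kind of splitting
that xargs does when given a size limit, e.g.

    tr '\n' '\0' <refs.txt | xargs -0 -s 32000 echo git fetch-pack --stateless-rpc --lock-pack

where refs.txt is a hypothetical file with one ref per line, and the echo just
prints the batched command lines instead of running them.)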



Re: Clone fails on a repo with too many heads/tags

Ivan Todoroski
Ivan Todoroski <grnch_lists <at> gmx.net> writes:
> Now share this repo using the Smart HTTP transport (git-http-backend) and then
> try cloning it in a different directory. This is what you would get:
>
> $ git clone http://localhost/.../too-many-refs/.git
> Cloning into 'too-many-refs'...
> fatal: cannot exec 'fetch-pack': Argument list too long
>
> [...]
>
> The solution is conceptually simple: if the list of refs results in a too long
> command line, split the refs in batches and call fetch-pack multiple times such
> that each call is under the cmdline limit:
>
> git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
> git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
> ...
> git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...


BTW, I didn't want to sound like I am expecting or demanding a fix. If the
experienced Git devs lack the time or inclination to work on this bug
(understandable), I am certainly willing to try it myself. My C skills are a
bit rusty and I'm not very familiar with the Git codebase, but I will do my
best to follow Documentation/SubmittingPatches as well as the existing code
structure.

I will need a few pointers to get me started in the right direction though...


1) Is splitting the cmdline in batches and executing fetch-pack multiple times
the right approach? If you have a better solution, please suggest it.


2) Should I add the test case for this bug to existing scripts like
t/t5551-http-fetch.sh and t/t5561-http-backend.sh, or should I create a new
test script under t/ following their example? There will probably be only one
test case for this bug, basically the script I pasted in the original email to
reproduce it.


3) What would be the most portable way to get the cmdline length limit on both
POSIX and Windows? Would something like this be acceptable:

#ifdef _WIN32
        int cmdline_limit = 32767;      /* per the MSDN blog post below */
#else
        int cmdline_limit = sysconf(_SC_ARG_MAX);
#endif

I couldn't actually find a Windows API to get the cmdline limit, but this blog
post by one of the Windows people gives the value:

http://blogs.msdn.com/b/oldnewthing/archive/2003/12/10/56028.aspx


4) Should this problem be fixed only in remote-curl.c:fetch_git() or should it
be solved more generally in run-command.c:start_command(), which is used by
fetch_git() for the actual invocation?

If this is fixed only in remote-curl.c:fetch_git(), then the same logic would
need to be open-coded in any other such place that might be found. Are you
aware of any other internal sub-commands that put all the refs on the command
line and could be susceptible to the same issue?


If it's fixed at a lower level in run-command.c:start_command(), the logic
would become available to any other sub-command that needs it.

However, this would mean that struct child_process as well as struct rpc_state
would need an additional field that would tell whether the command is safe to
execute in multiple batches and how many of the arguments at the beginning of
child_process.argv must be preserved on every invocation (the switches and
such).

Something like child_process.split_after, which if non-zero would mean that
start_command() is free to invoke the command multiple times when argv exceeds
the cmdline limit, by grouping any arguments after argv[split_after] in smaller
batches for each invocation.
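
A rough sketch of what that could look like (hypothetical, not existing API):

    struct child_process {
            const char **argv;
            /* ... existing fields ... */
            /*
             * If non-zero, argv[0] through argv[split_after-1] are switches
             * that must be repeated on every invocation; start_command() is
             * then free to split the remaining arguments into batches that
             * each fit under the OS cmdline limit and run the command once
             * per batch.
             */
            int split_after;
    };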



Re: Clone fails on a repo with too many heads/tags

Nguyễn Thái Ngọc Duy
I had a look but could not come up with a simple solution for this.
CC'ing Shawn, who knows smart-http really well.

On Sun, Mar 18, 2012 at 6:37 PM, Ivan Todoroski <[hidden email]> wrote:

> [...]

--
Duy

Re: Clone fails on a repo with too many heads/tags

Jakub Narębski
In reply to this post by Ivan Todoroski
Ivan Todoroski <[hidden email]> writes:

> I recently tried cloning a fresh copy of a large repo (converted from CVS,
> nearly 10 years of history) and to my surprise "git clone" failed with the
> following message:
>
>     error: cannot spawn git: No such file or directory
>
> The problem is only reproduced using the Smart HTTP transport.

[...]

> I tracked down the problem to remote-curl.c:fetch_git(). That's where the
> "fetch-pack" command line is being constructed with all the refs on one line:
>
> git fetch-pack --stateless-rpc --lock-pack ...<all the refs>...
>
> The solution is conceptually simple: if the list of refs results in a too long
> command line, split the refs in batches and call fetch-pack multiple times such
> that each call is under the cmdline limit:
>
> git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
> git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
> ...
> git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...

That, or implement --stdin / --revs in git-fetch-pack (perhaps
following git-pack-objects, which implements --revs).
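
For comparison, pack-objects already accepts its rev-list arguments on
stdin when --revs is given, e.g.:

    echo HEAD | git pack-objects --revs --stdout >big.pack

so fetch-pack could grow a similar mode for its ref arguments.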

--
Jakub Narebski


Re: Clone fails on a repo with too many heads/tags

Jeff King
On Sun, Mar 18, 2012 at 09:36:24AM -0700, Jakub Narebski wrote:

> > The solution is conceptually simple: if the list of refs results in a too long
> > command line, split the refs in batches and call fetch-pack multiple times such
> > that each call is under the cmdline limit:
> >
> > git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
> > git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
> > ...
> > git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...
>
> That, or implement --stdin / --revs in git-fetch-pach (perhaps
> following git-pack-objects that implements --revs).

I don't think that will work, as stateless-rpc fetch-pack already uses
stdin to receive the list of advertised refs from the remote. Nor would
you want to have multiple invocations of fetch-pack, since that would
mean multiple http requests and multiple pack responses (which could not
delta between themselves).

And you can't condense the list in the general case. It is the set of
refs that we actually want to fetch. We could try passing just the
original refspecs (not the expansion) and letting fetch-pack try to do
the expansion, but in the worst case, you might really just have a
gigantic list of refs.

I think the only sane solution is to write the values to a temporary
file, and do something like:

  git fetch-pack --stateless-rpc --refs-from=$tmpfile

Even if you put the tmpfile in $GIT_DIR, I don't think this should run
afoul of any read-only repositories, since by definition you are
fetching into the repository (but you could also just put it in /tmp).
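
Roughly, the caller would do something like this (a sketch only; the option
and variable names are hypothetical):

  refs_tmp=$(mktemp) &&
  printf '%s\n' refs/heads/master refs/tags/v1.0 >"$refs_tmp" &&
  git fetch-pack --stateless-rpc --refs-from="$refs_tmp" "$url"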

-Peff

Re: Clone fails on a repo with too many heads/tags

Jakub Narębski
Jeff King wrote:
[...]
> I think the only sane solution is to write the values to a temporary
> file, and do something like:
>
>   git fetch-pack --stateless-rpc --refs-from=$tmpfile
>
> Even if you put the tmpfile in $GIT_DIR, I don't think this should run
> afoul of any read-only repositories, since by definition you are
> fetching into the repository (but you could also just put it in /tmp).

Or

   git fetch-pack --stateless-rpc --refs-fd=$n

and there would be no need for a temporary file.
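
For example (hypothetical, like the option itself), with the ref list
arriving on descriptor 3:

   git fetch-pack --stateless-rpc --refs-fd=3 "$url" 3<ref-list

where "ref-list" stands in for whatever file or pipe the caller has
already prepared.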

--
Jakub Narebski
Poland

Re: Clone fails on a repo with too many heads/tags

Ivan Todoroski
In reply to this post by Jeff King
On 18.03.2012 20:07, Jeff King wrote:
> Nor would you want to have multiple invocations of fetch-pack, since
> that would mean multiple http requests and multiple pack responses
> (which could not delta between themselves).

Sure, but if fetch-pack is called with so many refs that it overflows
the command line, it has a lot of work ahead of it anyway; it's not
going to be a fast operation. So the overhead of multiple HTTP requests
and the lack of delta compression between huge batches of refs wouldn't
be all that significant in practice.

That said, I really like your temp file idea below. It's the best
solution so far. Simple, efficient, non-intrusive.


> I think the only sane solution is to write the values to a temporary
> file, and do something like:
>
> git fetch-pack --stateless-rpc --refs-from=$tmpfile
>
> Even if you put the tmpfile in $GIT_DIR, I don't think this should
> run afoul of any read-only repositories, since by definition you are
> fetching into the repository (but you could also just put it in
> /tmp).

OK, if nobody beats me to it I will try to code this next weekend. Maybe
sooner, but I don't usually have much free time over the week (day job,
family, etc.).

Re: Clone fails on a repo with too many heads/tags

Nguyễn Thái Ngọc Duy
In reply to this post by Jeff King
On Mon, Mar 19, 2012 at 2:07 AM, Jeff King <[hidden email]> wrote:
> I don't think that will work, as stateless-rpc fetch-pack already uses
> stdin to receive the list of advertised refs from the remote. Nor would
> you want to have multiple invocations of fetch-pack, since that would
> mean multiple http requests and multiple pack responses (which could not
> delta between themselves).

remote-curl functions as a middleman between the HTTP client and
fetch-pack. Can we just send the ref list over fetch-pack's stdin first,
followed by maybe an empty line, and then the normal stuff that
remote-curl feeds to fetch-pack?
--
Duy

Re: Clone fails on a repo with too many heads/tags

Jeff King
In reply to this post by Jakub Narębski
On Sun, Mar 18, 2012 at 11:07:42PM +0100, Jakub Narebski wrote:

> > I think the only sane solution is to write the values to a temporary
> > file, and do something like:
> >
> >   git fetch-pack --stateless-rpc --refs-from=$tmpfile
> >
> > Even if you put the tmpfile in $GIT_DIR, I don't think this should run
> > afoul of any read-only repositories, since by definition you are
> > fetching into the repository (but you could also just put it in /tmp).
>
>    git fetch-pack --stateless-rpc --refs-fd=$n
>
> and there would be no need for temporary file.

Yeah. That is a slightly more awkward interface for command-line users,
but this is really meant to be an internal-to-git thing (just like
stateless-rpc is in the first place). And avoiding a tempfile is a good
thing.

However, I think you would need to teach the run-command API how to open
extra pipes to a child beyond stdout/stdin/stderr.

-Peff

Re: Clone fails on a repo with too many heads/tags

Nguyễn Thái Ngọc Duy
On Mon, Mar 19, 2012 at 9:32 AM, Jeff King <[hidden email]> wrote:
> However, I think you would need to teach the run-command API how to open
> extra pipes to a child beyond stdout/stdin/stderr.

That might be a problem on Windows. I think Windows exec* has special
support for stdin/stdout/stderr and only those.
--
Duy

Re: Clone fails on a repo with too many heads/tags

Jeff King
In reply to this post by Nguyễn Thái Ngọc Duy
On Mon, Mar 19, 2012 at 08:30:38AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Mon, Mar 19, 2012 at 2:07 AM, Jeff King <[hidden email]> wrote:
> > I don't think that will work, as stateless-rpc fetch-pack already uses
> > stdin to receive the list of advertised refs from the remote. Nor would
> > you want to have multiple invocations of fetch-pack, since that would
> > mean multiple http requests and multiple pack responses (which could not
> > delta between themselves).
>
> remote-curl functions as middle man between http client and
> fetch-pack. Can we just send ref list over fetch-pack.stdin first,
> followed by maybe empty line, then normal stuff that remote-curl feeds
> fetch-pack?

Yes, I think that would work. You'd just have to take care to pass the
residual buffer (i.e., what is left in your input buffer after you
notice that reading the list of wanted refs is finished) along to
the git-protocol code.  So it would require a little refactoring of
get_remote_heads, I think.

-Peff

Re: Clone fails on a repo with too many heads/tags

Jeff King
In reply to this post by Nguyễn Thái Ngọc Duy
On Mon, Mar 19, 2012 at 09:43:05AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Mon, Mar 19, 2012 at 9:32 AM, Jeff King <[hidden email]> wrote:
> > However, I think you would need to teach the run-command API how to open
> > extra pipes to a child beyond stdout/stdin/stderr.
>
> That might be a problem on Windows. I think Windows exec* has special
> support for stdin/stdout/stderr and only those.

Then your sneak-it-across-stdin trick is probably the best bet. It's a
little ugly, but this is for stateless-rpc, so it's not like anything
outside of git is going to actually see this.

-Peff

Re: Clone fails on a repo with too many heads/tags

Ivan Todoroski
In reply to this post by Jeff King
On 19.03.2012 03:44, Jeff King wrote:

> On Mon, Mar 19, 2012 at 08:30:38AM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Mar 19, 2012 at 2:07 AM, Jeff King <[hidden email]> wrote:
>>> I don't think that will work, as stateless-rpc fetch-pack already uses
>>> stdin to receive the list of advertised refs from the remote. Nor would
>>> you want to have multiple invocations of fetch-pack, since that would
>>> mean multiple http requests and multiple pack responses (which could not
>>> delta between themselves).
>> remote-curl functions as middle man between http client and
>> fetch-pack. Can we just send ref list over fetch-pack.stdin first,
>> followed by maybe empty line, then normal stuff that remote-curl feeds
>> fetch-pack?
>
> Yes, I think that would work. You'd just have to take care to pass the
> residual buffer (i.e., what is left in your input buffer after you
> notice that reading the list of wanted refs is finished) along to
> the git-protocol code.  So it would require a little refactoring of
> get_remote_heads, I think.

Would it be OK for fetch-pack.c to use the packetized format
(pkt-line.h) for reading the list of refs from stdin?

This way we can read() the exact number of bytes needed for the refs
from the fd, and there would be no need to refactor get_remote_heads()
to pass along a residual buffer; it could just continue reading straight
from the same fd.

Re: Clone fails on a repo with too many heads/tags

Shawn Pearce
On Wed, Mar 21, 2012 at 04:05, Ivan Todoroski <[hidden email]> wrote:

> On 19.03.2012 03:44, Jeff King wrote:
>> Yes, I think that would work. You'd just have to take care to pass the
>> residual buffer (i.e., what is left in your input buffer after you
>> notice that reading the list of wanted refs is finished) along to
>> the git-protocol code.  So it would require a little refactoring of
>> get_remote_heads, I think.
>
>
> Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
> for reading the list of refs from stdin?

This is probably the easiest way to implement the sneak-into-stdin
patch. Use a pkt-line for each argument that should have been in the
argv array from the command line, and a flush pkt to terminate the
list.
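
For instance, the stream for two refs might look like this (the 4-hex-digit
prefix is the pkt-line length, which counts itself; "0000" is the flush
packet; the ref names are just examples):

    0016refs/heads/master
    0013refs/tags/v1.0
    0000

followed by the rest of what remote-curl normally feeds to fetch-pack.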

Re: Clone fails on a repo with too many heads/tags

Jeff King
On Wed, Mar 21, 2012 at 07:28:24AM -0700, Shawn O. Pearce wrote:

> > Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
> > for reading the list of refs from stdin?
>
> This is probably the easiest way to implement the sneak-into-stdin
> patch. Use a pkt-line for each argument that should have been in the
> argv array from the command line, and a flush pkt to terminate the
> list.

Something in me feels slightly uncomfortable with that, just because
simple newline-delimited formats make it easy for people to hack on the
tool and feed input from unexpected sources.

But stateless-rpc is already a pretty deep internal interface, and the
format is already weird (the second half is packetized input from a
remote anyway). So it's probably not worth caring about hackability
here.

-Peff

Re: Clone fails on a repo with too many heads/tags

Jakub Narębski
Jeff King wrote:

> On Wed, Mar 21, 2012 at 07:28:24AM -0700, Shawn O. Pearce wrote:
>
> > > Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
> > > for reading the list of refs from stdin?
> >
> > This is probably the easiest way to implement the sneak-into-stdin
> > patch. Use a pkt-line for each argument that should have been in the
> > argv array from the command line, and a flush pkt to terminate the
> > list.
>
> Something in me feels slightly uncomfortable with that, just because
> simple newline-delimited formats make it easy for people to hack on the
> tool and feed input from unexpected sources.

Well, give people "pkt-line" command line tool which would transform
ordinary textual output on input into pkt-line (which is almost pure
text anyway) output and vice versa.

> But stateless-rpc is already a pretty deep internal interface anyway,
> and the format is already weird (the second half is already packetized
> input from a remote anyway). So it's probably not worth caring about
> hackability here.

Right.

--
Jakub Narebski
Poland

Re: Clone fails on a repo with too many heads/tags

Ivan Todoroski
In reply to this post by Jeff King
On 21.03.2012 18:14, Jeff King wrote:

> On Wed, Mar 21, 2012 at 07:28:24AM -0700, Shawn O. Pearce wrote:
>
>>> Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
>>> for reading the list of refs from stdin?
>> This is probably the easiest way to implement the sneak-into-stdin
>> patch. Use a pkt-line for each argument that should have been in the
>> argv array from the command line, and a flush pkt to terminate the
>> list.
>
> Something in me feels slightly uncomfortable with that, just because
> simple newline-delimited formats make it easy for people to hack on the
> tool and feed input from unexpected sources.

I understand what you mean. How about this:

If both --stdin and --stateless-rpc are specified to fetch-pack, it will
use pkt-line to read the refs from stdin before handing off stdin to
get_remote_heads().

However, if only --stdin is specified, it will read refs from stdin in a
script-friendly newline-delimited format, one ref per line. This is okay
because when --stateless-rpc is not specified, get_remote_heads() reads
from an fd different from stdin, so there is no issue with residual
buffers in this case.

This way you preserve scriptability for any other callers who don't use
--stateless-rpc.
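
For example (hypothetical invocation, since the option doesn't exist yet),
a script could simply do:

    printf 'refs/heads/master\nrefs/tags/v1.0\n' | git fetch-pack --stdin <repo>

while the pkt-line form of the input would only ever be generated
internally by remote-curl when it also passes --stateless-rpc.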

How does this sound?


Re: Clone fails on a repo with too many heads/tags

Jeff King
On Wed, Mar 21, 2012 at 09:02:13PM +0100, Ivan Todoroski wrote:

> >Something in me feels slightly uncomfortable with that, just because
> >simple newline-delimited formats make it easy for people to hack on the
> >tool and feed input from unexpected sources.
>
> I understand what you mean. How about this:
>
> If both --stdin and --stateless-rpc are specified to fetch-pack, it
> will use pkt-line to read the refs from stdin before handing off
> stdin to get_remote_heads().
>
> However, if only --stdin is specified, it will read refs from stdin
> in a script-friendly newline delimited format, one ref per line. This
> is okay because when --stateless-rpc is not specified
> get_remote_heads() reads from an fd different from stdin so there is
> no issue with residual buffers in this case.
>
> This way you preserve scriptability for any other callers who don't
> use --stateless-rpc.
>
> How does this sound?

I think that sounds quite reasonable, and shouldn't be more than a few
extra lines to implement.

-Peff

Re: Clone fails on a repo with too many heads/tags

Ivan Todoroski
On 21.03.2012 21:17, Jeff King wrote:

> On Wed, Mar 21, 2012 at 09:02:13PM +0100, Ivan Todoroski wrote:
>
>>> Something in me feels slightly uncomfortable with that, just because
>>> simple newline-delimited formats make it easy for people to hack on the
>>> tool and feed input from unexpected sources.
>> I understand what you mean. How about this:
>>
>> If both --stdin and --stateless-rpc are specified to fetch-pack, it
>> will use pkt-line to read the refs from stdin before handing off
>> stdin to get_remote_heads().
>>
>> However, if only --stdin is specified, it will read refs from stdin
>> in a script-friendly newline delimited format, one ref per line. This
>> is okay because when --stateless-rpc is not specified
>> get_remote_heads() reads from an fd different from stdin so there is
>> no issue with residual buffers in this case.
>>
>> This way you preserve scriptability for any other callers who don't
>> use --stateless-rpc.
>>
>> How does this sound?
>
> I think that sounds quite reasonable, and shouldn't be more than a few
> extra lines to implement.
>
> -Peff

I wrote the code for this and went to write the test cases, but the test
suite is not cooperating. :(

I'll send just the code changes for comments while I'm figuring out the
test suite. I'll include the test cases along with better commit
messages in the next version of the patches.

So, back to the test suite problem. To get familiar with it I first ran
the test suite against the vanilla "maint" branch without any of my
changes, just to see what to expect from properly working code.

Unfortunately it failed, and to make matters worse it failed exactly on
the parts dealing with smart HTTP, which is what I need to test in the
first place. Talk about Murphy's law...

Is it failing for anyone else on the vanilla "maint" branch? I would
appreciate any help I could get here.

My machine is CentOS 5.8 with the latest updates, and I have the httpd
package installed. Here is the command that is failing (you can find the
full output in the attachment):


$ GIT_TEST_HTTPD=yes ./t5551-http-fetch.sh -i -v
[... skip successful tests ...]
ok 5 - used upload-pack service

expecting success:
         git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p

error: RPC failed; result=22, HTTP code = 405
fatal: The remote end hung up unexpectedly
not ok - 6 follow redirects (301)
#
#               git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p
#


Initialized empty Git repository in /home/itodoroski/git/t/trash directory.t5551-http-fetch/.git/
expecting success:
        echo content >file &&
        git add file &&
        git commit -m one

[master (root-commit) ba36540] one
 Author: A U Thor <[hidden email]>
 1 file changed, 1 insertion(+)
 create mode 100644 file
ok 1 - setup repository

expecting success:
        mkdir "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
        (cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
         git --bare init
        ) &&
        git remote add public "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
        git push public master:master

Initialized empty Git repository in /home/itodoroski/git/t/trash directory.t5551-http-fetch/httpd/www/repo.git/
To /home/itodoroski/git/t/trash directory.t5551-http-fetch/httpd/www/repo.git
 * [new branch]      master -> master
ok 2 - create http-accessible bare repository

expecting success:
        GIT_CURL_VERBOSE=1 git clone --quiet $HTTPD_URL/smart/repo.git clone 2>err &&
        test_cmp file clone/file &&
        tr '\015' Q <err |
        sed -e "
                s/Q\$//
                /^[*] /d
                /^$/d
                /^< $/d

                /^[^><]/{
                        s/^/> /
                }

                /^> User-Agent: /d
                /^> Host: /d
                /^> POST /,$ {
                        /^> Accept: [*]\\/[*]/d
                }
                s/^> Content-Length: .*/> Content-Length: xxx/
                /^> 00..want /d
                /^> 00.*done/d

                /^< Server: /d
                /^< Expires: /d
                /^< Date: /d
                /^< Content-Length: /d
                /^< Transfer-Encoding: /d
        " >act &&
        test_cmp exp act

ok 3 - clone http repository

expecting success:
        echo content >>file &&
        git commit -a -m two &&
        git push public
        (cd clone && git pull) &&
        test_cmp file clone/file

[master ace4881] two
 Author: A U Thor <[hidden email]>
 1 file changed, 1 insertion(+)
To /home/itodoroski/git/t/trash directory.t5551-http-fetch/httpd/www/repo.git
   ba36540..ace4881  master -> master
From http://127.0.0.1:5551/smart/repo
   ba36540..ace4881  master     -> origin/master
Updating ba36540..ace4881
Fast-forward
 file |    1 +
 1 file changed, 1 insertion(+)
ok 4 - fetch changes via http

expecting success:
        sed -e "
                s/^.* \"//
                s/\"//
                s/ [1-9][0-9]*\$//
                s/^GET /GET  /
        " >act <"$HTTPD_ROOT_PATH"/access.log &&
        test_cmp exp act

ok 5 - used upload-pack service

expecting success:
        git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p

error: RPC failed; result=22, HTTP code = 405
fatal: The remote end hung up unexpectedly
not ok - 6 follow redirects (301)
#
# git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p
#

[PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin

Ivan Todoroski
In reply to this post by Jeff King

From c4bb55f9f27569faa368d823ca6fe4b236e37cd6 Mon Sep 17 00:00:00 2001
From: Ivan Todoroski <[hidden email]>
Date: Sat, 24 Mar 2012 15:13:05 +0100
Subject: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin

---
 Documentation/git-fetch-pack.txt |    9 ++++++++
 builtin/fetch-pack.c             |   44 ++++++++++++++++++++++++++++++++++++--
 fetch-pack.h                     |    3 ++-
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index ed1bdaacd1..7cd7c785bc 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -32,6 +32,15 @@ OPTIONS
 --all::
  Fetch all remote refs.
 
+--stdin::
+ Take the list of refs from stdin, one per line. If there
+ are refs specified on the command line in addition to this
+ option, then the refs from stdin are processed after those
+ on the command line.
+ If '--stateless-rpc' is specified together with this option
+ then the list of refs must be in packet format (pkt-line)
+ with a flush packet terminating the list.
+
 -q::
 --quiet::
  Pass '-q' flag to 'git unpack-objects'; this makes the
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index a4d3e90a86..3c4c193e45 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -23,7 +23,7 @@ static struct fetch_pack_args args = {
 };
 
 static const char fetch_pack_usage[] =
-"git fetch-pack [--all] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
+"git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
 
 #define COMPLETE (1U << 0)
 #define COMMON (1U << 1)
@@ -896,7 +896,7 @@ static void fetch_pack_setup(void)
 
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 {
- int i, ret, nr_heads;
+ int i, ret, nr_heads, alloc_heads;
  struct ref *ref = NULL;
  char *dest = NULL, **heads;
  int fd[2];
@@ -941,6 +941,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
  args.fetch_all = 1;
  continue;
  }
+ if (!strcmp("--stdin", arg)) {
+ args.refs_from_stdin = 1;
+ continue;
+ }
  if (!strcmp("-v", arg)) {
  args.verbose = 1;
  continue;
@@ -972,6 +976,42 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
  if (!dest)
  usage(fetch_pack_usage);
 
+ if (args.refs_from_stdin) {
+ char ref[1000];
+ /* copy refs from cmdline to new growable list */
+ int size = nr_heads * sizeof(*heads);
+ heads = memcpy(xmalloc(size), heads, size);
+ alloc_heads = nr_heads;
+ /* append refs from stdin to ones from cmdline */
+ for (;;) {
+ if (args.stateless_rpc) {
+ /* read using pkt-line in stateless RPC mode,
+   flush packet signifies end of refs */
+ int n = packet_read_line(0, ref, sizeof(ref));
+ if (!n)
+ break;
+ if (ref[n-1] == '\n')
+ ref[n-1] = '\0';
+ }
+ else {
+ int n;
+ if (!fgets(ref, sizeof(ref), stdin)) {
+ if (ferror(stdin))
+ die_errno("cannot read ref");
+ if (feof(stdin))
+ break;
+ }
+ n = strlen(ref);
+ if (ref[n-1] == '\n')
+ ref[n-1] = '\0';
+ else if (!feof(stdin))
+ die("ref too long on stdin");
+ }
+ ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
+ heads[nr_heads++] = xstrdup(ref);
+ }
+ }
+
  if (args.stateless_rpc) {
  conn = NULL;
  fd[0] = 0;
diff --git a/fetch-pack.h b/fetch-pack.h
index 0608edae3f..292d69389e 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -13,7 +13,8 @@ struct fetch_pack_args {
  verbose:1,
  no_progress:1,
  include_tag:1,
- stateless_rpc:1;
+ stateless_rpc:1,
+ refs_from_stdin:1;
 };
 
 struct ref *fetch_pack(struct fetch_pack_args *args,
--
1.7.9.4.16.gd24fa.dirty
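
(For example, with this patch a caller could feed the refs through stdin
instead of the command line; sketch only, with placeholder paths and refs:

    printf '%s\n' refs/heads/master refs/tags/some-tag |
    git fetch-pack --stdin ../too-many-refs

In the smart HTTP case it would of course be remote-curl driving this,
not a shell pipeline.)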
