Unable to clone via git protocol / early EOF / index-pack failed

Harald Welte
Hi all,

For a couple of days now, I have been encountering a very strange problem with
one of my git-daemon installations.  It started when I updated the kernel on
the git server to Debian's "2.6.32-5-openvz-amd64".

All OpenVZ limits have been set to 'unlimited', there is plenty of memory
available, and the system is otherwise idle.  The repositories are _very_
small, so I do not think resource limitations are an issue.  Also, there is no
-ENOMEM or similar error returned by the kernel to git-daemon that would make
OpenVZ resource constraints a suspect.

This is the output I get on the client side:
---
vishnu:/tmp# GIT_TRACE=1 git clone git://git.osmocom.org/cyberflex-shell.git
trace: built-in: git 'clone' 'git://git.osmocom.org/cyberflex-shell.git'
Initialized empty Git repository in /tmp/cyberflex-shell/.git/
trace: run_command: 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 19371 on vishnu.netfilter.org'
trace: exec: 'git-index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 19371 on vishnu.netfilter.org'
remote: trace: exec: 'git-pack-objects' '--stdout' '--progress' '--delta-base-offset'
remote: trace: built-in: git 'pack-objects' '--stdout' '--progress' '--delta-base-offset'
remote: Counting objects: 1310, done.
remote: Compressing objects: 100% (341/341), done.
remote: Total 1310 (delta 968), reused 1299 (delta 963)
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
---

The output on the server side looks normal:
---
trace: run_command: '/root/git/git-daemon' '--serve' '--verbose' '--base-path=/srv/gitosis/repositories' '/srv/gitosis/repositories'
[27202] Connection from 82.99.25.66:5329
[27202] Extended attributes (22 bytes) exist <host=git.osmocom.org>
[27202] Request upload-pack for '/openggsn.git'
trace: run_command: 'upload-pack' '--strict' '--timeout=0' '.'
[27202] trace: exec: 'git' 'upload-pack' '--strict' '--timeout=0' '.'
[27202] trace: exec: 'git-upload-pack' '--strict' '--timeout=0' '.'
[27202] trace: run_command: 'pack-objects' '--stdout' '--progress' '--delta-base-offset'
[24307] [27202] Disconnected
---

When I retry the 'git clone' on the client many times, in about 1 of every 10
to 20 cases it will succeed, but will fail with the same EOF the other 90-95%.

I have tried git version 1.5.6.5, current git.git master, and 1.7.2.5 - they
all show the same problem, though I had the feeling that git.git master has
a higher probability of success.

The probability of successful cloning also seems to be influenced by the amount
of latency between the server and the client.  If I run it on the same machine
it always works, if I run it on the same ethernet segment, it almost always works,
and via the internet it almost always fails.

Looking at a strace of git-daemon, I get a segment close to the process
termination, where a child process is first doing close(1), then fstat(1),
which returns -EBADF.  I'm not sure if this is the cause of the problem,
but it looks strange anyway:
---
[pid 24306] write(1, "X\261_v\312\256\265\240\220\227\350\4\201WA@Y\341y7"..., 20) = 20
[pid 24306] close(1)                    = 0
[pid 24306] brk(0x2411000)              = 0x2411000
[pid 24306] write(2, "Total 102 (delta 58), reused 0 (delta 0)\n"..., 41) = 41
[pid 24306] fstat(1, 0x7fff9070e0a0)    = -1 EBADF (Bad file descriptor)
[pid 24306] exit_group(0)               = ?
Process 24306 detached
[pid 24302] <... write resumed> )       = 7832
[pid 24302] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 24302] write(1, "0G\n\34182w\227['\307\243\232^cn'H\30\243\25Y\2032yU\24\213\3653+\205`\0309\331kFo\4\330\310wE\4\274kk\244c\32\331HJ\32\300\213r.\272Q\24FF\345\255\355*\260\241d\265\20\225\352F\0102\231Y\4\262:\314Pw\241\4b\16\10\31\205\\\31ma\336#\216\31\251\35\366\350,\234\v\272\374x\336\347\34\344\353\30\201\344z\302\214\n\202\31#\374\332\331\31\340/\256+\300\342\n\374\306w\201\263\26v\261\211$@\250;\370P\335R.]`\252\23c\235\273\306\216@_sd\2325\3450\6|L\5\274\265^D[:hV\t\315\300(z\233\0106\10\270QTL\253\2\376L\275^\247\323{\327\350\235\17{\215\17\262q\246\2546\356\342\2+\271\344\216Ya&p\311\300\227\201\240\17R\207X+z#\225\256\214\275\\0*\252Kz\361\206=r#"..., 360) = 360
[pid 24302] read(6, "jects:  49% (49/99)   \rCompressing objects:  50% (50/99)   \rCompressing objects:  51% (51/99)   \rCompressing objects:  52% (52/9"..., 128) = 128
[pid 24302] write(1, "0085\2"..., 5)    = 5
---
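The EBADF by itself is expected: once a descriptor has been closed, any
fstat() on it fails with EBADF, so this line alone does not show that data was
lost (the preceding 20-byte write to fd 1 did complete). A minimal sketch of
the same close-then-fstat sequence, using a pipe as a hypothetical stand-in for
the process's fd 1; fstat_errno_after_close() is an illustrative name, not git
code:

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Mimic "close(1); fstat(1)" from the strace: close a descriptor, then
 * fstat() it. Returns the errno from fstat (expected: EBADF), 0 if fstat
 * unexpectedly succeeds, or -1 if the pipe cannot be created. */
static int fstat_errno_after_close(void)
{
    int fds[2];
    struct stat st;

    if (pipe(fds) < 0)
        return -1;
    close(fds[0]);
    close(fds[1]);              /* stand-in for close(1) */

    if (fstat(fds[1], &st) == 0)
        return 0;               /* descriptor still valid: not expected */
    return errno;               /* EBADF: the descriptor no longer exists */
}
```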

I have tried 'git fsck' and 'git gc' on the server repositories, this caused
no change whatsoever in the behaviour.

I have taken pcap files on the server and client.  They both agree, so I don't
expect any firewall, NAT or other IP device to interfere with the communication.

It is clearly git-daemon that terminates the TCP connection, with a [RST,ACK]
packet towards the client.
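Background, hedged: on Linux, calling close() on a TCP socket whose receive
buffer still holds unread bytes makes the kernel send an RST instead of the
normal FIN (RFC 2525, section 2.17 discusses this behaviour), and an RST
arriving at the client can cause data it has not yet processed to be dropped.
Whether git-daemon or upload-pack leaves unread data behind here is an
assumption; the traces above do not show it. A common way to rule this out is
a "lingering close": half-close with shutdown(SHUT_WR), read and discard until
the peer's EOF, then close(). The sketch below is illustrative only;
graceful_close() is a hypothetical helper, not part of git.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical "lingering close": announce EOF to the peer, drain whatever
 * it still sends (so no unread data is left to trigger an RST on close),
 * then release the descriptor. Returns 0 on success, -1 on error. */
static int graceful_close(int fd)
{
    char buf[4096];
    ssize_t n;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) < 0) {   /* send FIN; we will write no more */
        close(fd);
        return -1;
    }
    for (;;) {                          /* read and discard until peer's EOF */
        n = read(fd, buf, sizeof buf);
        if (n > 0)
            continue;
        if (n == 0)
            break;                      /* peer has closed its side */
        if (errno == EINTR)
            continue;
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A production server would also bound the drain loop, for example with
SO_RCVTIMEO or poll(), so that a client that never closes its side cannot
block the process forever.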

Do you have any idea how to further debug the problem?  I've been searching
the web for quite some time, and while I can find a number of reports
describing similar issues, even with small repositories, and as early as 2009,
there is no response or solution to any of those postings.

If you want to experience the problem on the client side, try cloning any
of the repositories listed on http://cgit.osmocom.org/

Regards,
        Harald
--
- Harald Welte <[hidden email]>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Unable to clone via git protocol / early EOF / index-pack failed

Harald Welte
Hi again,

> since a couple of days ago, I'm encountering a very strange problem regarding
> one of my git-daemon installations.

What I forgot to mention:  The same repositories work fine using the
'ssh' transport, we can clone, fetch, update, push without any problems
at all.  So I am quite sure the problem is not related to the repository
data/metadata, but specifically related to git-daemon.

Thanks again,
        Harald

Re: Unable to clone via git protocol / early EOF / index-pack failed

Holger Freyther
In reply to this post by Harald Welte
Harald Welte <laforge <at> gnumonks.org> writes:

>
> Hi all,

Hi,

I traced this on the same server with these flags[1], and in both the success
and failure cases the commands sent to and from the server are the same.
My current guess is that something is not getting flushed and the TCP socket
is closed too early.

It is a bit difficult to track all the processes that get started and what they
should do and to figure out at which point the fd for the tcp socket is really
closed.

Does anyone have an idea of what we could do?

cheers
   holger

[1] GIT_TRACE_PACKET=1 GIT_TRACE=2 GIT_DEBUG_SEND_PACK=2

Re: Unable to clone via git protocol / early EOF / index-pack failed

Holger Freyther
Holger Freyther <zecke <at> selfish.org> writes:


> It is a bit difficult to track all the processes that get started and what they
> should do and to figure out at which point the fd for the tcp socket is really
> closed.

With the hack below, it works fine. Adding fflush(NULL), close(fileno(stdout)),
fsync() or sched_yield() instead does not fix it, though.

diff --git a/upload-pack.c b/upload-pack.c
index 72aa661..4cd12c9 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -695,6 +695,8 @@ static void upload_pack(void)
                get_common_commits();
                create_pack_file();
        }
+
+       sleep(1);
 }
 
 int main(int argc, char **argv)

Re: Unable to clone via git protocol / early EOF / index-pack failed

Matthieu Moy-2
Holger Freyther <[hidden email]> writes:

> Holger Freyther <zecke <at> selfish.org> writes:
>
>
>> It is a bit difficult to track all the processes that get started and what they
>> should do and to figure out at which point the fd for the tcp socket is really
>> closed.
>
> If I do the below hack it is working fine. Adding a fflush(NULL).. or a
> close(fileno(stdout)).. fsync... sched_yield() is not fixing it though.
>
> diff --git a/upload-pack.c b/upload-pack.c
> index 72aa661..4cd12c9 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -695,6 +695,8 @@ static void upload_pack(void)
>                 get_common_commits();
>                 create_pack_file();
>         }
> +
> +       sleep(1);
>  }
>  
>  int main(int argc, char **argv)

That reminds me of an issue I had with git over SSH. The problem was indeed
not Git-related:

  http://readlist.com/lists/securityfocus.com/secureshell/0/3071.html

In short, when sending 196481 bytes or more, _and_ if the server did not
consume the data fast enough, then the connection was closed after
65536=2^16 bytes.

That was on RHEL, and upgrading the kernel solved the issue. Scary...

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/

Re: Unable to clone via git protocol / early EOF / index-pack failed

Holger Freyther
On 03/26/2011 06:58 PM, Matthieu Moy wrote:

>
> That reminds me an issue I had with git over SSH. The problem was indeed
> not Git-related:
>
>   http://readlist.com/lists/securityfocus.com/secureshell/0/3071.html
>
> In short, when sending 196481 bytes or more, _and_ if the server did not
> consume the data fast enough, then the connection was closed after
> 65536=2^16 bytes.
>
> That was on RHEL, and upgrading the kernel solved the issue. Scary...
>

Yes, scary. I assume that fflush(stdout) has no real meaning for sockets
either? So the next best thing would be to use ioctl and SIOCOUTQ on the
socket? Any other ideas?
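For reference: fflush() only hands stdio's user-space buffer to the kernel via
write(); once write() returns, the bytes sit in the socket's send queue, and
neither fflush() nor fsync() waits for the peer to receive them. On Linux,
ioctl(fd, SIOCOUTQ, &n) returns, for a TCP socket, the number of bytes in the
send queue that the peer has not yet acknowledged. A minimal, Linux-specific
sketch of polling it instead of a fixed sleep(1); the helper names, the 10 ms
poll interval and the timeout values are assumptions, not anything git
provides:

```c
#include <assert.h>
#include <linux/sockios.h>   /* SIOCOUTQ (Linux-specific) */
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Poll SIOCOUTQ until every byte written to fd has been acknowledged by the
 * peer, or until timeout_ms elapses (hypothetical helper).
 * Returns 0 when drained, 1 on timeout, -1 if the ioctl fails. */
static int wait_for_send_queue_drain(int fd, int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int waited_ms = 0;

    assert(timeout_ms >= 0);
    for (;;) {
        int pending = 0;
        if (ioctl(fd, SIOCOUTQ, &pending) < 0)
            return -1;
        if (pending == 0)
            return 0;                 /* nothing left unacknowledged */
        if (waited_ms >= timeout_ms)
            return 1;                 /* give up; data still queued */
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }
}

/* Example use just before exit: wait up to 2 s, then close regardless. */
static void drain_then_close(int fd)
{
    (void)wait_for_send_queue_drain(fd, 2000);
    close(fd);
}
```

An empty queue only means the client's kernel has acknowledged the data; it
does not by itself prevent an RST triggered by unread incoming data on the
server's socket.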