More on git over HTTP POST

More on git over HTTP POST

H. Peter Anvin-4
Hi all,

I have investigated a bit what it would take to support git protocol
(smart transport) over HTTP POST transactions.

The current proxy system is broken, for a very simple reason: it doesn't
convey information about when the channel should be turned around.

HTTP POST -- or, for that matter, any RPC-style transport, is a half
duplex transport: only one direction can be active at a time, after
which the channel has to be explicitly turned around.  The "turning
around" consists of posting the queued transaction and listening for the
reply.

Ultimately, it comes down to the following: the transactor needs to be
given explicit information when the git protocol goes from writing to
reading (the transition in the opposite direction is obvious).  I was
hoping that it would be possible to get this information by snooping the
protocol, but no such luck.

I started to hack on a variant which would embed a VFS-style interface
in git itself, looking something like:

struct transactor;

struct transact_ops {
        ssize_t (*read)(struct transactor *, void *, size_t);
        ssize_t (*write)(struct transactor *, const void *, size_t);
        int (*close)(struct transactor *);
};

struct transactor {
        union {
                void *p;
                intptr_t i;
        } u;
        const struct transact_ops *ops;
};

Replacing the usual fd operations with this interface would allow a
different transactor to see the phase changes explicitly; the
replacement to use xread() and xwrite() is obvious.
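
A minimal sketch of the plain file-descriptor case, assuming the two structs exactly as given above. The names fd_read, fd_write, fd_close, fd_ops and transactor_from_fd are hypothetical and not part of git; the sketch calls read() and write() directly where git itself would presumably use xread() and xwrite().

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

struct transactor;

struct transact_ops {
	ssize_t (*read)(struct transactor *, void *, size_t);
	ssize_t (*write)(struct transactor *, const void *, size_t);
	int (*close)(struct transactor *);
};

struct transactor {
	union {
		void *p;
		intptr_t i;
	} u;
	const struct transact_ops *ops;
};

/* Hypothetical backend: a plain file descriptor stored in u.i. */
static ssize_t fd_read(struct transactor *t, void *buf, size_t len)
{
	return read((int)t->u.i, buf, len);
}

static ssize_t fd_write(struct transactor *t, const void *buf, size_t len)
{
	return write((int)t->u.i, buf, len);
}

static int fd_close(struct transactor *t)
{
	return close((int)t->u.i);
}

static const struct transact_ops fd_ops = { fd_read, fd_write, fd_close };

/* Hypothetical helper: wrap an already-open descriptor. */
static void transactor_from_fd(struct transactor *t, int fd)
{
	assert(fd >= 0);
	t->u.i = fd;
	t->ops = &fd_ops;
}
```

An HTTP POST backend would supply a different ops table: its write would only queue data, and its first read after a write would be the point where the channel is turned around, that is, where the queued transaction is posted and the reply is awaited.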

Of course, I started hacking on it and found myself with zero time to
continue, but I thought I'd post what I had come up with.

        -hpa
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: More on git over HTTP POST

Shawn Pearce
"H. Peter Anvin" <[hidden email]> wrote:
> I have investigated a bit what it would take to support git protocol  
> (smart transport) over HTTP POST transactions.

I have started to think about this more myself, not just for POST
but also for some form of GET that can return an efficient pack,
rather than making the client walk the object chains itself.

Have you looked at the Mercurial wire protocol?  It runs over HTTP
and uses a relatively efficient means of deciding where to cut the
transfer.

  http://www.selenic.com/mercurial/wiki/index.cgi/WireProtocol

Most of their smarts are in the branches() and between() operations.

Unfortunately this documentation isn't very complete and/or there
are some simplifications that the Mercurial team took due to their
repository format not initially supporting multiple branches like
the Git format does.

> The current proxy system is broken, for a very simple reason: it doesn't  
> convey information about when the channel should be turned around.

Well, over git:// (or any protocol that wraps git:// like ssh)
we assume a full-duplex channel.  Some proxy systems are able to
do such a channel.  HTTP however does not offer it.

> I started to hack on a variant which would embed a VFS-style interface  
> in git itself, looking something like:
>
> struct transactor;
>
> struct transact_ops {
> ssize_t (*read)(struct transactor *, void *, size_t);
> ssize_t (*write)(struct transactor *, const void *, size_t);
> int (*close)(struct transactor *);
> };

No, the git:// protocol implementation in fetch-pack/upload-pack
runs more efficiently than that by keeping a sliding window of stuff
that is in-flight.  It's, I guess, two async RPCs running in parallel,
but from the client and server perspective both RPCs go into the
same computation.

HTTP POST is actually trivial if you don't want to support the new
tell-me-more extension that was added to git-push.  Hell, I could
write the CGI in a few minutes I think.  It's really just a small
wrapper around git-receive-pack.

What's a bitch is the efficient fetch, and getting tell-me-more to
work on push.

--
Shawn.

Re: More on git over HTTP POST

Daniel Stenberg
On Sat, 2 Aug 2008, Shawn O. Pearce wrote:

> Well, over git:// (or any protocol that wraps git:// like ssh) we assume a
> full-duplex channel.  Some proxy systems are able to do such a channel.
> HTTP however does not offer it.

Yes it does. The CONNECT method is used to get a full-duplex channel to a
remote site through an HTTP proxy. The downside with that is of course that
most proxies are set up to disallow CONNECT to ports other than 443 (the https
default port).

--

  / daniel.haxx.se

Re: More on git over HTTP POST

Shawn Pearce
Daniel Stenberg <[hidden email]> wrote:

> On Sat, 2 Aug 2008, Shawn O. Pearce wrote:
>
>> Well, over git:// (or any protocol that wraps git:// like ssh) we
>> assume a full-duplex channel.  Some proxy systems are able to do such a
>> channel. HTTP however does not offer it.
>
> Yes it does. The CONNECT method is used to get a full-duplex channel to a
> remote site through a HTTP proxy. The downside with that is of course
> that most proxies are setup to disallow CONNECT to other ports than 443
> (the https default port).

Ah, yes.  CONNECT.  Very few servers wind up supporting it I think.

I know one very big company who cannot use or support Git because
Git over HTTP is too slow to be useful.  They support other tools
like Subversion instead.  :-|

Really we just need smart protocol support in half-duplex RPC like
hpa was going after.  Then it doesn't matter what we serialize it
into, almost any RPC system will be useful.  Of course the only
one that probably matters in practice is HTTP.

--
Shawn.

Re: More on git over HTTP POST

Petr Baudis
On Sat, Aug 02, 2008 at 02:08:28PM -0700, Shawn O. Pearce wrote:
> I know one very big company who cannot use or support Git because
> Git over HTTP is too slow to be useful.  They support other tools
> like Subversion instead.  :-|

On what projects? I'm currently using Git over HTTP (read-only) a lot
and it doesn't seem really all that impractical to me. Maybe just using
a more dumb-friendly packing scheme could help a lot?

                                Petr "Pasky" Baudis

Re: More on git over HTTP POST

Shawn Pearce
Petr Baudis <[hidden email]> wrote:
> On Sat, Aug 02, 2008 at 02:08:28PM -0700, Shawn O. Pearce wrote:
> > I know one very big company who cannot use or support Git because
> > Git over HTTP is too slow to be useful.  They support other tools
> > like Subversion instead.  :-|
>
> On what projects? I'm currently using Git over HTTP (read-only) a lot
> and it doesn't seem really all that impractical to me. Maybe just using
> a more dumb-friendly packing scheme could help a lot?

They tested by taking the SVN source code and importing it into
both Git and Hg, then cloned them both over a WAN link.  Git was
22x slower.  I suspect they didn't pack the Git repository at all,
so Git had to issue thousands of HTTP GET requests for the loose
objects.  But I also suspect there was bias in the testing so they
didn't realize they needed to repack, and didn't care to find out.

I've probably already said too much.  I'm under NDAs.

But anyway.  The point I was trying to make was that there are
not just some proxy servers, but also some server platforms, that
cannot handle bidirectional communication.  E.g. servers that are
behind reverse proxies, where the reverse proxy is acting as a sort
of firewall or content cache accelerator.

--
Shawn.

Re: More on git over HTTP POST

Shawn Pearce
In reply to this post by Shawn Pearce
"Shawn O. Pearce" <[hidden email]> wrote:
> "H. Peter Anvin" <[hidden email]> wrote:
> > I have investigated a bit what it would take to support git protocol  
> > (smart transport) over HTTP POST transactions.
>
> I have started to think about this more myself, not just for POST
> but also for some form of GET that can return an efficient pack,
> rather than making the client walk the object chains itself.
...
> HTTP POST is actually trivial if you don't want to support the new
> tell-me-more extension that was added to git-push.  Hell, I could
> write the CGI in a few minutes I think.  It's really just a small
> wrapper around git-receive-pack.

So I have this draft of how smart push might work.  It's slated
for the Documentation/technical directory.  Thus far I have only
written about push support, but Ilari on #git has some ideas about
how to do a smart fetch protocol.

Implementation wise in C git I think this is just a new C
program (git-http-backend?) that turns around and proxies
into git-receive-pack, at least for the push support.

What I don't know is how we could configure URI translation from
/path/to/repository.git received out of the $PATH_INFO in the
CGI environment to a physical directory.  Should we rely on the
server's $PATH_TRANSLATED?


Smart HTTP transfer protocols
=============================

Git supports two HTTP based transfer protocols.  A "dumb" protocol
which requires only a standard HTTP server on the server end of the
connection, and a "smart" protocol which requires a Git aware CGI
(or server module).  This document describes the "smart" protocol.

Authentication
--------------

Standard HTTP authentication is used, and must be configured and
enforced by the HTTP server software.

Chunked Transfer Encoding
-------------------------

For performance reasons the HTTP/1.1 chunked transfer encoding is
used frequently to transfer variable length objects.  This avoids
needing to produce large results in memory to compute the proper
content-length.
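
As a rough illustration of the framing involved, here is a minimal sketch of writing one HTTP/1.1 chunk into a caller-supplied buffer. The helper name format_chunk is hypothetical. Note that HTTP/1.1 writes the chunk size in hexadecimal, followed by CRLF, the data, and another CRLF.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: frame `len` bytes of `data` as one HTTP/1.1 chunk
 * (hex size, CRLF, data, CRLF) into `out`.  Returns the number of bytes
 * written, or 0 if `out` is too small.
 */
static size_t format_chunk(char *out, size_t outsz, const void *data, size_t len)
{
	char head[32];
	int hlen = snprintf(head, sizeof(head), "%zx\r\n", len);

	assert(hlen > 0 && (size_t)hlen < sizeof(head));
	if ((size_t)hlen + len + 2 > outsz)
		return 0;
	memcpy(out, head, (size_t)hlen);
	memcpy(out + hlen, data, len);
	memcpy(out + hlen + len, "\r\n", 2);
	return (size_t)hlen + len + 2;
}
```

The sender never needs to know the total body size in advance; each chunk is framed on its own, and a zero-length chunk ends the body.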

Detecting Smart Servers
-----------------------

HTTP clients can detect a smart Git-aware server by sending the
show-ref request (below) to the server.  If the response has a
status of 200 and the magic x-application/git-refs content type
then the server can be assumed to be a smart Git-aware server.

If any other response is received the client must assume dumb
protocol support, as the server did not correctly respond to
the request.
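
The detection rule above could be sketched as follows. The function name is_smart_server and the SMART_REFS_TYPE macro are hypothetical, and the sketch assumes an exact, case-sensitive match on the content type with no parameters attached.

```c
#include <assert.h>
#include <string.h>

/* Content type used by this draft; passed in so it can be changed. */
#define SMART_REFS_TYPE "x-application/git-refs"

/*
 * Hypothetical helper: returns 1 if the show-ref response indicates a
 * smart Git-aware server, 0 if the client must fall back to the dumb
 * protocol.  `content_type` may be NULL when the header was absent.
 */
static int is_smart_server(int status, const char *content_type,
			   const char *magic_type)
{
	assert(magic_type != NULL);
	return status == 200 && content_type != NULL &&
	       strcmp(content_type, magic_type) == 0;
}
```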


Show Refs
---------

Obtains the available refs from the remote repository.  The response
is a sequence of git "packet lines", one per ref, and a final flush
packet line to indicate the end of stream.

        C: GET /path/to/repository.git?show-ref HTTP/1.0

        S: HTTP/1.1 200 OK
        S: Content-Type: x-application/git-refs
        S: Transfer-Encoding: chunked
        S:
        S: 62
        S: 003e95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
        S:
        S: 63
        S: 003fd049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
        S:
        S: 59
        S: 003b2cb58b79488a98d2721cea644875a8dd0026b115 refs/heads/pu
        S:
        S: 4
        S: 0000
        S: 0

Push Pack
---------

Uploads a pack and updates refs.  The start of the stream is the
commands to update the refs and the remainder of the stream is the
pack file itself.  See git-receive-pack and its network protocol
in pack-protocol.txt, as this is essentially the same.

        C: POST /path/to/repository.git?receive-pack HTTP/1.0
        C: Content-Type: x-application/git-receive-pack
        C: Transfer-Encoding: chunked
        C:
        C: 103
        C: 006395dcfa3633004da0049d3d0fa03f80589cbcaf31 d049f6c27a2244e12041955e262a404c7faba355 refs/heads/maint
        C: 4
        C: 0000
        C: 12
        C: PACK
        ...
        C: 0

        S: HTTP/1.0 200 OK
        S: Content-type: x-application/git-status
        S: Transfer-Encoding: chunked
        S:
        S: ...<output of receive-pack>...



--
Shawn.

Re: More on git over HTTP POST

Junio C Hamano
"Shawn O. Pearce" <[hidden email]> writes:

> Show Refs
> ---------
>
> Obtains the available refs from the remote repository.  The response
> is a sequence of git "packet lines", one per ref, and a final flush
> packet line to indicate the end of stream.

As the initial protocol exchange request, I suspect that you would regret
if you do not leave room for some "capability advertisement" in this
exchange.

With the git native protocol, we luckily found space to do so after the
ref payload (because pkt-line is "length + payload" format but the code
that reads payload happened to ignore anything after NUL).  You would want
to define how these are given by the server to the client over HTTP
channel.  For example, putting them on extra HTTP headers is probably Ok.
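
To illustrate the trick used by the native protocol, here is a minimal sketch of pulling the capability text out of a pkt-line payload, assuming the capabilities follow the first NUL byte as described above. The function name find_capabilities is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: given a pkt-line payload of `len` bytes, return a
 * pointer to the bytes following the first NUL, or NULL if the payload
 * carries nothing after the ref.  `*cap_len` receives the byte count.
 */
static const char *find_capabilities(const char *payload, size_t len,
				     size_t *cap_len)
{
	const char *nul;

	assert(payload != NULL && cap_len != NULL);
	nul = memchr(payload, '\0', len);
	if (!nul || (size_t)(nul - payload) + 1 >= len) {
		*cap_len = 0;
		return NULL;
	}
	*cap_len = len - (size_t)(nul - payload) - 1;
	return nul + 1;
}
```

A reader that treats the payload as a C string stops at the NUL and never sees the extra bytes, which is why older clients kept working.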

Re: More on git over HTTP POST

Shawn Pearce
Junio C Hamano <[hidden email]> wrote:

> "Shawn O. Pearce" <[hidden email]> writes:
>
> > Show Refs
> > ---------
> >
> > Obtains the available refs from the remote repository.  The response
> > is a sequence of git "packet lines", one per ref, and a final flush
> > packet line to indicate the end of stream.
>
> As the initial protocol exchange request, I suspect that you would regret
> if you do not leave room for some "capability advertisement" in this
> exchange.
>
> With the git native protocol, we luckily found space to do so after the
> ref payload (because pkt-line is "length + payload" format but the code
> that reads payload happened to ignore anything after NUL).  You would want
> to define how these are given by the server to the client over HTTP
> channel.  For example, putting them on extra HTTP headers is probably Ok.

Yea, I thought that the HTTP headers would be more than enough
space to add capability advertisements.  Most client libraries
will happily parse and store these for the application, and won't
make a fuss if the application doesn't read them.

Hence there's more than enough room in the protocol to extend it
in the future with additional capabilities.

We do have to be careful though.  Any cacheable resource must only
rely upon the URI and the standard headers which feed into the
cache key for a request.  There aren't many, though I think the
Content-Type header may be among them.

--
Shawn.

Re: More on git over HTTP POST

H. Peter Anvin-4
In reply to this post by Junio C Hamano
Junio C Hamano wrote:
> With the git native protocol, we luckily found space to do so after the
> ref payload (because pkt-line is "length + payload" format but the code
> that reads payload happened to ignore anything after NUL).  You would want
> to define how these are given by the server to the client over HTTP
> channel.  For example, putting them on extra HTTP headers is probably Ok.

I think that would be a mistake, just because it's one more thing for
proxies to screw up on.  It's better to have negotiation information in
the payload, before the "real" data.

Obviously one thing that needs to be included in each transaction is a
transaction ID that will be reported back on the next transaction, since
you can't rely on a persistent connection.

        -hpa

Re: More on git over HTTP POST

H. Peter Anvin-4
In reply to this post by Shawn Pearce
Shawn O. Pearce wrote:
> Chunked Transfer Encoding
> -------------------------
>
> For performance reasons the HTTP/1.1 chunked transfer encoding is
> used frequently to transfer variable length objects.  This avoids
> needing to produce large results in memory to compute the proper
> content-length.

Note: you cannot rely on HTTP/1.1 being supported by an intermediate
proxy; you might have to handle HTTP/1.0, where the data is terminated
by connection close.

        -hpa

Re: More on git over HTTP POST

H. Peter Anvin-4
In reply to this post by Shawn Pearce
Shawn O. Pearce wrote:
> Chunked Transfer Encoding
> -------------------------
>
> For performance reasons the HTTP/1.1 chunked transfer encoding is
> used frequently to transfer variable length objects.  This avoids
> needing to produce large results in memory to compute the proper
> content-length.

One more thing about chunked transfer encodings: you cannot assume that
a proxy will maintain chunk boundaries, any more than you can assume
that a firewall will maintain TCP packet boundaries.

> Detecting Smart Servers
> -----------------------
>
> HTTP clients can detect a smart Git-aware server by sending the
> show-ref request (below) to the server.  If the response has a
> status of 200 and the magic x-application/git-refs content type
> then the server can be assumed to be a smart Git-aware server.
>
> If any other response is received the client must assume dumb
> protocol support, as the server did not correctly respond to
> the request.

I think it should be application/x-git-refs, but that's splitting hairs.

> Obtains the available refs from the remote repository.  The response
> is a sequence of git "packet lines", one per ref, and a final flush
> packet line to indicate the end of stream.
>
> C: GET /path/to/repository.git?show-ref HTTP/1.0
>

I really think it would make more sense to use POST requests for
everything, and have the command be part of the POSTed payload.  Putting
stuff in the URL just complicates the namespace to the detriment of the
admin.

> S: HTTP/1.1 200 OK
> S: Content-Type: x-application/git-refs
> S: Transfer-Encoding: chunked

Transfer-Encoding: chunked is illegal with an HTTP/1.0 client.

        -hpa

Re: More on git over HTTP POST

Shawn Pearce
In reply to this post by H. Peter Anvin-4
"H. Peter Anvin" <[hidden email]> wrote:
> Junio C Hamano wrote:
>>  For example, putting them [capabilities] on extra HTTP headers is probably Ok.
>
> I think that would be a mistake, just because it's one more thing for  
> proxies to screw up on.

I didn't realize we were in an era of proxies that are so
brain-damaged that they cannot relay the other headers.  The Amazon
S3 service relies heavily upon their own extended headers to make
their REST API work.  If proxies stripped that stuff out then the
client wouldn't work at all.

IOW I had thought we were past this dark age of the Internet.

> It's better to have negotiation information in  
> the payload, before the "real" data.

I guess I could do that.  At least for the really complex stuff.

> Obviously one thing that needs to be included in each transaction is a  
> transaction ID that will be reported back on the next transaction, since  
> you can't rely on a persistent connection.

No.  That requires the server to maintain state.  We don't want to
do that if we can avoid it.  I would much rather have the clients
handle the state management as it simplifies the server side,
especially when you start talking about reverse proxies and/or
load-balancers running in front of the server farm.

--
Shawn.

Re: More on git over HTTP POST

Shawn Pearce
In reply to this post by H. Peter Anvin-4
"H. Peter Anvin" <[hidden email]> wrote:

> Shawn O. Pearce wrote:
>> Chunked Transfer Encoding
>> -------------------------
>>
>> For performance reasons the HTTP/1.1 chunked transfer encoding is
>> used frequently to transfer variable length objects.  This avoids
>> needing to produce large results in memory to compute the proper
>> content-length.
>
> Note: you cannot rely on HTTP/1.1 being supported by an intermediate  
> proxy; you might have to handle HTTP/1.0, where the data is terminated  
> by connection close.

Well, that proxy is going to be crying when we upload a 120M pack
during a push to it, and it buffers the damn thing to figure out
the proper Content-Length so it can convert an HTTP/1.1 client
request into an HTTP/1.0 request to forward to the server.  That's
just _stupid_.

But from the client side perspective the chunked transfer encoding
is used only to avoid generating the whole body in advance just to
produce the Content-Length header.  I fully expect the encoding to
disappear (e.g. in a proxy, or in the HTTP client library) before any
sort of Git code gets its fingers on the data.

Hence to your other remark, I _do not_ rely upon the encoding
boundaries to remain intact.  That is why there are Git pkt-line
encodings inside of the HTTP data stream.  We can rely on the
pkt-line encoding being present, even if the HTTP chunks were
moved around (or removed entirely) by a proxy.
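
A minimal sketch of what that looks like on the receiving side: a reader that accepts bytes in whatever pieces the transport delivers and yields whole pkt-lines once they are complete. The pkt_reader type, the function names and the fixed 4096-byte buffer are all hypothetical choices for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum pkt_result { PKT_NEED_MORE, PKT_DATA, PKT_FLUSH, PKT_ERROR };

struct pkt_reader {
	char buf[4096];	/* hypothetical fixed buffer for this sketch */
	size_t used;
};

static void pkt_reader_init(struct pkt_reader *r)
{
	r->used = 0;
}

/* Append bytes exactly as they arrive; returns -1 if the buffer is full. */
static int pkt_feed(struct pkt_reader *r, const void *data, size_t len)
{
	if (len > sizeof(r->buf) - r->used)
		return -1;
	memcpy(r->buf + r->used, data, len);
	r->used += len;
	return 0;
}

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

static void pkt_consume(struct pkt_reader *r, size_t n)
{
	memmove(r->buf, r->buf + n, r->used - n);
	r->used -= n;
}

/* Extract the next packet, if a whole one has been buffered. */
static enum pkt_result pkt_next(struct pkt_reader *r, char *out,
				size_t outsz, size_t *out_len)
{
	size_t total = 0, i;

	assert(out != NULL && out_len != NULL);
	if (r->used < 4)
		return PKT_NEED_MORE;
	for (i = 0; i < 4; i++) {
		int v = hexval(r->buf[i]);
		if (v < 0)
			return PKT_ERROR;
		total = (total << 4) | (size_t)v;
	}
	if (total == 0) {
		pkt_consume(r, 4);
		return PKT_FLUSH;
	}
	if (total < 4 || total > sizeof(r->buf) || total - 4 > outsz)
		return PKT_ERROR;
	if (r->used < total)
		return PKT_NEED_MORE;
	memcpy(out, r->buf + 4, total - 4);
	*out_len = total - 4;
	pkt_consume(r, total);
	return PKT_DATA;
}
```

Because the length prefix travels inside the byte stream, the result is the same whether the data arrives as one HTTP chunk, many chunks, or an unchunked body terminated by connection close.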

--
Shawn.

Re: More on git over HTTP POST

Mike Hommey-3
In reply to this post by Shawn Pearce
On Sat, Aug 02, 2008 at 07:56:02PM -0700, Shawn O. Pearce wrote:
> Smart HTTP transfer protocols
> =============================
>
> Git supports two HTTP based transfer protocols.  A "dumb" protocol
> which requires only a standard HTTP server on the server end of the
> connection, and a "smart" protocol which requires a Git aware CGI
> (or server module).  This document describes the "smart" protocol.

If you want, I have a patch series that introduces a small API to make
HTTP requests easier to make.

Mike

[RFC 1/2] Add backdoor options to receive-pack for use in Git-aware CGI

Shawn Pearce
In reply to this post by Shawn Pearce
The new --report-status flag forces the status report feature of
the push protocol to be enabled.  This can be useful in a CGI
program that implements the server side of a "smart" Git-aware
HTTP transport.  The CGI code can perform the selection of the
feature and ask receive-pack to enable it automatically.

The new --no-advertise-heads causes receive-pack to bypass its usual
display of known refs to the client, and instead immediately start
reading the commands and pack from stdin.  This is useful in a CGI
situation where we want to hand off all input to receive-pack.

Signed-off-by: Shawn O. Pearce <[hidden email]>
---
 receive-pack.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/receive-pack.c b/receive-pack.c
index d44c19e..512eae6 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -464,6 +464,7 @@ static int delete_only(struct command *cmd)
 
 int main(int argc, char **argv)
 {
+	int advertise_heads = 1;
 	int i;
 	char *dir = NULL;
 
@@ -472,7 +473,15 @@ int main(int argc, char **argv)
 		char *arg = *argv++;
 
 		if (*arg == '-') {
-			/* Do flag handling here */
+			if (!strcmp(arg, "--report-status")) {
+				report_status = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--no-advertise-heads")) {
+				advertise_heads = 0;
+				continue;
+			}
+
 			usage(receive_pack_usage);
 		}
 		if (dir)
@@ -497,10 +506,10 @@ int main(int argc, char **argv)
 	else if (0 <= receive_unpack_limit)
 		unpack_limit = receive_unpack_limit;
 
-	write_head_info();
-
-	/* EOF */
-	packet_flush(1);
+	if (advertise_heads) {
+		write_head_info();
+		packet_flush(1);
+	}
 
 	read_head_info();
 	if (commands) {
--
1.6.0.rc1.221.g9ae23


[RFC 2/2] Add Git-aware CGI for Git-aware smart HTTP transport

Shawn Pearce
This CGI can be loaded into an Apache server using ScriptAlias,
such as with the following configuration:

  LoadModule cgi_module /usr/libexec/apache2/mod_cgi.so
  LoadModule alias_module /usr/libexec/apache2/mod_alias.so
  ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

Repositories are accessed via the translated PATH_INFO.

The CGI is backwards compatible with the dumb client, allowing the
client to detect the server's smarts by looking at the content-type
returned from "GET /repo.git/info/refs".  If the returned content
type is the magic application/x-git-refs type then the client can
assume the server is Git-aware.

Signed-off-by: Shawn O. Pearce <[hidden email]>
---
 .gitignore                                |    1 +
 Documentation/technical/http-protocol.txt |   88 +++++++++
 Makefile                                  |    1 +
 http-backend.c                            |  302 +++++++++++++++++++++++++++++
 4 files changed, 392 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/http-protocol.txt
 create mode 100644 http-backend.c

diff --git a/.gitignore b/.gitignore
index a213e8e..02eaf3a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -51,6 +51,7 @@ git-gc
 git-get-tar-commit-id
 git-grep
 git-hash-object
+git-http-backend
 git-http-fetch
 git-http-push
 git-imap-send
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
new file mode 100644
index 0000000..6cb96f3
--- /dev/null
+++ b/Documentation/technical/http-protocol.txt
@@ -0,0 +1,88 @@
+Smart HTTP transfer protocols
+=============================
+
+Git supports two HTTP based transfer protocols.  A "dumb" protocol
+which requires only a standard HTTP server on the server end of the
+connection, and a "smart" protocol which requires a Git aware CGI
+(or server module).  This document describes the "smart" protocol.
+
+As a design feature smart servers automatically degrade to the
+dumb protocol when speaking with a dumb client.  This may cause
+more load to be placed on the server as the file GET requests are
+handled by a CGI rather than the server itself.
+
+
+Authentication
+--------------
+
+Standard HTTP authentication is used, and must be configured and
+enforced by the HTTP server software.
+
+Chunked Transfer Encoding
+-------------------------
+
+For performance reasons the HTTP/1.1 chunked transfer encoding is
+used frequently to transfer variable length objects.  This avoids
+needing to produce large results in memory to compute the proper
+content-length.
+
+Detecting Smart Servers
+-----------------------
+
+HTTP clients can detect a smart Git-aware server by sending the
+/info/refs request (below) to the server.  If the response has a
+status of 200 and the magic application/x-git-refs content type
+then the server can be assumed to be a smart Git-aware server.
+
+
+Show Refs
+---------
+
+Obtains the available refs from the remote repository.  The response
+is a sequence of refs, one per line.  The actual format matches that
+of the $GIT_DIR/info/refs file normally used by a "dumb" protocol.
+
+	C: GET /path/to/repository.git/info/refs HTTP/1.0
+
+	S: HTTP/1.1 200 OK
+	S: Content-Type: application/x-git-refs
+	S: Transfer-Encoding: chunked
+	S:
+	S: 62
+	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
+	S:
+	S: 63
+	S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
+	S:
+	S: 59
+	S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/heads/pu
+	S:
+
+Push Pack
+---------
+
+Uploads a pack and updates refs.  The start of the stream is the
+commands to update the refs and the remainder of the stream is the
+pack file itself.  See git-receive-pack and its network protocol
+in pack-protocol.txt, as this is essentially the same.
+
+	C: POST /path/to/repository.git/receive-pack HTTP/1.0
+	C: Content-Type: application/x-git-receive-pack
+	C: Transfer-Encoding: chunked
+	C:
+	C: 103
+	C: 006395dcfa3633004da0049d3d0fa03f80589cbcaf31 d049f6c27a2244e12041955e262a404c7faba355 refs/heads/maint
+	C: 4
+	C: 0000
+	C: 12
+	C: PACK
+	...
+	C: 0
+
+	S: HTTP/1.0 200 OK
+	S: Content-type: application/x-git-receive-pack-status
+	S: Transfer-Encoding: chunked
+	S:
+	S: ...<output of receive-pack>...
+
+
diff --git a/Makefile b/Makefile
index 52c67c1..3a93bf6 100644
--- a/Makefile
+++ b/Makefile
@@ -298,6 +298,7 @@ PROGRAMS += git-unpack-file$X
 PROGRAMS += git-update-server-info$X
 PROGRAMS += git-upload-pack$X
 PROGRAMS += git-var$X
+PROGRAMS += git-http-backend$X
 
 # List built-in command $C whose implementation cmd_$C() is not in
 # builtin-$C.o but is linked in as part of some other command.
diff --git a/http-backend.c b/http-backend.c
new file mode 100644
index 0000000..a498f89
--- /dev/null
+++ b/http-backend.c
@@ -0,0 +1,302 @@
+#include "cache.h"
+#include "refs.h"
+#include "pkt-line.h"
+#include "object.h"
+#include "tag.h"
+#include "exec_cmd.h"
+#include "run-command.h"
+
+static const char content_type[] = "Content-Type";
+static const char content_length[] = "Content-Length";
+
+static int can_chunk;
+static char buffer[1000];
+
+static void send_status(unsigned code, const char *msg)
+{
+	size_t n;
+
+	n = snprintf(buffer, sizeof(buffer), "Status: %u %s\r\n", code, msg);
+	if (n >= sizeof(buffer))
+		die("protocol error: impossibly long header");
+	safe_write(1, buffer, n);
+}
+
+static void send_header(const char *name, const char *value)
+{
+	size_t n;
+
+	n = snprintf(buffer, sizeof(buffer), "%s: %s\r\n", name, value);
+	if (n >= sizeof(buffer))
+		die("protocol error: impossibly long header");
+	safe_write(1, buffer, n);
+}
+
+static void end_headers(void)
+{
+	safe_write(1, "\r\n", 2);
+}
+
+static void send_nocaching(void)
+{
+	const char *proto = getenv("SERVER_PROTOCOL");
+	if (!proto || !strcmp(proto, "HTTP/1.0"))
+		send_header("Expires", "Mon, 17 Sep 2001 00:00:00 GMT");
+	else
+		send_header("Cache-Control", "no-cache");
+}
+
+static void send_connection_close(void)
+{
+	send_header("Connection", "close");
+}
+
+static void enable_chunking(void)
+{
+	const char *proto = getenv("SERVER_PROTOCOL");
+
+	can_chunk = proto && strcmp(proto, "HTTP/1.0");
+	if (can_chunk)
+		send_header("Transfer-Encoding", "chunked");
+	else
+		send_connection_close();
+}
+
+#define hex(a) (hexchar[(a) & 15])
+static void chunked_write(const char *fmt, ...)
+{
+ static const char hexchar[] = "0123456789abcdef";
+ va_list args;
+ unsigned n;
+
+ va_start(args, fmt);
+ n = vsnprintf(buffer + 6, sizeof(buffer) - 8, fmt, args);
+ va_end(args);
+ if (n >= sizeof(buffer) - 8)
+ die("protocol error: impossibly long line");
+
+ if (can_chunk) {
+ unsigned len = n + 4, b = 4;
+
+ buffer[4] = '\r';
+ buffer[5] = '\n';
+ buffer[n + 6] = '\r';
+ buffer[n + 7] = '\n';
+
+ while (n > 0) {
+ buffer[--b] = hex(n);
+ n >>= 4;
+ len++;
+ }
+
+ safe_write(1, buffer + b, len);
+ } else
+ safe_write(1, buffer + 6, n);
+}
+
+static void end_chunking(void)
+{
+ static const char flush_chunk[] = "0\r\n\r\n";
+ if (can_chunk)
+ safe_write(1, flush_chunk, strlen(flush_chunk));
+}
+
+static void NORETURN invalid_request(const char *msg)
+{
+ static const char header[] = "error: ";
+
+ send_status(400, "Bad Request");
+ send_header(content_type, "text/plain");
+ end_headers();
+
+ safe_write(1, header, strlen(header));
+ safe_write(1, msg, strlen(msg));
+ safe_write(1, "\n", 1);
+
+ exit(0);
+}
+
+static void not_found(void)
+{
+ send_status(404, "Not Found");
+ end_headers();
+}
+
+static void server_error(void)
+{
+ send_status(500, "Internal Server Error");
+ end_headers();
+}
+
+static void require_content_type(const char *need_type)
+{
+ const char *input_type = getenv("CONTENT_TYPE");
+ if (!input_type || strcmp(input_type, need_type))
+ invalid_request("Unsupported content-type");
+}
+
+static void do_GET_any_file(char *name)
+{
+ const char *p = git_path("%s", name);
+ struct stat sb;
+ uintmax_t remaining;
+ ssize_t n;
+ int fd = open(p, O_RDONLY);
+
+ if (fd < 0) {
+ not_found();
+ return;
+ }
+ if (fstat(fd, &sb) < 0) {
+ close(fd);
+ server_error();
+ die("fstat on plain file failed");
+ }
+ remaining = (uintmax_t)sb.st_size;
+
+ n = snprintf(buffer, sizeof(buffer),
+ "Content-Length: %" PRIuMAX "\r\n", remaining);
+ if (n >= sizeof(buffer))
+ die("protocol error: impossibly long header");
+ safe_write(1, buffer, n);
+ send_header(content_type, "application/octet-stream");
+ end_headers();
+
+ while (remaining) {
+ n = xread(fd, buffer, sizeof(buffer));
+ if (n < 0)
+ die("error reading from %s", p);
+ n = safe_write(1, buffer, n);
+ if (n <= 0)
+ break;
+ }
+ close(fd);
+}
+
+static int show_one_ref(const char *name, const unsigned char *sha1,
+ int flag, void *cb_data)
+{
+ struct object *o = parse_object(sha1);
+ if (!o)
+ return 0;
+
+ chunked_write("%s\t%s\n", sha1_to_hex(sha1), name);
+ if (o->type == OBJ_TAG) {
+ o = deref_tag(o, name, 0);
+ if (!o)
+ return 0;
+ chunked_write("%s\t%s^{}\n", sha1_to_hex(o->sha1), name);
+ }
+
+ return 0;
+}
+
+static void do_GET_info_refs(char *arg)
+{
+ send_header(content_type, "application/x-git-refs");
+ send_nocaching();
+ enable_chunking();
+ end_headers();
+
+ for_each_ref(show_one_ref, NULL);
+ end_chunking();
+}
+
+static void do_GET_info_packs(char *arg)
+{
+ size_t objdirlen = strlen(get_object_directory());
+ struct packed_git *p;
+
+ send_nocaching();
+ enable_chunking();
+ end_headers();
+
+ prepare_packed_git();
+ for (p = packed_git; p; p = p->next) {
+ if (!p->pack_local)
+ continue;
+ chunked_write("P %s\n", p->pack_name + objdirlen + 6);
+ }
+ chunked_write("\n");
+ end_chunking();
+}
+
+static void do_POST_receive_pack(char *arg)
+{
+ require_content_type("application/x-git-receive-pack");
+ send_header(content_type, "application/x-git-receive-pack-status");
+ send_nocaching();
+ send_connection_close();
+ end_headers();
+
+ execl_git_cmd("receive-pack",
+ "--report-status",
+ "--no-advertise-heads",
+ ".",
+ NULL);
+ die("Failed to start receive-pack");
+}
+
+static struct service_cmd {
+ const char *method;
+ const char *pattern;
+ void (*imp)(char *);
+} services[] = {
+ {"GET", "/info/refs$", do_GET_info_refs},
+ {"GET", "/objects/info/packs", do_GET_info_packs},
+
+ {"GET", "/HEAD$", do_GET_any_file},
+ {"GET", "/objects/../.{38}$", do_GET_any_file},
+ {"GET", "/objects/pack/pack-[^/]*$", do_GET_any_file},
+ {"GET", "/objects/info/[^/]*$", do_GET_any_file},
+
+ {"POST", "/receive-pack", do_POST_receive_pack}
+};
+
+int main(int argc, char **argv)
+{
+ char *input_method = getenv("REQUEST_METHOD");
+ char *dir = getenv("PATH_TRANSLATED");
+ struct service_cmd *cmd = NULL;
+ char *cmd_arg = NULL;
+ int i;
+
+ if (!input_method)
+ die("No REQUEST_METHOD from server");
+ if (!strcmp(input_method, "HEAD"))
+ input_method = "GET";
+
+ if (!dir)
+ die("No PATH_TRANSLATED from server");
+
+ for (i = 0; i < ARRAY_SIZE(services); i++) {
+ struct service_cmd *c = &services[i];
+ regex_t re;
+ regmatch_t out[1];
+
+ if (strcmp(input_method, c->method))
+ continue;
+ if (regcomp(&re, c->pattern, REG_EXTENDED))
+ die("Bogus re in service table: %s", c->pattern);
+ if (!regexec(&re, dir, 1, out, 0)) {
+ size_t n = out[0].rm_eo - out[0].rm_so;
+ cmd = c;
+ cmd_arg = xmalloc(n);
+ strncpy(cmd_arg, dir + out[0].rm_so + 1, n - 1);
+ cmd_arg[n - 1] = 0;
+ dir[out[0].rm_so] = 0;
+ break;
+ }
+ regfree(&re);
+ }
+
+ if (!cmd)
+ invalid_request("Unsupported query request");
+
+ setup_path();
+ if (!enter_repo(dir, 0))
+ invalid_request("Not a Git repository");
+
+ cmd->imp(cmd_arg);
+ return 0;
+}
--
1.6.0.rc1.221.g9ae23


Re: More on git over HTTP POST

David Lang
In reply to this post by Shawn Pearce
On Sat, 2 Aug 2008, Shawn O. Pearce wrote:

>
> "H. Peter Anvin" <[hidden email]> wrote:
>> Junio C Hamano wrote:
>>>  For example, putting them [capabilities] on extra HTTP headers is probably Ok.
>>
>> I think that would be a mistake, just because it's one more thing for
>> proxies to screw up on.
>
> I didn't realize we were in an era of proxies that are that
> brain-damaged that they cannot relay the other headers.  The Amazon
> S3 service relies heavily upon their own extended headers to make
> their REST API work.  If proxies stripped that stuff out then the
> client wouldn't work at all.
>
> IOW I had thought we were past this dark age of the Internet.

Actually, it's not just a matter of not getting 'past this dark age of the
Internet'. So many people are tunneling _everything_ over HTTP (including
the bad guys tunneling malware) that proxies are getting more aggressive
than they have ever been before in pulling apart the payload and analysing
it before letting it get through to the far side.

David Lang

>> It's better to have negotiation information in
>> the payload, before the "real" data.
>
> I guess I could do that.  At least for the really complex stuff.
>
>> Obviously one thing that needs to be included in each transaction is a
>> transaction ID that will be reported back on the next transaction, since
>> you can't rely on a persistent connection.
>
> No.  That requires the server to maintain state.  We don't want to
> do that if we can avoid it.  I would much rather have the clients
> handle the state management as it simplifies the server side,
> especially when you start talking about reverse proxies and/or
> load-balancers running in front of the server farm.
>
>

Re: More on git over HTTP POST

H. Peter Anvin-4
In reply to this post by Shawn Pearce
Shawn O. Pearce wrote:
>
> IOW I had thought we were past this dark age of the Internet.
>

If we were, there wouldn't be a need for this project at all.  The whole
purpose of it is to deal with corporate proxies that try to prevent
actual communication because of "security", and it's really hard to
predict what utterly arbitrary heuristics they have applied.

        -hpa


Re: More on git over HTTP POST

H. Peter Anvin-4
In reply to this post by Shawn Pearce
Shawn O. Pearce wrote:

>
> But from the client side perspective the chunked transfer encoding
> is used only to avoid generating in advance and producing the
> content-length header.  I fully expect the encoding to disappear
> (e.g. in a proxy, or in the HTTP client library) before any sort
> of Git code gets its fingers on the data.
>
> Hence to your other remark, I _do not_ rely upon the encoding
> boundaries to remain intact.  That is why there is Git pkt-line
> encodings inside of the HTTP data stream.  We can rely on the
> pkt-line encoding being present, even if the HTTP chunks were
> moved around (or removed entirely) by a proxy.
>

Excellent.  I did not mean that as criticism, obviously, I just wanted
that to be clear.
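
A rough sketch of why that holds, for anyone following along (this is not
code from the patch; pkt_line_take() and hexval() are hypothetical names).
It assumes only the usual pkt-line framing: four hex digits giving the
total length including those four bytes, with "0000" as the flush packet.
The reader works on whatever bytes have arrived so far, so where the HTTP
chunks were split never enters into it:

```c
#include <stddef.h>
#include <string.h>

/* Decode one hex digit; returns -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Try to take one pkt-line from the front of buf[0..len).
 *
 * Returns the number of bytes consumed (> 0) on success, 0 if more
 * input is needed, or -1 if the length prefix is malformed.
 * On success *payload and *payload_len describe the payload, and
 * *is_flush is 1 for the "0000" flush packet (payload_len is then 0).
 * Lengths 1..3 are treated as malformed in this sketch.
 */
static int pkt_line_take(const char *buf, size_t len,
			 const char **payload, size_t *payload_len,
			 int *is_flush)
{
	size_t total = 0;
	int i;

	if (len < 4)
		return 0;	/* length prefix not complete yet */
	for (i = 0; i < 4; i++) {
		int v = hexval(buf[i]);
		if (v < 0)
			return -1;
		total = (total << 4) | (size_t)v;
	}
	if (total == 0) {	/* flush packet */
		*payload = buf + 4;
		*payload_len = 0;
		*is_flush = 1;
		return 4;
	}
	if (total < 4)
		return -1;
	if (len < total)
		return 0;	/* payload not complete yet */
	*payload = buf + 4;
	*payload_len = total - 4;
	*is_flush = 0;
	return (int)total;
}
```

A caller would append each network read to a buffer, call this in a loop,
and drop the consumed bytes; a return of 0 just means "wait for more data".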

HTTP/1.1 does chunked encoding, and HTTP/1.0 terminates the body on
connection close; both serve the same purpose: delimiting a body whose
length is not known in advance.
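
To make the HTTP/1.1 half concrete, here is a minimal sketch of framing
one chunk (format_chunk() is a hypothetical helper, not something in the
patch). Each chunk is the payload length in hex, CRLF, the payload, CRLF;
the body ends with a zero-length chunk, "0\r\n\r\n". On HTTP/1.0 the same
payload would simply be written raw and the connection closed afterwards.

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame data[0..len) as a single HTTP/1.1 chunk in out[0..outsz):
 *
 *     <len in hex> CRLF <data> CRLF
 *
 * Returns the number of bytes written, or 0 if out is too small or
 * len is 0 (a zero-length chunk would signal end of body, so this
 * sketch refuses to emit one by accident).
 */
static size_t format_chunk(char *out, size_t outsz,
			   const char *data, size_t len)
{
	size_t room;
	int hdr;

	if (len == 0)
		return 0;
	hdr = snprintf(out, outsz, "%zx\r\n", len);
	if (hdr < 0 || (size_t)hdr >= outsz)
		return 0;	/* size line did not fit */
	room = outsz - (size_t)hdr;
	if (room < 2 || room - 2 < len)
		return 0;	/* no space for data plus trailing CRLF */
	memcpy(out + hdr, data, len);
	memcpy(out + hdr + len, "\r\n", 2);
	return (size_t)hdr + len + 2;
}
```

Since a proxy may re-chunk or de-chunk the stream, nothing above the HTTP
layer should read meaning into where these boundaries fall.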

        -hpa