[PATCH v7 0/9] connect: various cleanups

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 0/9] connect: various cleanups

Mike Hommey-3
Difference with v6:
- Dropped the core.gitProxy change as per discussion.
- Added a comment about the extra get_port.

Note that after this series, parse_connect_url can be further refactored to
avoid the kind of back-and-forth with host_end, get_host_and_port and
get_port.

Mike Hommey (9):
  connect: document why we sometimes call get_port after
    get_host_and_port
  connect: call get_host_and_port() earlier
  connect: re-derive a host:port string from the separate host and port
    variables
  connect: make parse_connect_url() return separated host and port
  connect: group CONNECT_DIAG_URL handling code
  connect: make parse_connect_url() return the user part of the url as a
    separate value
  connect: change the --diag-url output to separate user and host
  connect: actively reject git:// urls with a user part
  connect: move ssh command line preparation to a separate function

 connect.c             | 231 +++++++++++++++++++++++++++++---------------------
 t/t5500-fetch-pack.sh |  42 ++++++---
 2 files changed, 166 insertions(+), 107 deletions(-)

--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Mike Hommey-3
Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/connect.c b/connect.c
index c53f3f1..caa2a3c 100644
--- a/connect.c
+++ b/connect.c
@@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
  transport_check_allowed("ssh");
  get_host_and_port(&ssh_host, &port);
 
+ /* get_host_and_port may not return a port even when
+ * there is one: In the [host:port]:path case,
+ * get_host_and_port is called with "[host:port]" and
+ * returns "host:port" and NULL.
+ * In that specific case, we still need to split the
+ * port. */
  if (!port)
  port = get_port(ssh_host);
 
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 2/9] connect: call get_host_and_port() earlier

Mike Hommey-3
In reply to this post by Mike Hommey-3
Currently, get_host_and_port() is called in git_connect() for the ssh
protocol, and in git_tcp_connect_sock() for the git protocol. Instead
of doing this, just call it from a single place, right after
parse_connect_url(), and pass the host and port separately to
git_*_connect() functions.

We however preserve hostandport, at least for now.

Note that in git_tcp_connect_sock, the port was set to "<none>" in a
case that never can happen, so that code path was removed.

Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c | 47 +++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/connect.c b/connect.c
index caa2a3c..0cdbeb2 100644
--- a/connect.c
+++ b/connect.c
@@ -343,18 +343,16 @@ static const char *ai_name(const struct addrinfo *ai)
 /*
  * Returns a connected socket() fd, or else die()s.
  */
-static int git_tcp_connect_sock(char *host, int flags)
+static int git_tcp_connect_sock(const char *host, const char *port, int flags)
 {
  struct strbuf error_message = STRBUF_INIT;
  int sockfd = -1;
- const char *port = STR(DEFAULT_GIT_PORT);
  struct addrinfo hints, *ai0, *ai;
  int gai;
  int cnt = 0;
 
- get_host_and_port(&host, &port);
- if (!*port)
- port = "<none>";
+ if (!port)
+ port = STR(DEFAULT_GIT_PORT);
 
  memset(&hints, 0, sizeof(hints));
  if (flags & CONNECT_IPV4)
@@ -411,11 +409,10 @@ static int git_tcp_connect_sock(char *host, int flags)
 /*
  * Returns a connected socket() fd, or else die()s.
  */
-static int git_tcp_connect_sock(char *host, int flags)
+static int git_tcp_connect_sock(const char *host, const char *port, int flags)
 {
  struct strbuf error_message = STRBUF_INIT;
  int sockfd = -1;
- const char *port = STR(DEFAULT_GIT_PORT);
  char *ep;
  struct hostent *he;
  struct sockaddr_in sa;
@@ -423,7 +420,8 @@ static int git_tcp_connect_sock(char *host, int flags)
  unsigned int nport;
  int cnt;
 
- get_host_and_port(&host, &port);
+ if (!port)
+ port = STR(DEFAULT_GIT_PORT);
 
  if (flags & CONNECT_VERBOSE)
  fprintf(stderr, "Looking up %s ... ", host);
@@ -482,9 +480,10 @@ static int git_tcp_connect_sock(char *host, int flags)
 #endif /* NO_IPV6 */
 
 
-static void git_tcp_connect(int fd[2], char *host, int flags)
+static void git_tcp_connect(int fd[2], const char *host, const char *port,
+    int flags)
 {
- int sockfd = git_tcp_connect_sock(host, flags);
+ int sockfd = git_tcp_connect_sock(host, port, flags);
 
  fd[0] = sockfd;
  fd[1] = dup(sockfd);
@@ -550,18 +549,16 @@ static int git_use_proxy(const char *host)
  return (git_proxy_command && *git_proxy_command);
 }
 
-static struct child_process *git_proxy_connect(int fd[2], char *host)
+static struct child_process *git_proxy_connect(int fd[2], const char *host,
+       const char *port)
 {
- const char *port = STR(DEFAULT_GIT_PORT);
  struct child_process *proxy;
 
- get_host_and_port(&host, &port);
-
  proxy = xmalloc(sizeof(*proxy));
  child_process_init(proxy);
  argv_array_push(&proxy->args, git_proxy_command);
  argv_array_push(&proxy->args, host);
- argv_array_push(&proxy->args, port);
+ argv_array_push(&proxy->args, port ? port : STR(DEFAULT_GIT_PORT));
  proxy->in = -1;
  proxy->out = -1;
  if (start_command(proxy))
@@ -672,7 +669,8 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
 struct child_process *git_connect(int fd[2], const char *url,
   const char *prog, int flags)
 {
- char *hostandport, *path;
+ char *hostandport, *path, *host;
+ const char *port = NULL;
  struct child_process *conn = &no_fork;
  enum protocol protocol;
  struct strbuf cmd = STRBUF_INIT;
@@ -683,6 +681,8 @@ struct child_process *git_connect(int fd[2], const char *url,
  signal(SIGCHLD, SIG_DFL);
 
  protocol = parse_connect_url(url, &hostandport, &path);
+ host = xstrdup(hostandport);
+ get_host_and_port(&host, &port);
  if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
  printf("Diag: url=%s\n", url ? url : "NULL");
  printf("Diag: protocol=%s\n", prot_name(protocol));
@@ -707,9 +707,9 @@ struct child_process *git_connect(int fd[2], const char *url,
  * cannot connect.
  */
  if (git_use_proxy(hostandport))
- conn = git_proxy_connect(fd, hostandport);
+ conn = git_proxy_connect(fd, host, port);
  else
- git_tcp_connect(fd, hostandport, flags);
+ git_tcp_connect(fd, host, port, flags);
  /*
  * Separate original protocol components prog and path
  * from extended host header with a NUL byte.
@@ -737,10 +737,7 @@ struct child_process *git_connect(int fd[2], const char *url,
  if (protocol == PROTO_SSH) {
  const char *ssh;
  int putty = 0, tortoiseplink = 0;
- char *ssh_host = hostandport;
- const char *port = NULL;
  transport_check_allowed("ssh");
- get_host_and_port(&ssh_host, &port);
 
  /* get_host_and_port may not return a port even when
  * there is one: In the [host:port]:path case,
@@ -749,16 +746,17 @@ struct child_process *git_connect(int fd[2], const char *url,
  * In that specific case, we still need to split the
  * port. */
  if (!port)
- port = get_port(ssh_host);
+ port = get_port(host);
 
  if (flags & CONNECT_DIAG_URL) {
  printf("Diag: url=%s\n", url ? url : "NULL");
  printf("Diag: protocol=%s\n", prot_name(protocol));
- printf("Diag: userandhost=%s\n", ssh_host ? ssh_host : "NULL");
+ printf("Diag: userandhost=%s\n", host ? host : "NULL");
  printf("Diag: port=%s\n", port ? port : "NONE");
  printf("Diag: path=%s\n", path ? path : "NULL");
 
  free(hostandport);
+ free(host);
  free(path);
  free(conn);
  return NULL;
@@ -804,7 +802,7 @@ struct child_process *git_connect(int fd[2], const char *url,
  argv_array_push(&conn->args, putty ? "-P" : "-p");
  argv_array_push(&conn->args, port);
  }
- argv_array_push(&conn->args, ssh_host);
+ argv_array_push(&conn->args, host);
  } else {
  transport_check_allowed("file");
  }
@@ -818,6 +816,7 @@ struct child_process *git_connect(int fd[2], const char *url,
  strbuf_release(&cmd);
  }
  free(hostandport);
+ free(host);
  free(path);
  return conn;
 }
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 3/9] connect: re-derive a host:port string from the separate host and port variables

Mike Hommey-3
In reply to this post by Mike Hommey-3
The last uses of the hostandport variable, besides being strdup'ed
before being split into host and port, is to fill the host header in the
git protocol and to test whether to proxy the request.

Instead of relying on parse_connect_url() to return a host:port string
that makes sense there, re-derive one from the host and port variables.
This will allow to refactor parse_connect_url() to return separate host
and port strings.

Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/connect.c b/connect.c
index 0cdbeb2..7cdaed1 100644
--- a/connect.c
+++ b/connect.c
@@ -695,18 +695,32 @@ struct child_process *git_connect(int fd[2], const char *url,
  * connect, unless the user has overridden us in
  * the environment.
  */
- char *target_host = getenv("GIT_OVERRIDE_VIRTUAL_HOST");
- if (target_host)
- target_host = xstrdup(target_host);
- else
- target_host = xstrdup(hostandport);
+ struct strbuf target_host = STRBUF_INIT;
+ struct strbuf virtual_host = STRBUF_INIT;
+ const char *colon = strchr(host, ':');
+ char *override_vhost = getenv("GIT_OVERRIDE_VIRTUAL_HOST");
+
+ /* If the host contains a colon (ipv6 address), it needs to
+ * be enclosed with square brackets. */
+ if (colon)
+ strbuf_addch(&target_host, '[');
+ strbuf_addstr(&target_host, host);
+ if (colon)
+ strbuf_addch(&target_host, ']');
+ if (port) {
+ strbuf_addch(&target_host, ':');
+ strbuf_addstr(&target_host, port);
+ }
+
+ strbuf_addstr(&virtual_host, override_vhost ? override_vhost
+    : target_host.buf);
 
  transport_check_allowed("git");
 
  /* These underlying connection commands die() if they
  * cannot connect.
  */
- if (git_use_proxy(hostandport))
+ if (git_use_proxy(target_host.buf))
  conn = git_proxy_connect(fd, host, port);
  else
  git_tcp_connect(fd, host, port, flags);
@@ -720,8 +734,9 @@ struct child_process *git_connect(int fd[2], const char *url,
  packet_write(fd[1],
      "%s %s%chost=%s%c",
      prog, path, 0,
-     target_host, 0);
- free(target_host);
+     virtual_host.buf, 0);
+ strbuf_release(&virtual_host);
+ strbuf_release(&target_host);
  } else {
  conn = xmalloc(sizeof(*conn));
  child_process_init(conn);
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 4/9] connect: make parse_connect_url() return separated host and port

Mike Hommey-3
In reply to this post by Mike Hommey-3
Now that nothing besides CONNECT_DIAG_URL is using hostandport, we can
have parse_connect_url() itself do the host and port splitting.

This still leaves "user@" part of the host, if there is one, which will
be addressed in a subsequent change. This however does add /some/
handling of the "user@" part of the host, in order to pick the port
properly.

Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c             | 43 +++++++++++++++++++++++++------------------
 t/t5500-fetch-pack.sh | 32 +++++++++++++++++++++-----------
 2 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/connect.c b/connect.c
index 7cdaed1..edbf0e2 100644
--- a/connect.c
+++ b/connect.c
@@ -589,10 +589,11 @@ static char *get_port(char *host)
  * The caller must free() the returned strings.
  */
 static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
-       char **ret_path)
+       char **ret_port, char **ret_path)
 {
  char *url;
  char *host, *path;
+ const char *port = NULL;
  char *end;
  int separator = '/';
  enum protocol protocol = PROTO_LOCAL;
@@ -647,7 +648,24 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
  path = xstrdup(path);
  *end = '\0';
 
+ get_host_and_port(&host, &port);
+
+ if (*host && !port) {
+ /* get_host_and_port may not return a port even when there is
+ * one: In the [host:port]:path case, get_host_and_port is
+ * called with "[host:port]" and returns "host:port" and NULL.
+ * In that specific case, we still need to split the port.
+ * "host:port" may also look like "user@host:port". As the
+ * `user` portion tends to be less strict than `host:port`,
+ * we first put it out of the equation: since a hostname
+ * cannot contain a '@', we start from the last '@' in the
+ * string. */
+ char *end_user = strrchr(host, '@');
+ port = get_port(end_user ? end_user : host);
+ }
+
  *ret_host = xstrdup(host);
+ *ret_port = port ? xstrdup(port) : NULL;
  *ret_path = path;
  free(url);
  return protocol;
@@ -669,8 +687,7 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
 struct child_process *git_connect(int fd[2], const char *url,
   const char *prog, int flags)
 {
- char *hostandport, *path, *host;
- const char *port = NULL;
+ char *host, *port, *path;
  struct child_process *conn = &no_fork;
  enum protocol protocol;
  struct strbuf cmd = STRBUF_INIT;
@@ -680,13 +697,12 @@ struct child_process *git_connect(int fd[2], const char *url,
  */
  signal(SIGCHLD, SIG_DFL);
 
- protocol = parse_connect_url(url, &hostandport, &path);
- host = xstrdup(hostandport);
- get_host_and_port(&host, &port);
+ protocol = parse_connect_url(url, &host, &port, &path);
  if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
  printf("Diag: url=%s\n", url ? url : "NULL");
  printf("Diag: protocol=%s\n", prot_name(protocol));
- printf("Diag: hostandport=%s\n", hostandport ? hostandport : "NULL");
+ printf("Diag: userandhost=%s\n", host ? host : "NULL");
+ printf("Diag: port=%s\n", port ? port : "NONE");
  printf("Diag: path=%s\n", path ? path : "NULL");
  conn = NULL;
  } else if (protocol == PROTO_GIT) {
@@ -754,15 +770,6 @@ struct child_process *git_connect(int fd[2], const char *url,
  int putty = 0, tortoiseplink = 0;
  transport_check_allowed("ssh");
 
- /* get_host_and_port may not return a port even when
- * there is one: In the [host:port]:path case,
- * get_host_and_port is called with "[host:port]" and
- * returns "host:port" and NULL.
- * In that specific case, we still need to split the
- * port. */
- if (!port)
- port = get_port(host);
-
  if (flags & CONNECT_DIAG_URL) {
  printf("Diag: url=%s\n", url ? url : "NULL");
  printf("Diag: protocol=%s\n", prot_name(protocol));
@@ -770,8 +777,8 @@ struct child_process *git_connect(int fd[2], const char *url,
  printf("Diag: port=%s\n", port ? port : "NONE");
  printf("Diag: path=%s\n", path ? path : "NULL");
 
- free(hostandport);
  free(host);
+ free(port);
  free(path);
  free(conn);
  return NULL;
@@ -830,8 +837,8 @@ struct child_process *git_connect(int fd[2], const char *url,
  fd[1] = conn->in;  /* write to child's stdin */
  strbuf_release(&cmd);
  }
- free(hostandport);
  free(host);
+ free(port);
  free(path);
  return conn;
 }
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 91a69fc..739c6b1 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -553,7 +553,7 @@ check_prot_path () {
  Diag: protocol=$2
  Diag: path=$3
  EOF
- git fetch-pack --diag-url "$1" | grep -v hostandport= >actual &&
+ git fetch-pack --diag-url "$1" | egrep -v '(host|port)=' >actual &&
  test_cmp expected actual
 }
 
@@ -562,22 +562,17 @@ check_prot_host_port_path () {
  case "$2" in
  *ssh*)
  pp=ssh
- uah=userandhost
- ehost=$(echo $3 | tr -d "[]")
- diagport="Diag: port=$4"
  ;;
  *)
- pp=$p
- uah=hostandport
- ehost=$(echo $3$4 | sed -e "s/22$/:22/" -e "s/NONE//")
- diagport=""
+ pp=$2
  ;;
  esac
+ ehost=$(echo $3 | tr -d "[]")
  cat >exp <<-EOF &&
  Diag: url=$1
  Diag: protocol=$pp
- Diag: $uah=$ehost
- $diagport
+ Diag: userandhost=$ehost
+ Diag: port=$4
  Diag: path=$5
  EOF
  grep -v "^$" exp >expected
@@ -585,7 +580,7 @@ check_prot_host_port_path () {
  test_cmp expected actual
 }
 
-for r in repo re:po re/po
+for r in repo re:po re/po re@po
 do
  # git or ssh with scheme
  for p in "ssh+git" "git+ssh" git ssh
@@ -608,6 +603,9 @@ do
  test_expect_success "fetch-pack --diag-url $p://$h:22/$r" '
  check_prot_host_port_path $p://$h:22/$r $p "$h" 22 "/$r"
  '
+ test_expect_success "fetch-pack --diag-url $p://$h:22/$r" '
+ check_prot_host_port_path $p://$h:22/$r $p "$h" 22 "/$r"
+ '
  done
  done
  # file with scheme
@@ -644,6 +642,18 @@ do
  check_prot_host_port_path $h:/~$r $p "$h" NONE "~$r"
  '
  done
+ #ssh without scheme with port
+ p=ssh
+ for h in host user@host
+ do
+ test_expect_success "fetch-pack --diag-url [$h:22]:$r" '
+ check_prot_host_port_path [$h:22]:$r $p $h 22 "$r"
+ '
+ # Do "/~" -> "~" conversion
+ test_expect_success "fetch-pack --diag-url [$h:22]:/~$r" '
+ check_prot_host_port_path [$h:22]:/~$r $p $h 22 "~$r"
+ '
+ done
 done
 
 test_expect_success MINGW 'fetch-pack --diag-url file://c:/repo' '
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 5/9] connect: group CONNECT_DIAG_URL handling code

Mike Hommey-3
In reply to this post by Mike Hommey-3
Previous changes made both branches handling CONNECT_DIAG_URL identical.
We can now remove one of those branches and have CONNECT_DIAG_URL be
handled in one place.

Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/connect.c b/connect.c
index edbf0e2..c0fad4f 100644
--- a/connect.c
+++ b/connect.c
@@ -698,7 +698,7 @@ struct child_process *git_connect(int fd[2], const char *url,
  signal(SIGCHLD, SIG_DFL);
 
  protocol = parse_connect_url(url, &host, &port, &path);
- if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
+ if (flags & CONNECT_DIAG_URL) {
  printf("Diag: url=%s\n", url ? url : "NULL");
  printf("Diag: protocol=%s\n", prot_name(protocol));
  printf("Diag: userandhost=%s\n", host ? host : "NULL");
@@ -770,20 +770,6 @@ struct child_process *git_connect(int fd[2], const char *url,
  int putty = 0, tortoiseplink = 0;
  transport_check_allowed("ssh");
 
- if (flags & CONNECT_DIAG_URL) {
- printf("Diag: url=%s\n", url ? url : "NULL");
- printf("Diag: protocol=%s\n", prot_name(protocol));
- printf("Diag: userandhost=%s\n", host ? host : "NULL");
- printf("Diag: port=%s\n", port ? port : "NONE");
- printf("Diag: path=%s\n", path ? path : "NULL");
-
- free(host);
- free(port);
- free(path);
- free(conn);
- return NULL;
- }
-
  ssh = getenv("GIT_SSH_COMMAND");
  if (!ssh) {
  const char *base;
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 6/9] connect: make parse_connect_url() return the user part of the url as a separate value

Mike Hommey-3
In reply to this post by Mike Hommey-3
Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c | 51 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/connect.c b/connect.c
index c0fad4f..48d9cd2 100644
--- a/connect.c
+++ b/connect.c
@@ -588,11 +588,13 @@ static char *get_port(char *host)
  * Extract protocol and relevant parts from the specified connection URL.
  * The caller must free() the returned strings.
  */
-static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
-       char **ret_port, char **ret_path)
+static enum protocol parse_connect_url(const char *url_orig, char **ret_user,
+       char **ret_host, char **ret_port,
+       char **ret_path)
 {
  char *url;
  char *host, *path;
+ const char *user = NULL;
  const char *port = NULL;
  char *end;
  int separator = '/';
@@ -650,20 +652,27 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
 
  get_host_and_port(&host, &port);
 
- if (*host && !port) {
- /* get_host_and_port may not return a port even when there is
- * one: In the [host:port]:path case, get_host_and_port is
- * called with "[host:port]" and returns "host:port" and NULL.
- * In that specific case, we still need to split the port.
- * "host:port" may also look like "user@host:port". As the
- * `user` portion tends to be less strict than `host:port`,
- * we first put it out of the equation: since a hostname
- * cannot contain a '@', we start from the last '@' in the
- * string. */
+ if (*host) {
+ /* At this point, the host part may look like user@host:port.
+ * As the `user` portion tends to be less strict than
+ * `host:port`, we first put it out of the equation: since a
+ * hostname cannot contain a '@', we start from the last '@'
+ * in the string. */
  char *end_user = strrchr(host, '@');
- port = get_port(end_user ? end_user : host);
+ if (end_user) {
+ *end_user = '\0';
+ user = host;
+ host = end_user + 1;
+ }
  }
+ /* get_host_and_port may not have returned a port even when there is
+ * one: In the [host:port]:path case, get_host_and_port is called
+ * with "[host:port]" and returns "host:port" and NULL.
+ * In that specific case, we still need to split the port. */
+ if (!port)
+ port = get_port(host);
 
+ *ret_user = user ? xstrdup(user) : NULL;
  *ret_host = xstrdup(host);
  *ret_port = port ? xstrdup(port) : NULL;
  *ret_path = path;
@@ -687,7 +696,7 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
 struct child_process *git_connect(int fd[2], const char *url,
   const char *prog, int flags)
 {
- char *host, *port, *path;
+ char *user, *host, *port, *path;
  struct child_process *conn = &no_fork;
  enum protocol protocol;
  struct strbuf cmd = STRBUF_INIT;
@@ -697,11 +706,14 @@ struct child_process *git_connect(int fd[2], const char *url,
  */
  signal(SIGCHLD, SIG_DFL);
 
- protocol = parse_connect_url(url, &host, &port, &path);
+ protocol = parse_connect_url(url, &user, &host, &port, &path);
  if (flags & CONNECT_DIAG_URL) {
  printf("Diag: url=%s\n", url ? url : "NULL");
  printf("Diag: protocol=%s\n", prot_name(protocol));
- printf("Diag: userandhost=%s\n", host ? host : "NULL");
+ if (user)
+ printf("Diag: userandhost=%s@%s\n", user, host);
+ else
+ printf("Diag: userandhost=%s\n", host ? host : "NULL");
  printf("Diag: port=%s\n", port ? port : "NONE");
  printf("Diag: path=%s\n", path ? path : "NULL");
  conn = NULL;
@@ -810,7 +822,11 @@ struct child_process *git_connect(int fd[2], const char *url,
  argv_array_push(&conn->args, putty ? "-P" : "-p");
  argv_array_push(&conn->args, port);
  }
- argv_array_push(&conn->args, host);
+ if (user)
+ argv_array_pushf(&conn->args, "%s@%s",
+ user, host);
+ else
+ argv_array_push(&conn->args, host);
  } else {
  transport_check_allowed("file");
  }
@@ -823,6 +839,7 @@ struct child_process *git_connect(int fd[2], const char *url,
  fd[1] = conn->in;  /* write to child's stdin */
  strbuf_release(&cmd);
  }
+ free(user);
  free(host);
  free(port);
  free(path);
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 7/9] connect: change the --diag-url output to separate user and host

Mike Hommey-3
In reply to this post by Mike Hommey-3
Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c             |  6 ++----
 t/t5500-fetch-pack.sh | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/connect.c b/connect.c
index 48d9cd2..52d34c7 100644
--- a/connect.c
+++ b/connect.c
@@ -710,10 +710,8 @@ struct child_process *git_connect(int fd[2], const char *url,
  if (flags & CONNECT_DIAG_URL) {
  printf("Diag: url=%s\n", url ? url : "NULL");
  printf("Diag: protocol=%s\n", prot_name(protocol));
- if (user)
- printf("Diag: userandhost=%s@%s\n", user, host);
- else
- printf("Diag: userandhost=%s\n", host ? host : "NULL");
+ printf("Diag: user=%s\n", user ? user : "NULL");
+ printf("Diag: host=%s\n", host ? host : "NULL");
  printf("Diag: port=%s\n", port ? port : "NONE");
  printf("Diag: path=%s\n", path ? path : "NULL");
  conn = NULL;
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 739c6b1..2d9c4be 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -553,7 +553,7 @@ check_prot_path () {
  Diag: protocol=$2
  Diag: path=$3
  EOF
- git fetch-pack --diag-url "$1" | egrep -v '(host|port)=' >actual &&
+ git fetch-pack --diag-url "$1" | egrep -v "(user|host|port)=" >actual &&
  test_cmp expected actual
 }
 
@@ -568,10 +568,20 @@ check_prot_host_port_path () {
  ;;
  esac
  ehost=$(echo $3 | tr -d "[]")
+ case "$ehost" in
+ *@*)
+ user=${ehost%@*}
+ ehost=${ehost#$user@}
+ ;;
+ *)
+ user=NULL
+ ;;
+ esac
  cat >exp <<-EOF &&
  Diag: url=$1
  Diag: protocol=$pp
- Diag: userandhost=$ehost
+ Diag: user=$user
+ Diag: host=$ehost
  Diag: port=$4
  Diag: path=$5
  EOF
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 8/9] connect: actively reject git:// urls with a user part

Mike Hommey-3
In reply to this post by Mike Hommey-3
Currently, urls of the for git://user@host don't work because user@host
is not resolving at the DNS level, but we shouldn't be relying on it
being an invalid host name, and actively reject it for containing a
username in the first place.

Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/connect.c b/connect.c
index 52d34c7..04ce210 100644
--- a/connect.c
+++ b/connect.c
@@ -726,6 +726,9 @@ struct child_process *git_connect(int fd[2], const char *url,
  const char *colon = strchr(host, ':');
  char *override_vhost = getenv("GIT_OVERRIDE_VIRTUAL_HOST");
 
+ if (user)
+ die("user@host is not allowed in git:// urls");
+
  /* If the host contains a colon (ipv6 address), it needs to
  * be enclosed with square brackets. */
  if (colon)
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH v7 9/9] connect: move ssh command line preparation to a separate function

Mike Hommey-3
In reply to this post by Mike Hommey-3
Signed-off-by: Mike Hommey <[hidden email]>
---
 connect.c | 108 +++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 58 insertions(+), 50 deletions(-)

diff --git a/connect.c b/connect.c
index 04ce210..ddfda4e 100644
--- a/connect.c
+++ b/connect.c
@@ -680,6 +680,61 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_user,
  return protocol;
 }
 
+static int prepare_ssh_command(struct argv_array *cmd, const char *user,
+       const char *host, const char *port, int flags)
+{
+ const char *ssh;
+ int putty = 0, tortoiseplink = 0, use_shell = 1;
+ transport_check_allowed("ssh");
+
+ ssh = getenv("GIT_SSH_COMMAND");
+ if (!ssh) {
+ const char *base;
+ char *ssh_dup;
+
+ /*
+ * GIT_SSH is the no-shell version of
+ * GIT_SSH_COMMAND (and must remain so for
+ * historical compatibility).
+ */
+ use_shell = 0;
+
+ ssh = getenv("GIT_SSH");
+ if (!ssh)
+ ssh = "ssh";
+
+ ssh_dup = xstrdup(ssh);
+ base = basename(ssh_dup);
+
+ tortoiseplink = !strcasecmp(base, "tortoiseplink") ||
+ !strcasecmp(base, "tortoiseplink.exe");
+ putty = tortoiseplink ||
+ !strcasecmp(base, "plink") ||
+ !strcasecmp(base, "plink.exe");
+
+ free(ssh_dup);
+ }
+
+ argv_array_push(cmd, ssh);
+ if (flags & CONNECT_IPV4)
+ argv_array_push(cmd, "-4");
+ else if (flags & CONNECT_IPV6)
+ argv_array_push(cmd, "-6");
+ if (tortoiseplink)
+ argv_array_push(cmd, "-batch");
+ if (port) {
+ /* P is for PuTTY, p is for OpenSSH */
+ argv_array_push(cmd, putty ? "-P" : "-p");
+ argv_array_push(cmd, port);
+ }
+ if (user)
+ argv_array_pushf(cmd, "%s@%s", user, host);
+ else
+ argv_array_push(cmd, host);
+
+ return use_shell;
+}
+
 static struct child_process no_fork = CHILD_PROCESS_INIT;
 
 /*
@@ -776,59 +831,12 @@ struct child_process *git_connect(int fd[2], const char *url,
 
  /* remove repo-local variables from the environment */
  conn->env = local_repo_env;
- conn->use_shell = 1;
  conn->in = conn->out = -1;
  if (protocol == PROTO_SSH) {
- const char *ssh;
- int putty = 0, tortoiseplink = 0;
- transport_check_allowed("ssh");
-
- ssh = getenv("GIT_SSH_COMMAND");
- if (!ssh) {
- const char *base;
- char *ssh_dup;
-
- /*
- * GIT_SSH is the no-shell version of
- * GIT_SSH_COMMAND (and must remain so for
- * historical compatibility).
- */
- conn->use_shell = 0;
-
- ssh = getenv("GIT_SSH");
- if (!ssh)
- ssh = "ssh";
-
- ssh_dup = xstrdup(ssh);
- base = basename(ssh_dup);
-
- tortoiseplink = !strcasecmp(base, "tortoiseplink") ||
- !strcasecmp(base, "tortoiseplink.exe");
- putty = tortoiseplink ||
- !strcasecmp(base, "plink") ||
- !strcasecmp(base, "plink.exe");
-
- free(ssh_dup);
- }
-
- argv_array_push(&conn->args, ssh);
- if (flags & CONNECT_IPV4)
- argv_array_push(&conn->args, "-4");
- else if (flags & CONNECT_IPV6)
- argv_array_push(&conn->args, "-6");
- if (tortoiseplink)
- argv_array_push(&conn->args, "-batch");
- if (port) {
- /* P is for PuTTY, p is for OpenSSH */
- argv_array_push(&conn->args, putty ? "-P" : "-p");
- argv_array_push(&conn->args, port);
- }
- if (user)
- argv_array_pushf(&conn->args, "%s@%s",
- user, host);
- else
- argv_array_push(&conn->args, host);
+ conn->use_shell = prepare_ssh_command(
+ &conn->args, user, host, port, flags);
  } else {
+ conn->use_shell = 1;
  transport_check_allowed("file");
  }
  argv_array_push(&conn->args, cmd.buf);
--
2.8.3.401.ga81c606.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Eric Sunshine
In reply to this post by Mike Hommey-3
On Sat, May 21, 2016 at 7:17 PM, Mike Hommey <[hidden email]> wrote:

> Signed-off-by: Mike Hommey <[hidden email]>
> ---
> diff --git a/connect.c b/connect.c
> @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
>                         transport_check_allowed("ssh");
>                         get_host_and_port(&ssh_host, &port);
>
> +                       /* get_host_and_port may not return a port even when
> +                        * there is one: In the [host:port]:path case,
> +                        * get_host_and_port is called with "[host:port]" and
> +                        * returns "host:port" and NULL.
> +                        * In that specific case, we still need to split the
> +                        * port. */

Style:

    /*
     * This is a multi-
     * line comment.
     */

>                         if (!port)
>                                 port = get_port(ssh_host);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Torsten Bögershausen-2
In reply to this post by Mike Hommey-3
On 22.05.16 01:17, Mike Hommey wrote:

> Signed-off-by: Mike Hommey <[hidden email]>
> ---
>  connect.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/connect.c b/connect.c
> index c53f3f1..caa2a3c 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
>   transport_check_allowed("ssh");
>   get_host_and_port(&ssh_host, &port);
>  
> + /* get_host_and_port may not return a port even when
> + * there is one: In the [host:port]:path case,
> + * get_host_and_port is called with "[host:port]" and
> + * returns "host:port" and NULL.
> + * In that specific case, we still need to split the
> + * port. */
Is it worth to mention that this case is "still supported legacy" ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Mike Hommey-3
On Sun, May 22, 2016 at 08:07:05AM +0200, Torsten Bögershausen wrote:

> On 22.05.16 01:17, Mike Hommey wrote:
> > Signed-off-by: Mike Hommey <[hidden email]>
> > ---
> >  connect.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/connect.c b/connect.c
> > index c53f3f1..caa2a3c 100644
> > --- a/connect.c
> > +++ b/connect.c
> > @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
> >   transport_check_allowed("ssh");
> >   get_host_and_port(&ssh_host, &port);
> >  
> > + /* get_host_and_port may not return a port even when
> > + * there is one: In the [host:port]:path case,
> > + * get_host_and_port is called with "[host:port]" and
> > + * returns "host:port" and NULL.
> > + * In that specific case, we still need to split the
> > + * port. */
> Is it worth to mention that this case is "still supported legacy" ?

If it's worth mentioning anywhere, it seems to me it would start with
urls.txt?

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Torsten Bögershausen-2
On 05/22/2016 10:03 AM, Mike Hommey wrote:

> On Sun, May 22, 2016 at 08:07:05AM +0200, Torsten Bögershausen wrote:
>> On 22.05.16 01:17, Mike Hommey wrote:
>>> Signed-off-by: Mike Hommey <[hidden email]>
>>> ---
>>>   connect.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/connect.c b/connect.c
>>> index c53f3f1..caa2a3c 100644
>>> --- a/connect.c
>>> +++ b/connect.c
>>> @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
>>>   transport_check_allowed("ssh");
>>>   get_host_and_port(&ssh_host, &port);
>>>  
>>> + /* get_host_and_port may not return a port even when
>>> + * there is one: In the [host:port]:path case,
>>> + * get_host_and_port is called with "[host:port]" and
>>> + * returns "host:port" and NULL.
>>> + * In that specific case, we still need to split the
>>> + * port. */
>> Is it worth to mention that this case is "still supported legacy" ?
> If it's worth mentioning anywhere, it seems to me it would start with
> urls.txt?
>
> Mike
>
I don't know.
urls.txt is for Git users, and connect.c is for Git developers.
urls.txt does not mention that Git follows any RFC when parsing the
URLS', it doesn't claim to be 100% compliant.
Even if it makes sense to do so, as many user simply expect Git to accept
RFC compliant URL's, and it makes the development easier, if there is an
already
written specification, that describes all the details.
The parser is not 100% RFC compliant, one example:
- old-style usgage like "git clone [host:222]:~/path/to/repo are supported
To easy the live for developers, it could make sense why the code is as
it is,
simply to avoid that people around the world suddenly run into problems,
when they upgrade the Git version.
So if there is a comment in the code, it could help new developers to
understand
things easier.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Junio C Hamano
Torsten Bögershausen <[hidden email]> writes:

>>>>   get_host_and_port(&ssh_host, &port);
>>>>   + /* get_host_and_port may not return a port
>>>> even when
>>>> + * there is one: In the [host:port]:path case,
>>>> + * get_host_and_port is called with "[host:port]" and
>>>> + * returns "host:port" and NULL.
>>>> + * In that specific case, we still need to split the
>>>> + * port. */
>>> Is it worth to mention that this case is "still supported legacy" ?
>> If it's worth mentioning anywhere, it seems to me it would start with
>> urls.txt?
>>
>> Mike
>>
> I don't know.
> urls.txt is for Git users, and connect.c is for Git developers.
> urls.txt does not mention that Git follows any RFC when parsing the
> URLS', it doesn't claim to be 100% compliant.
> Even if it makes sense to do so, as many user simply expect Git to accept
> RFC compliant URL's, and it makes the development easier, if there is
> an already
> written specification, that describes all the details.
> The parser is not 100% RFC compliant, one example:
> - old-style usgage like "git clone [host:222]:~/path/to/repo are supported

Is it an option to fix get_host_and_port() so that it returns what
the caller expects even when it is given "[host:port]"?  When the
caller passes other forms like "host:port", it expects to get "host"
and "port" parsed out into two variables.  Why can't the caller
expect to see the same happen when feeding "[host:port]" to the
function?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Mike Hommey-3
On Mon, May 23, 2016 at 02:30:57PM -0700, Junio C Hamano wrote:

> Torsten Bögershausen <[hidden email]> writes:
>
> >>>>   get_host_and_port(&ssh_host, &port);
> >>>>   + /* get_host_and_port may not return a port
> >>>> even when
> >>>> + * there is one: In the [host:port]:path case,
> >>>> + * get_host_and_port is called with "[host:port]" and
> >>>> + * returns "host:port" and NULL.
> >>>> + * In that specific case, we still need to split the
> >>>> + * port. */
> >>> Is it worth to mention that this case is "still supported legacy" ?
> >> If it's worth mentioning anywhere, it seems to me it would start with
> >> urls.txt?
> >>
> >> Mike
> >>
> > I don't know.
> > urls.txt is for Git users, and connect.c is for Git developers.
> > urls.txt does not mention that Git follows any RFC when parsing the
> > URLS', it doesn't claim to be 100% compliant.
> > Even if it makes sense to do so, as many user simply expect Git to accept
> > RFC compliant URL's, and it makes the development easier, if there is
> > an already
> > written specification, that describes all the details.
> > The parser is not 100% RFC compliant, one example:
> > - old-style usgage like "git clone [host:222]:~/path/to/repo are supported

"This parser handles the urls described in urls.txt" is about as much as
there is to say IMHO.

> Is it an option to fix get_host_and_port() so that it returns what
> the caller expects even when it is given "[host:port]"?  When the
> caller passes other forms like "host:port", it expects to get "host"
> and "port" parsed out into two variables.  Why can't the caller
> expect to see the same happen when feeding "[host:port]" to the
> function?

After this series, there's only one use of get_host_and_port(). As well
as one use of host_end() and get_port(). They become implementation
details of the parse_connect_url function. And as I mentioned in the
cover letter this round, parse_connect_url could be further modified to
avoid the kind of back-and-forth that's going on between these
functions. But I'd rather leave that for later.

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
On 05/23/2016 11:30 PM, Junio C Hamano wrote:

> Torsten Bögershausen <[hidden email]> writes:
>
>>>>>     get_host_and_port(&ssh_host, &port);
>>>>>    + /* get_host_and_port may not return a port
>>>>> even when
>>>>> + * there is one: In the [host:port]:path case,
>>>>> + * get_host_and_port is called with "[host:port]" and
>>>>> + * returns "host:port" and NULL.
>>>>> + * In that specific case, we still need to split the
>>>>> + * port. */
>>>> Is it worth to mention that this case is "still supported legacy" ?
>>> If it's worth mentioning anywhere, it seems to me it would start with
>>> urls.txt?
>>>
>>> Mike
>>>
>> I don't know.
>> urls.txt is for Git users, and connect.c is for Git developers.
>> urls.txt does not mention that Git follows any RFC when parsing the
>> URLS', it doesn't claim to be 100% compliant.
>> Even if it makes sense to do so, as many user simply expect Git to accept
>> RFC compliant URL's, and it makes the development easier, if there is
>> an already
>> written specification, that describes all the details.
>> The parser is not 100% RFC compliant, one example:
>> - old-style usgage like "git clone [host:222]:~/path/to/repo are supported
> Is it an option to fix get_host_and_port() so that it returns what
> the caller expects even when it is given "[host:port]"?  When the
> caller passes other forms like "host:port", it expects to get "host"
> and "port" parsed out into two variables.  Why can't the caller
> expect to see the same happen when feeding "[host:port]" to the
> function?
>
This is somewhat out of my head:
git clone   git://[example.com:123]:/test        #illegal, malformated URL
git clone   [example.com:123]:/test               #scp-like URL, legacy
git clone   ssh://[example.com:123]:/test       #illegal, but supported
as legacy, because
git clone  ssh://[user@::1]/test                       # was the only
way to specify a user name at a literal IPv6 address

May be we should have some  more test cases for malformated git:// URLs?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Mike Hommey-3
On Tue, May 24, 2016 at 06:44:26AM +0200, Torsten Bögershausen wrote:

> On 05/23/2016 11:30 PM, Junio C Hamano wrote:
> > Torsten Bögershausen <[hidden email]> writes:
> >
> > > > > >     get_host_and_port(&ssh_host, &port);
> > > > > >    + /* get_host_and_port may not return a port
> > > > > > even when
> > > > > > + * there is one: In the [host:port]:path case,
> > > > > > + * get_host_and_port is called with "[host:port]" and
> > > > > > + * returns "host:port" and NULL.
> > > > > > + * In that specific case, we still need to split the
> > > > > > + * port. */
> > > > > Is it worth to mention that this case is "still supported legacy" ?
> > > > If it's worth mentioning anywhere, it seems to me it would start with
> > > > urls.txt?
> > > >
> > > > Mike
> > > >
> > > I don't know.
> > > urls.txt is for Git users, and connect.c is for Git developers.
> > > urls.txt does not mention that Git follows any RFC when parsing the
> > > URLS', it doesn't claim to be 100% compliant.
> > > Even if it makes sense to do so, as many user simply expect Git to accept
> > > RFC compliant URL's, and it makes the development easier, if there is
> > > an already
> > > written specification, that describes all the details.
> > > The parser is not 100% RFC compliant, one example:
> > > - old-style usgage like "git clone [host:222]:~/path/to/repo are supported
> > Is it an option to fix get_host_and_port() so that it returns what
> > the caller expects even when it is given "[host:port]"?  When the
> > caller passes other forms like "host:port", it expects to get "host"
> > and "port" parsed out into two variables.  Why can't the caller
> > expect to see the same happen when feeding "[host:port]" to the
> > function?
> >
> This is somewhat out of my head:
> git clone   git://[example.com:123]:/test        #illegal, malformated URL
> git clone   [example.com:123]:/test               #scp-like URL, legacy
> git clone   ssh://[example.com:123]:/test       #illegal, but supported as
> legacy, because
> git clone  ssh://[user@::1]/test                       # was the only way to
> specify a user name at a literal IPv6 address
>
> May be we should have some  more test cases for malformated git:// URLs?

None of these malformed urls are rejected with or without my series
applied:

Without:
$ git fetch-pack --diag-url git://[example.com:123]:/test
Diag: url=git://[example.com:123]:/test
Diag: protocol=git
Diag: hostandport=[example.com:123]:
Diag: path=/test
$ git fetch-pack --diag-url
ssh://[example.com:123]:/test
Diag: url=ssh://[example.com:123]:/test
Diag: protocol=ssh
Diag: userandhost=example.com
Diag: port=123
Diag: path=/test

With:
$ git fetch-pack --diag-url git://[example.com:123]:/test
Diag: url=git://[example.com:123]:/test
Diag: protocol=git
Diag: user=NULL
Diag: host=example.com
Diag: port=123
Diag: path=/test
$ git fetch-pack --diag-url ssh://[example.com:123]:/test
Diag: url=ssh://[example.com:123]:/test
Diag: protocol=ssh
Diag: user=NULL
Diag: host=example.com
Diag: port=123
Diag: path=/test

Note in the first case, hostandport is "[example.com:123]:", and that
is treated as host=example.com:123 and port=NULL further down, so my
series is changing something here, but arguably, it makes it less worse.
(note that both with and without my series,
"git://[example.com:123]:42/path" is treated the same, so only a corner
case changed)

Can we go forward with the current series (modulo the comment style
thing Eric noted, and maybe adding a note about the parser handling urls
as per urls.txt), and not bloat scope it? If anything, the state of the
code after the series should make further parser changes easier.

Cheers,

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

Torsten Bögershausen-2
On 05/26/2016 01:34 AM, Mike Hommey wrote:

> On Tue, May 24, 2016 at 06:44:26AM +0200, Torsten Bögershausen wrote:
>> On 05/23/2016 11:30 PM, Junio C Hamano wrote:
>>> Torsten Bögershausen <[hidden email]> writes:
>>>
>>>>>>>     get_host_and_port(&ssh_host, &port);
>>>>>>>     + /* get_host_and_port may not return a port
>>>>>>> even when
>>>>>>> + * there is one: In the [host:port]:path case,
>>>>>>> + * get_host_and_port is called with "[host:port]" and
>>>>>>> + * returns "host:port" and NULL.
>>>>>>> + * In that specific case, we still need to split the
>>>>>>> + * port. */
>>>>>> Is it worth to mention that this case is "still supported legacy" ?
>>>>> If it's worth mentioning anywhere, it seems to me it would start with
>>>>> urls.txt?
>>>>>
>>>>> Mike
>>>>>
>>>> I don't know.
>>>> urls.txt is for Git users, and connect.c is for Git developers.
>>>> urls.txt does not mention that Git follows any RFC when parsing the
>>>> URLS', it doesn't claim to be 100% compliant.
>>>> Even if it makes sense to do so, as many user simply expect Git to accept
>>>> RFC compliant URL's, and it makes the development easier, if there is
>>>> an already
>>>> written specification, that describes all the details.
>>>> The parser is not 100% RFC compliant, one example:
>>>> - old-style usgage like "git clone [host:222]:~/path/to/repo are supported
>>> Is it an option to fix get_host_and_port() so that it returns what
>>> the caller expects even when it is given "[host:port]"?  When the
>>> caller passes other forms like "host:port", it expects to get "host"
>>> and "port" parsed out into two variables.  Why can't the caller
>>> expect to see the same happen when feeding "[host:port]" to the
>>> function?
>>>
>> This is somewhat out of my head:
>> git clone   git://[example.com:123]:/test        #illegal, malformated URL
>> git clone   [example.com:123]:/test               #scp-like URL, legacy
>> git clone   ssh://[example.com:123]:/test       #illegal, but supported as
>> legacy, because
>> git clone  ssh://[user@::1]/test                       # was the only way to
>> specify a user name at a literal IPv6 address
>>
>> May be we should have some  more test cases for malformated git:// URLs?
>
> None of these malformed urls are rejected with or without my series
> applied:
>
> Without:
> $ git fetch-pack --diag-url git://[example.com:123]:/test
> Diag: url=git://[example.com:123]:/test
> Diag: protocol=git
> Diag: hostandport=[example.com:123]:
> Diag: path=/test
> $ git fetch-pack --diag-url
> ssh://[example.com:123]:/test
> Diag: url=ssh://[example.com:123]:/test
> Diag: protocol=ssh
> Diag: userandhost=example.com
> Diag: port=123
> Diag: path=/test
>
> With:
> $ git fetch-pack --diag-url git://[example.com:123]:/test
> Diag: url=git://[example.com:123]:/test
> Diag: protocol=git
> Diag: user=NULL
> Diag: host=example.com
> Diag: port=123
> Diag: path=/test
> $ git fetch-pack --diag-url ssh://[example.com:123]:/test
> Diag: url=ssh://[example.com:123]:/test
> Diag: protocol=ssh
> Diag: user=NULL
> Diag: host=example.com
> Diag: port=123
> Diag: path=/test
>
> Note in the first case, hostandport is "[example.com:123]:", and that
> is treated as host=example.com:123 and port=NULL further down, so my
> series is changing something here, but arguably, it makes it less worse.
> (note that both with and without my series,
> "git://[example.com:123]:42/path" is treated the same, so only a corner
> case changed)
>
> Can we go forward with the current series (modulo the comment style
> thing Eric noted, and maybe adding a note about the parser handling urls
> as per urls.txt), and not bloat scope it? If anything, the state of the
> code after the series should make further parser changes easier.
>
> Cheers,
>
> Mike
>


Thanks for digging.

How about something like this:

/*
  * get_host_and_port may not return a port in the [host:port]:path case.
  * To support this undocumented legacy we still need to split the port.
*/

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loading...