[PATCH/RFC 0/6] pack-objects hook for upload-pack

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH/RFC 0/6] pack-objects hook for upload-pack

Jeff King
I've often wanted to intercept the call from upload-pack to
pack-objects. The final patch in this series goes into more detail, but
basically it's good for:

  1. Capturing the output from pack-objects for debugging/inspection.

  2. Capturing the input to pack-objects to replay for debugging or
     performance analysis.

  3. Caching pack-objects output.

It's pretty trivial to add a hook to run instead of pack-objects (and
the hook would just run pack-objects itself). But we don't want to run
hooks in upload-pack, because we don't necessarily trust the repository
we're running in.

So instead we'd have to learn about the hook from the environment, or
from any of the non-repo config files. And that in turn requires some
support from the config code, which is what patches 1-5 are doing (along
with some cleanups along the way).

I think the config refactoring is all pretty sane, and could support
other "dangerous" features besides this particular hook in a way that's
easy to use for site admins (whether they trust the repos they're
serving or not). I've marked this "RFC" because I'd like input on some
details of the final patch; I'll point out my questions separately
there.

  [1/6]: git_config_with_options: drop "found" counting
  [2/6]: git_config_parse_parameter: refactor cleanup code
  [3/6]: config: set up config_source for command-line config
  [4/6]: config: return configset value for current_config_ functions
  [5/6]: config: add a notion of "scope"
  [6/6]: upload-pack: provide a hook for running pack-objects

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 1/6] git_config_with_options: drop "found" counting

Jeff King
Prior to 1f2baa7 (config: treat non-existent config files as
empty, 2010-10-21), we returned an error if any config files
were missing. That commit made this a non-error, but
returned the number of sources found, in case any caller
wanted to distinguish this case.

In the past 5+ years, no caller has; the only two places
which bother to check the return value care only about the
error case.  Let's drop this code, which complicates the
function. Similarly, let's drop the "found anything" return
from git_config_from_parameters, which was present only to
support this (and similarly has never had other callers care
for the past 5+ years).

Note that we do need to update a comment in one of the
callers, even though the code immediately below it doesn't
care about this case.

Signed-off-by: Jeff King <[hidden email]>
---
This technically changes the function interface without breaking
compilation for any topics in flight. But given the history, and that
the now-unhandled case is not actually _useful_, I don't think it's
worth worrying about.

 config.c | 34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/config.c b/config.c
index 096fe03..3f8f6aa 100644
--- a/config.c
+++ b/config.c
@@ -230,7 +230,7 @@ int git_config_from_parameters(config_fn_t fn, void *data)
 
  free(argv);
  free(envw);
- return nr > 0;
+ return 0;
 }
 
 static int get_next_char(void)
@@ -1201,47 +1201,31 @@ int git_config_system(void)
 
 static int do_git_config_sequence(config_fn_t fn, void *data)
 {
- int ret = 0, found = 0;
+ int ret = 0;
  char *xdg_config = xdg_config_home("config");
  char *user_config = expand_user_path("~/.gitconfig");
  char *repo_config = git_pathdup("config");
 
- if (git_config_system() && !access_or_die(git_etc_gitconfig(), R_OK, 0)) {
+ if (git_config_system() && !access_or_die(git_etc_gitconfig(), R_OK, 0))
  ret += git_config_from_file(fn, git_etc_gitconfig(),
     data);
- found += 1;
- }
 
- if (xdg_config && !access_or_die(xdg_config, R_OK, ACCESS_EACCES_OK)) {
+ if (xdg_config && !access_or_die(xdg_config, R_OK, ACCESS_EACCES_OK))
  ret += git_config_from_file(fn, xdg_config, data);
- found += 1;
- }
 
- if (user_config && !access_or_die(user_config, R_OK, ACCESS_EACCES_OK)) {
+ if (user_config && !access_or_die(user_config, R_OK, ACCESS_EACCES_OK))
  ret += git_config_from_file(fn, user_config, data);
- found += 1;
- }
 
- if (repo_config && !access_or_die(repo_config, R_OK, 0)) {
+ if (repo_config && !access_or_die(repo_config, R_OK, 0))
  ret += git_config_from_file(fn, repo_config, data);
- found += 1;
- }
 
- switch (git_config_from_parameters(fn, data)) {
- case -1: /* error */
+ if (git_config_from_parameters(fn, data) < 0)
  die(_("unable to parse command-line config"));
- break;
- case 0: /* found nothing */
- break;
- default: /* found at least one item */
- found++;
- break;
- }
 
  free(xdg_config);
  free(user_config);
  free(repo_config);
- return ret == 0 ? found : ret;
+ return ret;
 }
 
 int git_config_with_options(config_fn_t fn, void *data,
@@ -1276,7 +1260,7 @@ static void git_config_raw(config_fn_t fn, void *data)
  if (git_config_with_options(fn, data, NULL, 1) < 0)
  /*
  * git_config_with_options() normally returns only
- * positive values, as most errors are fatal, and
+ * zero, as most errors are fatal, and
  * non-fatal potential errors are guarded by "if"
  * statements that are entered only when no error is
  * possible.
--
2.8.2.888.gecb1fe3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 2/6] git_config_parse_parameter: refactor cleanup code

Jeff King
In reply to this post by Jeff King
We have several exits from the function, each of which has
to do some cleanup. Let's consolidate these in an "out"
label we can jump to. This doesn't save us much now, but it
will help as we add more things that need cleanup.

Signed-off-by: Jeff King <[hidden email]>
---
 config.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/config.c b/config.c
index 3f8f6aa..eb1e268 100644
--- a/config.c
+++ b/config.c
@@ -205,6 +205,7 @@ int git_config_parse_parameter(const char *text,
 int git_config_from_parameters(config_fn_t fn, void *data)
 {
  const char *env = getenv(CONFIG_DATA_ENVIRONMENT);
+ int ret = 0;
  char *envw;
  const char **argv = NULL;
  int nr = 0, alloc = 0;
@@ -216,21 +217,21 @@ int git_config_from_parameters(config_fn_t fn, void *data)
  envw = xstrdup(env);
 
  if (sq_dequote_to_argv(envw, &argv, &nr, &alloc) < 0) {
- free(envw);
- return error("bogus format in " CONFIG_DATA_ENVIRONMENT);
+ ret = error("bogus format in " CONFIG_DATA_ENVIRONMENT);
+ goto out;
  }
 
  for (i = 0; i < nr; i++) {
  if (git_config_parse_parameter(argv[i], fn, data) < 0) {
- free(argv);
- free(envw);
- return -1;
+ ret = -1;
+ goto out;
  }
  }
 
+out:
  free(argv);
  free(envw);
- return 0;
+ return ret;
 }
 
 static int get_next_char(void)
--
2.8.2.888.gecb1fe3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 3/6] config: set up config_source for command-line config

Jeff King
In reply to this post by Jeff King
When we parse a config file, we set up the global "cf"
variable as a pointer to a "struct config_source" describing
the file we are parsing. This is used for error messages, as
well as for lookup functions like current_config_name().

The "cf" variable is NULL in two cases:

  1. When we are parsing command-line config, in which case
     there is no source file.

  2. When we are not parsing any config at all.

Callers like current_config_name() must assume we are in
case 1 if they see a NULL "cf". However, this means that if
they are accidentally used outside of a config parsing
callback, they will quietly return a bogus answer.

This might seem like an unlikely accident (why would you ask
for the current config file if you are not parsing config?),
but it's actually an easy mistake to make due to the
configset caching. git_config() serves the answers from a
configset cache, and any calls to current_config_name() will
claim that we are parsing command-line config, no matter
what the original source.

So let's distinguish these cases by having the command-line
config parser set up a config_source with a NULL name (which
callers already handle properly). We can use this to catch
programming errors in some cases, and to give better
messages to the user in others.

Signed-off-by: Jeff King <[hidden email]>
---
 config.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index eb1e268..4beba52 100644
--- a/config.c
+++ b/config.c
@@ -131,7 +131,9 @@ static int handle_path_include(const char *path, struct config_include_data *inc
  if (!access_or_die(path, R_OK, 0)) {
  if (++inc->depth > MAX_INCLUDE_DEPTH)
  die(include_depth_advice, MAX_INCLUDE_DEPTH, path,
-    cf && cf->name ? cf->name : "the command line");
+    !cf ? "<unknown>" :
+    cf->name ? cf->name :
+    "the command line");
  ret = git_config_from_file(git_config_include, path, inc);
  inc->depth--;
  }
@@ -210,9 +212,15 @@ int git_config_from_parameters(config_fn_t fn, void *data)
  const char **argv = NULL;
  int nr = 0, alloc = 0;
  int i;
+ struct config_source source;
 
  if (!env)
  return 0;
+
+ memset(&source, 0, sizeof(source));
+ source.prev = cf;
+ cf = &source;
+
  /* sq_dequote will write over it */
  envw = xstrdup(env);
 
@@ -231,6 +239,7 @@ int git_config_from_parameters(config_fn_t fn, void *data)
 out:
  free(argv);
  free(envw);
+ cf = source.prev;
  return ret;
 }
 
@@ -1345,7 +1354,9 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
  l_item->e = e;
  l_item->value_index = e->value_list.nr - 1;
 
- if (cf) {
+ if (!cf)
+ die("BUG: configset_add_value has no source");
+ if (cf->name) {
  kv_info->filename = strintern(cf->name);
  kv_info->linenr = cf->linenr;
  } else {
@@ -2431,10 +2442,14 @@ int parse_config_key(const char *var,
 
 const char *current_config_origin_type(void)
 {
- return cf && cf->origin_type ? cf->origin_type : "command line";
+ if (!cf)
+ die("BUG: current_config_origin_type called outside config callback");
+ return cf->origin_type ? cf->origin_type : "command line";
 }
 
 const char *current_config_name(void)
 {
- return cf && cf->name ? cf->name : "";
+ if (!cf)
+ die("BUG: current_config_name called outside config callback");
+ return cf->name ? cf->name : "";
 }
--
2.8.2.888.gecb1fe3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 4/6] config: return configset value for current_config_ functions

Jeff King
In reply to this post by Jeff King
When 473166b (config: add 'origin_type' to config_source
struct, 2016-02-19) added accessor functions for the origin
type and name, it taught them only to look at the "cf"
struct that is filled in while we are parsing the config.
This is sufficient to make it work with git-config, which
uses git_config_with_options() under the hood. That function
freshly parses the config files and triggers the callback
when it parses each key.

Most git programs, however, use git_config(). This interface
will populate a cache during the actual parse, and then
serve values from the cache. Calling current_config_filename()
in a callback here will find a NULL cf and produce an error.
There are no such callers right now, but let's prepare for
adding some by making this work.

We already record source information in a struct attached to
each value. We just need to make it globally available and
then consult it from the accessor functions.

Signed-off-by: Jeff King <[hidden email]>
---
 cache.h                |  1 +
 config.c               | 51 +++++++++++++++++++++++++++++++++++++++++---------
 t/helper/test-config.c | 20 ++++++++++++++++++++
 t/t1308-config-set.sh  | 23 +++++++++++++++++++++++
 4 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/cache.h b/cache.h
index 6b80a17..29c579b 100644
--- a/cache.h
+++ b/cache.h
@@ -1693,6 +1693,7 @@ extern int ignore_untracked_cache_config;
 struct key_value_info {
  const char *filename;
  int linenr;
+ const char *origin_type;
 };
 
 extern NORETURN void git_die_config(const char *key, const char *err, ...) __attribute__((format(printf, 2, 3)));
diff --git a/config.c b/config.c
index 4beba52..75afdd7 100644
--- a/config.c
+++ b/config.c
@@ -38,7 +38,24 @@ struct config_source {
  long (*do_ftell)(struct config_source *c);
 };
 
+/*
+ * These variables record the "current" config source, which
+ * can be accessed by parsing callbacks.
+ *
+ * The "cf" variable will be non-NULL only when we are actually parsing a real
+ * config source (file, blob, cmdline, etc).
+ *
+ * The "current_config_kvi" variable will be non-NULL only when we are feeding
+ * cached config from a configset into a callback.
+ *
+ * They should generally never be non-NULL at the same time. If they are both
+ * NULL, then we aren't parsing anything (and depending on the function looking
+ * at the variables, it's either a bug for it to be called in the first place,
+ * or it's a function which can be reused for non-config purposes, and should
+ * fall back to some sane behavior).
+ */
 static struct config_source *cf;
+static struct key_value_info *current_config_kvi;
 
 static int zlib_compression_seen;
 
@@ -1288,16 +1305,20 @@ static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
  struct string_list *values;
  struct config_set_element *entry;
  struct configset_list *list = &cs->list;
- struct key_value_info *kv_info;
 
  for (i = 0; i < list->nr; i++) {
  entry = list->items[i].e;
  value_index = list->items[i].value_index;
  values = &entry->value_list;
- if (fn(entry->key, values->items[value_index].string, data) < 0) {
- kv_info = values->items[value_index].util;
- git_die_config_linenr(entry->key, kv_info->filename, kv_info->linenr);
- }
+
+ current_config_kvi = values->items[value_index].util;
+
+ if (fn(entry->key, values->items[value_index].string, data) < 0)
+ git_die_config_linenr(entry->key,
+      current_config_kvi->filename,
+      current_config_kvi->linenr);
+
+ current_config_kvi = NULL;
  }
 }
 
@@ -1359,10 +1380,12 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
  if (cf->name) {
  kv_info->filename = strintern(cf->name);
  kv_info->linenr = cf->linenr;
+ kv_info->origin_type = strintern(cf->origin_type);
  } else {
  /* for values read from `git_config_from_parameters()` */
  kv_info->filename = NULL;
  kv_info->linenr = -1;
+ kv_info->origin_type = NULL;
  }
  si->util = kv_info;
 
@@ -2442,14 +2465,24 @@ int parse_config_key(const char *var,
 
 const char *current_config_origin_type(void)
 {
- if (!cf)
+ const char *type;
+ if (current_config_kvi)
+ type = current_config_kvi->origin_type;
+ else if(cf)
+ type = cf->origin_type;
+ else
  die("BUG: current_config_origin_type called outside config callback");
- return cf->origin_type ? cf->origin_type : "command line";
+ return type ? type : "command line";
 }
 
 const char *current_config_name(void)
 {
- if (!cf)
+ const char *name;
+ if (current_config_kvi)
+ name = current_config_kvi->filename;
+ else if (cf)
+ name = cf->name;
+ else
  die("BUG: current_config_name called outside config callback");
- return cf->name ? cf->name : "";
+ return name ? name : "";
 }
diff --git a/t/helper/test-config.c b/t/helper/test-config.c
index 6a77552..3605ef8 100644
--- a/t/helper/test-config.c
+++ b/t/helper/test-config.c
@@ -25,6 +25,9 @@
  * ascending order of priority from a config_set
  * constructed from files entered as arguments.
  *
+ * iterate -> iterate over all values using git_config(), and print some
+ *            data for each
+ *
  * Examples:
  *
  * To print the value with highest priority for key "foo.bAr Baz.rock":
@@ -32,6 +35,20 @@
  *
  */
 
+static int iterate_cb(const char *var, const char *value, void *data)
+{
+ static int nr;
+
+ if (nr++)
+ putchar('\n');
+
+ printf("key=%s\n", var);
+ printf("value=%s\n", value ? value : "(null)");
+ printf("origin=%s\n", current_config_origin_type());
+ printf("name=%s\n", current_config_name());
+
+ return 0;
+}
 
 int main(int argc, char **argv)
 {
@@ -134,6 +151,9 @@ int main(int argc, char **argv)
  printf("Value not found for \"%s\"\n", argv[2]);
  goto exit1;
  }
+ } else if (!strcmp(argv[1], "iterate")) {
+ git_config(iterate_cb, NULL);
+ goto exit0;
  }
 
  die("%s: Please check the syntax and the function name", argv[0]);
diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
index 005d66d..b241338 100755
--- a/t/t1308-config-set.sh
+++ b/t/t1308-config-set.sh
@@ -229,4 +229,27 @@ test_expect_success 'error on modifying repo config without repo' '
  )
 '
 
+test_expect_success 'iteration shows correct origins' '
+ echo "[foo]bar = from-repo" >.git/config &&
+ echo "[alias]test-config = !test-config" >.gitconfig &&
+ cat >expect <<-EOF &&
+ key=alias.test-config
+ value=!test-config
+ origin=file
+ name=$(pwd)/.gitconfig
+
+ key=foo.bar
+ value=from-repo
+ origin=file
+ name=.git/config
+
+ key=foo.bar
+ value=from-cmdline
+ origin=command line
+ name=
+ EOF
+ git -c foo.bar=from-cmdline test-config iterate >actual &&
+ test_cmp expect actual
+'
+
 test_done
--
2.8.2.888.gecb1fe3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 5/6] config: add a notion of "scope"

Jeff King
In reply to this post by Jeff King
A config callback passed to git_config() doesn't know very
much about the context in which it sees a variable. It can
ask whether the variable comes from a file, and get the file
name. But without analyzing the filename (which is hard to
do accurately), it cannot tell whether it is in system-level
config, user-level config, or repo-specific config.

Generally this doesn't matter; the point of not passing this
to the callback is that it should treat the config the same
no matter where it comes from. But some programs, like
upload-pack, are a special case: we should be able to run
them in an untrusted repository, which means we cannot use
any "dangerous" config from the repository config file (but
it is OK to use it from system or user config).

This patch teaches the config code to record the "scope" of
each variable, and make it available inside config
callbacks, similar to how we give access to the filename.
The scope is the starting source for a particular parsing
operation, and remains the same even if we include other
files (so a .git/config which includes another file will
remain CONFIG_SCOPE_REPO, as it would be similarly
untrusted).

Signed-off-by: Jeff King <[hidden email]>
---
 cache.h                | 11 +++++++++++
 config.c               | 23 +++++++++++++++++++++++
 t/helper/test-config.c | 16 ++++++++++++++++
 t/t1308-config-set.sh  |  3 +++
 4 files changed, 53 insertions(+)

diff --git a/cache.h b/cache.h
index 29c579b..81bd96a 100644
--- a/cache.h
+++ b/cache.h
@@ -1601,6 +1601,16 @@ extern const char *get_log_output_encoding(void);
 extern const char *get_commit_output_encoding(void);
 
 extern int git_config_parse_parameter(const char *, config_fn_t fn, void *data);
+
+enum config_scope {
+ CONFIG_SCOPE_UNKNOWN = 0,
+ CONFIG_SCOPE_SYSTEM,
+ CONFIG_SCOPE_GLOBAL,
+ CONFIG_SCOPE_REPO,
+ CONFIG_SCOPE_CMDLINE,
+};
+
+extern enum config_scope current_config_scope(void);
 extern const char *current_config_origin_type(void);
 extern const char *current_config_name(void);
 
@@ -1694,6 +1704,7 @@ struct key_value_info {
  const char *filename;
  int linenr;
  const char *origin_type;
+ enum config_scope scope;
 };
 
 extern NORETURN void git_die_config(const char *key, const char *err, ...) __attribute__((format(printf, 2, 3)));
diff --git a/config.c b/config.c
index 75afdd7..995e886 100644
--- a/config.c
+++ b/config.c
@@ -57,6 +57,15 @@ struct config_source {
 static struct config_source *cf;
 static struct key_value_info *current_config_kvi;
 
+/*
+ * Similar to the variables above, this gives access to the "scope" of the
+ * current value (repo, global, etc). For cached values, it can be found via
+ * the current_config_kvi as above. During parsing, the current value can be
+ * found in this variable. It's not part of "cf" because it transcends a single
+ * file (i.e., a file included from .git/config is still in "repo" scope).
+ */
+static enum config_scope current_parsing_scope;
+
 static int zlib_compression_seen;
 
 /*
@@ -1233,22 +1242,27 @@ static int do_git_config_sequence(config_fn_t fn, void *data)
  char *user_config = expand_user_path("~/.gitconfig");
  char *repo_config = git_pathdup("config");
 
+ current_parsing_scope = CONFIG_SCOPE_SYSTEM;
  if (git_config_system() && !access_or_die(git_etc_gitconfig(), R_OK, 0))
  ret += git_config_from_file(fn, git_etc_gitconfig(),
     data);
 
+ current_parsing_scope = CONFIG_SCOPE_GLOBAL;
  if (xdg_config && !access_or_die(xdg_config, R_OK, ACCESS_EACCES_OK))
  ret += git_config_from_file(fn, xdg_config, data);
 
  if (user_config && !access_or_die(user_config, R_OK, ACCESS_EACCES_OK))
  ret += git_config_from_file(fn, user_config, data);
 
+ current_parsing_scope = CONFIG_SCOPE_REPO;
  if (repo_config && !access_or_die(repo_config, R_OK, 0))
  ret += git_config_from_file(fn, repo_config, data);
 
+ current_parsing_scope = CONFIG_SCOPE_CMDLINE;
  if (git_config_from_parameters(fn, data) < 0)
  die(_("unable to parse command-line config"));
 
+ current_parsing_scope = CONFIG_SCOPE_UNKNOWN;
  free(xdg_config);
  free(user_config);
  free(repo_config);
@@ -1387,6 +1401,7 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
  kv_info->linenr = -1;
  kv_info->origin_type = NULL;
  }
+ kv_info->scope = current_parsing_scope;
  si->util = kv_info;
 
  return 0;
@@ -2486,3 +2501,11 @@ const char *current_config_name(void)
  die("BUG: current_config_name called outside config callback");
  return name ? name : "";
 }
+
+enum config_scope current_config_scope(void)
+{
+ if (current_config_kvi)
+ return current_config_kvi->scope;
+ else
+ return current_parsing_scope;
+}
diff --git a/t/helper/test-config.c b/t/helper/test-config.c
index 3605ef8..509aeef 100644
--- a/t/helper/test-config.c
+++ b/t/helper/test-config.c
@@ -35,6 +35,21 @@
  *
  */
 
+static const char *scope_name(enum config_scope scope)
+{
+ switch (scope) {
+ case CONFIG_SCOPE_SYSTEM:
+ return "system";
+ case CONFIG_SCOPE_GLOBAL:
+ return "global";
+ case CONFIG_SCOPE_REPO:
+ return "repo";
+ case CONFIG_SCOPE_CMDLINE:
+ return "cmdline";
+ default:
+ return "unknown";
+ }
+}
 static int iterate_cb(const char *var, const char *value, void *data)
 {
  static int nr;
@@ -46,6 +61,7 @@ static int iterate_cb(const char *var, const char *value, void *data)
  printf("value=%s\n", value ? value : "(null)");
  printf("origin=%s\n", current_config_origin_type());
  printf("name=%s\n", current_config_name());
+ printf("scope=%s\n", scope_name(current_config_scope()));
 
  return 0;
 }
diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
index b241338..486b41c 100755
--- a/t/t1308-config-set.sh
+++ b/t/t1308-config-set.sh
@@ -237,16 +237,19 @@ test_expect_success 'iteration shows correct origins' '
  value=!test-config
  origin=file
  name=$(pwd)/.gitconfig
+ scope=global
 
  key=foo.bar
  value=from-repo
  origin=file
  name=.git/config
+ scope=repo
 
  key=foo.bar
  value=from-cmdline
  origin=command line
  name=
+ scope=cmdline
  EOF
  git -c foo.bar=from-cmdline test-config iterate >actual &&
  test_cmp expect actual
--
2.8.2.888.gecb1fe3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 6/6] upload-pack: provide a hook for running pack-objects

Jeff King
In reply to this post by Jeff King
When upload-pack serves a client request, it turns to
pack-objects to do the heavy lifting of creating a
packfile. There's no easy way to intercept the call to
pack-objects, but there are a few good reasons to want to do
so:

  1. If you're debugging a client or server issue with
     fetching, you may want to store a copy of the generated
     packfile.

  2. If you're gathering data from real-world fetches for
     performance analysis or debugging, storing a copy of
     the arguments and stdin lets you replay the pack
     generation at your leisure.

  3. You may want to insert a caching layer around
     pack-objects; it is the most CPU- and memory-intensive
     part of serving a fetch, and its output is a pure
     function[1] of its input, making it an ideal place to
     consolidate identical requests.

This patch adds a simple "hook" interface to intercept calls
to pack-objects. The new test demonstrates how it can be
used for debugging (using it for caching is a
straightforward extension; the tricky part is writing the
actual caching layer).

This hook is unlike the normal hook scripts found in the
"hooks/" directory of a repository. Because we promise that
upload-pack is safe to run in an untrusted repository, we
cannot execute arbitrary code or commands found in the
repository (neither in hooks/, nor in the config). So
instead, this hook is triggered from a config variable that
is explicitly ignored in the per-repo config.

The config variable holds the actual shell command to run as
the hook.  Another approach would be to simply treat it as a
boolean: "should I respect the upload-pack hooks in this
repo?", and then run the script from "hooks/" as we usually
do. However, that isn't as flexible; there's no way to run a
hook approved by the site administrator (e.g., in
"/etc/gitconfig") on a repository whose contents are not
trusted. The approach taken by this patch is more
fine-grained, if a little less conventional for git hooks
(it does behave similar to other configured commands like
diff.external, etc).

[1] Pack-objects isn't _actually_ a pure function. Its
    output depends on the exact packing of the object
    database, and if multi-threading is used for delta
    compression, can even differ racily. But for the
    purposes of caching, that's OK; of the many possible
    outputs for a given input, it is sufficient only that we
    output one of them.

Signed-off-by: Jeff King <[hidden email]>
---
 Documentation/config.txt     | 15 +++++++++++
 t/t5544-pack-objects-hook.sh | 62 ++++++++++++++++++++++++++++++++++++++++++++
 upload-pack.c                | 13 +++++++++-
 3 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100755 t/t5544-pack-objects-hook.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e4cd291..b9b0541 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2872,6 +2872,21 @@ uploadpack.keepAlive::
  `uploadpack.keepAlive` seconds. Setting this option to 0
  disables keepalive packets entirely. The default is 5 seconds.
 
+uploadpack.packObjectsHook::
+ If this option is set, when `upload-pack` would run
+ `git pack-objects` to create a packfile for a client, it will
+ run this shell command instead.  The `pack-objects` command and
+ arguments it _would_ have run (including the `git pack-objects`
+ at the beginning) are appended to the shell command. The stdin
+ and stdout of the hook are treated as if `pack-objects` itself
+ was run. I.e., `upload-pack` will feed input intended for
+ `pack-objects` to the hook, and expects a completed packfile on
+ stdout.
++
+Note that this configuration variable is ignored if it is seen in the
+repository-level config (this is a safety measure against fetching from
+untrusted repositories).
+
 url.<base>.insteadOf::
  Any URL that starts with this value will be rewritten to
  start, instead, with <base>. In cases where some site serves a
diff --git a/t/t5544-pack-objects-hook.sh b/t/t5544-pack-objects-hook.sh
new file mode 100755
index 0000000..4357af1
--- /dev/null
+++ b/t/t5544-pack-objects-hook.sh
@@ -0,0 +1,62 @@
+#!/bin/sh
+
+test_description='test custom script in place of pack-objects'
+. ./test-lib.sh
+
+test_expect_success 'create some history to fetch' '
+ test_commit one &&
+ test_commit two
+'
+
+test_expect_success 'create debugging hook script' '
+ write_script .git/hook <<-\EOF
+ echo >&2 "hook running"
+ echo "$*" >hook.args
+ cat >hook.stdin
+ "$@" <hook.stdin >hook.stdout
+ cat hook.stdout
+ EOF
+'
+
+clear_hook_results () {
+ rm -rf .git/hook.* dst.git
+}
+
+test_expect_success 'hook runs via global config' '
+ clear_hook_results &&
+ test_config_global uploadpack.packObjectsHook ./hook &&
+ git clone --no-local . dst.git 2>stderr &&
+ grep "hook running" stderr
+'
+
+test_expect_success 'hook outputs are sane' '
+ # check that we recorded a usable pack
+ git index-pack --stdin <.git/hook.stdout &&
+
+ # check that we recorded args and stdin. We do not check
+ # the full argument list or the exact pack contents, as it would make
+ # the test brittle. So just sanity check that we could replay
+ # the packing procedure.
+ grep "^git" .git/hook.args &&
+ $(cat .git/hook.args) <.git/hook.stdin >replay
+'
+
+test_expect_success 'hook runs from -c config' '
+ clear_hook_results &&
+ git clone --no-local \
+  -u "git -c uploadpack.packObjectsHook=./hook upload-pack" \
+  . dst.git 2>stderr &&
+ grep "hook running" stderr
+'
+
+test_expect_success 'hook does not run from repo config' '
+ clear_hook_results &&
+ test_config uploadpack.packObjectsHook "./hook" &&
+ git clone --no-local . dst.git 2>stderr &&
+ ! grep "hook running" stderr &&
+ test_path_is_missing .git/hook.args &&
+ test_path_is_missing .git/hook.stdin &&
+ test_path_is_missing .git/hook.stdout
+'
+
+test_done
diff --git a/upload-pack.c b/upload-pack.c
index f19444d..8979be6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -52,6 +52,7 @@ static int keepalive = 5;
 static int use_sideband;
 static int advertise_refs;
 static int stateless_rpc;
+static const char *pack_objects_hook;
 
 static void reset_timeout(void)
 {
@@ -93,6 +94,14 @@ static void create_pack_file(void)
  int i;
  FILE *pipe_fd;
 
+ if (!pack_objects_hook)
+ pack_objects.git_cmd = 1;
+ else {
+ argv_array_push(&pack_objects.args, pack_objects_hook);
+ argv_array_push(&pack_objects.args, "git");
+ pack_objects.use_shell = 1;
+ }
+
  if (shallow_nr) {
  argv_array_push(&pack_objects.args, "--shallow-file");
  argv_array_push(&pack_objects.args, "");
@@ -115,7 +124,6 @@ static void create_pack_file(void)
  pack_objects.in = -1;
  pack_objects.out = -1;
  pack_objects.err = -1;
- pack_objects.git_cmd = 1;
 
  if (start_command(&pack_objects))
  die("git upload-pack: unable to fork git-pack-objects");
@@ -812,6 +820,9 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
  keepalive = git_config_int(var, value);
  if (!keepalive)
  keepalive = -1;
+ } else if (current_config_scope() != CONFIG_SCOPE_REPO) {
+ if (!strcmp("uploadpack.packobjectshook", var))
+ return git_config_string(&pack_objects_hook, var, value);
  }
  return parse_hide_refs_config(var, value, "uploadpack");
 }
--
2.8.2.888.gecb1fe3
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4/6] config: return configset value for current_config_ functions

Jeff King
In reply to this post by Jeff King
On Wed, May 18, 2016 at 06:43:23PM -0400, Jeff King wrote:

>  cache.h                |  1 +
>  config.c               | 51 +++++++++++++++++++++++++++++++++++++++++---------
>  t/helper/test-config.c | 20 ++++++++++++++++++++
>  t/t1308-config-set.sh  | 23 +++++++++++++++++++++++
> [...]
> +test_expect_success 'iteration shows correct origins' '
> + echo "[alias]test-config = !test-config" >.gitconfig &&
> [...]
> + git -c foo.bar=from-cmdline test-config iterate >actual &&

While writing and testing this, I got bit by e6e7530 (test helpers: move
test-* to t/helper/ subdirectory, 2016-04-13). I had an old test-config
binary leftover in the root of my repository, and the new one was
correctly built in t/helper/. Running "test-config" is fine, but inside
the git alias, it sticks the repository root at the front of $PATH
(because it's the exec-path). And so it ran the old version of
test-config, which did not understand my new "iterate" option.

Now I'll admit what I'm doing here is pretty funny (running test-* from
an alias). I'm doing it because I want to see how the program operates
with the "-c" config, and it's nicer to spell it as a user would,
instead of munging $GIT_CONFIG_PARAMETERS directly.

So I'm not sure if it's worth working around or not. The single tree
state produced by this commit is fine, but it does behave badly if
there's leftover cruft from a pre-e6e7530 build. A more robust version
would look more like:

  sq=\' ;# to ease quoting later
  ...
  GIT_CONFIG_PARAMETERS=${sq}foo.bar=from-cmdline${sq} test-config ...

Which is ugly, but it's probably worth it to avoid the flakiness.

The other option is to somehow make bin-wrappers more robust. E.g., it
would be nice if we didn't actually point into the repository root
directly, but rather somehow linked all of the git-* entries that
_would_ be installed into the exec-path into a fake exec-path (or
alternatively, actually build them directly into that fake exec-path).

That's a much bigger change, though. Given how unlikely the sequence of
steps in my test is, maybe it's better to just work around it in this
one case.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 6/6] upload-pack: provide a hook for running pack-objects

Jeff King
In reply to this post by Jeff King
On Wed, May 18, 2016 at 06:45:37PM -0400, Jeff King wrote:

> @@ -93,6 +94,14 @@ static void create_pack_file(void)
>   int i;
>   FILE *pipe_fd;
>  
> + if (!pack_objects_hook)
> + pack_objects.git_cmd = 1;
> + else {
> + argv_array_push(&pack_objects.args, pack_objects_hook);
> + argv_array_push(&pack_objects.args, "git");
> + pack_objects.use_shell = 1;
> + }

I waffled on the "shell" thing here. It's more flexible, and matches
other commands we let the user specify (like diff.external). But it also
makes it harder to do more hook-like things, like "run this hook if it
exists and has the executable bit set".

So should it just be a path to a single command/script?

With the shell thing, you can set it to:

  test -e /path/to/script && /path/to/script ...

if you want, but that's a little more arcane (and incurs an extra shell
invocation that we could otherwise skip).

And on the same note, if it _were_ just path-to-script, would we want it
to skip the hook if it's missing or not executable? For normal hooks, if
that's the case, we know it's because the user doesn't want to use the
hook. But here, the user has taken an overt action to tell us about the
hook; would it be surprising if we silently skipped it because it wasn't
available?

I like that it would "fail open", so a misconfiguration doesn't break a
server. But maybe people would find that off-putting.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 6/6] upload-pack: provide a hook for running pack-objects

Ævar Arnfjörð Bjarmason
In reply to this post by Jeff King
On Thu, May 19, 2016 at 12:45 AM, Jeff King <[hidden email]> wrote:
>   3. You may want to insert a caching layer around
>      pack-objects; it is the most CPU- and memory-intensive
>      part of serving a fetch, and its output is a pure
>      function[1] of its input, making it an ideal place to
>      consolidate identical requests.

Cool to see this on the list after we talked briefly about this at Git
Merge. Being able to cache this so simply is a great optimization.

As I recall you guys at GitHub ended up writing your own utility to
cache output depending on stdin/argv because none existed already.

If anyone on-list knows about a generic command-line utility like that
(because apparently Peff couldn't think of any, and neither can I)
that would be useful to know.

> This hook is unlike the normal hook scripts found in the
> "hooks/" directory of a repository. Because we promise that
> upload-pack is safe to run in an untrusted repository, we
> cannot execute arbitrary code or commands found in the
> repository (neither in hooks/, nor in the config). So
> instead, this hook is triggered from a config variable that
> is explicitly ignored in the per-repo config.

So do I understand correctly that you're trying to guard against the
case where you e.g.:

    rsync untrusted.example.com:/tmp/poison.git /tmp/
    git clone /tmp/poison.git /tmp/safe.git

Not hosing your system if the poison.git/config has a
uploadpack.packObjectsHook that's "sudo rm -rf /".

And similarly having this run the hook on the remote:

    # foo.example.com has a /etc/gitconfig with
uploadpack.packObjectsHook "sudo rm -rf /";
    echo -n | ssh foo.example.com "git upload-pack /tmp/poison.git

But not this:

    # bar.example.com has a /tmp/poison.git/config with
uploadpack.packObjectsHook "sudo rm -rf /";
    echo -n | ssh foo.example.com "git upload-pack /tmp/poison.git

We've already accepted that "push" hooks like the pre-receive or
update hook can do something malicious like this, so on one hand maybe
we should say if you scp raw *.git repositories with hooks this sort
of thing might happen, or if you ssh to a remote box and run their
per-repo hooks it's really their problem to make sure their users
don't run malicious hooks on your behalf.

But I agree with you (if I've understand what this actually does) that
saying that it's always safe to "git clone" a repository is more
valuable and worth jumping through some hoops for.

But as you point out this makes the hook interface a bit unusual.
Wouldn't this give us the same security and normalize the hook
interface:

 * Don't do the uploadpack.packObjectsHook variable, just have a
normal "pack-objects" hook that works like any other git hook
 * By default we don't run this hook unless core.runDangerousHooks (or
whatever we call it) is true.
 * The core.runDangerousHooks variable cannot be set on a per-repo
basis using your new config facility.
 * If there's a pack-objects hook and core.runDangerousHooks isn't
true we warn "not executing potentially unsafe hook $path_to_hook" and
carry on

This would allow use-cases that are a bit inconvenient with your patch
(again, if I'm understanding it correctly):

 * I can set core.runDangerousHooks=true in /etc/gitconfig on my git
server because I also control all the repos, and I want to experiment
with trying this on a per-repo basis for users that are cloning from
me.
 * I can similarly play with this locally knowing I'm only cloning
repos I trust by setting core.runDangerousHooks=true in ~/.gitconfig
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 6/6] upload-pack: provide a hook for running pack-objects

Jeff King
On Thu, May 19, 2016 at 12:12:43PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Thu, May 19, 2016 at 12:45 AM, Jeff King <[hidden email]> wrote:
> >   3. You may want to insert a caching layer around
> >      pack-objects; it is the most CPU- and memory-intensive
> >      part of serving a fetch, and its output is a pure
> >      function[1] of its input, making it an ideal place to
> >      consolidate identical requests.
>
> Cool to see this on the list after we talked briefly about this at Git
> Merge. Being able to cache this so simply is a great optimization.
>
> As I recall you guys at GitHub ended up writing your own utility to
> cache output depending on stdin/argv because none existed already.

Yeah, we do have such a tool internally. It's possible we may one day
open-source that, but there aren't plans to do so right now.

I don't know whether this kind of caching would be useful to most sites
or not. It's good if you have lots of clients asking you for the same
thing at roughly the same time (say, somebody using "git pull" as a
deploy mechanism from their AWS cluster), but otherwise not.

> So do I understand correctly that you're trying to guard against the
> case where you e.g.:
>
>     rsync untrusted.example.com:/tmp/poison.git /tmp/
>     git clone /tmp/poison.git /tmp/safe.git
>
> Not hosing your system if the poison.git/config has a
> uploadpack.packObjectsHook that's "sudo rm -rf /".

I'm not that worried about this case, as it's just not that common.  I
think we're more concerned with two cases:

  1. multi-user servers where you ssh as yourself, but then access
     repositories owned by somebody else. This is basically the ssh case
     you described later.

  2. hosting sites that run git-daemon as the "daemon" user, but serve
     repositories owned by random untrusted users (where you would not
     want those users to run arbitrary code as "daemon").

> We've already accepted that "push" hooks like the pre-receive or
> update hook can do something malicious like this, so on one hand maybe
> we should say if you scp raw *.git repositories with hooks this sort
> of thing might happen, or if you ssh to a remote box and run their
> per-repo hooks it's really their problem to make sure their users
> don't run malicious hooks on your behalf.

Yeah, we make no promises for repositories that you push to. It's _only_
for the fetching side. It's kind of a funny distinction, but it's one we
have maintained since the beginning of git, and I do think there are
real sites that depend on it (see, e.g., the history of the
post-upload-pack hook added in the v1.6.x time frame).

Rsyncing a repository is generally of questionable safety. It's OK to
fetch from the result, but certainly not to run "git log" (which can run
arbitrary commands via external diff, etc).

> But as you point out this makes the hook interface a bit unusual.
> Wouldn't this give us the same security and normalize the hook
> interface:
>
>  * Don't do the uploadpack.packObjectsHook variable, just have a
> normal "pack-objects" hook that works like any other git hook
>  * By default we don't run this hook unless core.runDangerousHooks (or
> whatever we call it) is true.
>  * The core.runDangerousHooks variable cannot be set on a per-repo
> basis using your new config facility.
>  * If there's a pack-objects hook and core.runDangerousHooks isn't
> true we warn "not executing potentially unsafe hook $path_to_hook" and
> carry on

This is the "could we just set a bool" option I discussed in the commit
message. The problem is that it doesn't let the admin say "I don't trust
these repositories, but I _do_ want to run just this one hook when
serving them, and not any other hooks".

> This would allow use-cases that are a bit inconvenient with your patch
> (again, if I'm understanding it correctly):
>
>  * I can set core.runDangerousHooks=true in /etc/gitconfig on my git
> server because I also control all the repos, and I want to experiment
> with trying this on a per-repo basis for users that are cloning from
> me.
>  * I can similarly play with this locally knowing I'm only cloning
> repos I trust by setting core.runDangerousHooks=true in ~/.gitconfig

Yes, those use cases are not well served by the git config alone. But
you can do them (and much more) once your trusted hook is running (by
checking $GIT_DIR, or looking in a database, or whatever you want).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 6/6] upload-pack: provide a hook for running pack-objects

Ævar Arnfjörð Bjarmason
On Thu, May 19, 2016 at 2:08 PM, Jeff King <[hidden email]> wrote:

> On Thu, May 19, 2016 at 12:12:43PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> But as you point out this makes the hook interface a bit unusual.
>> Wouldn't this give us the same security and normalize the hook
>> interface:
>>
>>  * Don't do the uploadpack.packObjectsHook variable, just have a
>> normal "pack-objects" hook that works like any other git hook
>>  * By default we don't run this hook unless core.runDangerousHooks (or
>> whatever we call it) is true.
>>  * The core.runDangerousHooks variable cannot be set on a per-repo
>> basis using your new config facility.
>>  * If there's a pack-objects hook and core.runDangerousHooks isn't
>> true we warn "not executing potentially unsafe hook $path_to_hook" and
>> carry on
>
> This is the "could we just set a bool" option I discussed in the commit
> message. The problem is that it doesn't let the admin say "I don't trust
> these repositories, but I _do_ want to run just this one hook when
> serving them, and not any other hooks".

Indeed. I wonder if there's really any overlap in practice between
systems administrators on a central Git server that are going to want
this relatively obscure feature *but* have potentially malicious users
/ repos, or enough to warrant this unusual edge case in how this
specific hook is configured, as opposed to reducing the special case
in how the hook is run with something like core.runDangerousHooks.

I'm definitely not saying that these patches should be blocked by
this, but it occurs to me that both your uploadpack.packObjectsHook
implementation and my proposed core.runDangerousHooks which normalizes
it a bit in some ways, but leaves it as a special case in others, are
both stumbling their way toward hacks that we might also solve with
some generally configurable restrictions system that takes advantage
of your earlier patches, e.g.:

    $ cat /etc/gitconfig
    # Not "repository" so hooksPath can't be set per-repo
    [core]
        configRestriction                 = "core.hooksPath: system, global"
        hooksPath                            = "/etc/git/hooks"
        disableHook.pack-objects = false # "true" by default
    $ ls /etc/git/hooks/
    # pre-receive, update etc. would just wrap the repository hook,
    # but pack-objects is global
    pre-receive update pack-objects, [...]

Of course those are some rather large hoops to jump through just to
accomplish this particular thing, but it would be more generally
composable and you could e.g. say users can't disable gc.auto or
whatever on their repos if they're hosted on your server. Which of
course assumes that you control the git binary and they can't run
their own.

>> This would allow use-cases that are a bit inconvenient with your patch
>> (again, if I'm understanding it correctly):
>>
>>  * I can set core.runDangerousHooks=true in /etc/gitconfig on my git
>> server because I also control all the repos, and I want to experiment
>> with trying this on a per-repo basis for users that are cloning from
>> me.
>>  * I can similarly play with this locally knowing I'm only cloning
>> repos I trust by setting core.runDangerousHooks=true in ~/.gitconfig
>
> Yes, those use cases are not well served by the git config alone. But
> you can do them (and much more) once your trusted hook is running (by
> checking $GIT_DIR, or looking in a database, or whatever you want).

Yeah, the reason I'm prodding you about this is because I want to test
this out at some point, and a *really* nice thing about the Git
configuration facility is that you can test all these sorts of things
on a per-repo basis now due to how all the git-config variables work
now.

With uploadpack.packObjectsHook you *can* do that by defining a global
pass-through hook, but it makes it more of a hassle to test changes
that straddle the divide between testing & production.

I.e. I might test this on one repo on our main git server, or on one
active repo, i.e. I don't have to deal with the case where I make some
silly syntax/logic error in the uploadpack.packObjectsHook dispatch
code (which is only needed for the security consideration) and that
impacts all repositories on the machine.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH/RFC 0/6] pack-objects hook for upload-pack

Junio C Hamano
In reply to this post by Jeff King
On Wed, May 18, 2016 at 3:37 PM, Jeff King <[hidden email]> wrote:

> I've often wanted to intercept the call from upload-pack to
> pack-objects. The final patch in this series goes into more detail, but
> basically it's good for:
>
>   1. Capturing the output from pack-objects for debugging/inspection.
>
>   2. Capturing the input to pack-objects to replay for debugging or
>      performance analysis.
>
>   3. Caching pack-objects output.
>
> It's pretty trivial to add a hook to run instead of pack-objects (and
> the hook would just run pack-objects itself). But we don't want to run
> hooks in upload-pack, because we don't necessarily trust the repository
> we're running in.

Although I'd need to study the final step a bit more carefully than I did,
overall I think these are good changes. The way the callbacks learn
about the origin of the configuration may have to be rethought in the
long run, though. We already have been relying on a filename thing
kept separately as a global variable, and the approach taken by this
series extends it, so we are not making anything fundamentally worse,
but at some point we may need to bite the bullet and pass kv-info as
an extra callback parameter or something.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 6/6] upload-pack: provide a hook for running pack-objects

Jeff King
In reply to this post by Ævar Arnfjörð Bjarmason
On Thu, May 19, 2016 at 04:54:37PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > This is the "could we just set a bool" option I discussed in the commit
> > message. The problem is that it doesn't let the admin say "I don't trust
> > these repositories, but I _do_ want to run just this one hook when
> > serving them, and not any other hooks".
>
> Indeed. I wonder if there's really any overlap in practice between
> systems administrators on a central Git server that are going to want
> this relatively obscure feature *but* have potentially malicious users
> / repos, or enough to warrant this unusual edge case in how this
> specific hook is configured, as opposed to reducing the special case
> in how the hook is run with something like core.runDangerousHooks.

I dunno. Certainly I am not running such a site. But something like
kernel.org might fit into that boat. For a long time I think people had
actual shell accounts and a common git-daemon served the repositories. I
think that these days it might be more locked-down, though.

> I'm definitely not saying that these patches should be blocked by
> this, but it occurs to me that both your uploadpack.packObjectsHook
> implementation and my proposed core.runDangerousHooks which normalizes
> it a bit in some ways, but leaves it as a special case in others, are
> both stumbling their way toward hacks that we might also solve with
> some generally configurable restrictions system that takes advantage
> of your earlier patches, e.g.:
>
>     $ cat /etc/gitconfig
>     # Not "repository" so hooksPath can't be set per-repo
>     [core]
>         configRestriction                 = "core.hooksPath: system, global"

I was hoping to avoid setting up configuration restriction via the
configuration files, if only because it implies some ordering in the
parsing. So for example, you'd need to do a separate pass to load the
restrictions system, and then actually parse the config.

I guess that's not too bad with the caching system that's in place,
though.

> Of course those are some rather large hoops to jump through just to
> accomplish this particular thing, but it would be more generally
> composable and you could e.g. say users can't disable gc.auto or
> whatever on their repos if they're hosted on your server. Which of
> course assumes that you control the git binary and they can't run
> their own.

Yeah, I was also hoping to avoid something too baroque. :) I don't know
if there's much value in restricting things like gc.auto. If they can
make arbitrary edits to the config file, they can run arbitrary code. I
think this is _just_ about protecting a git-daemon serving the untrusted
repositories (or a user fetching from an untrusted other-user on the
system).

> Yeah, the reason I'm prodding you about this is because I want to test
> this out at some point, and a *really* nice thing about the Git
> configuration facility is that you can test all these sorts of things
> on a per-repo basis now due to how all the git-config variables work
> now.
>
> With uploadpack.packObjectsHook you *can* do that by defining a global
> pass-through hook, but it makes it more of a hassle to test changes
> that straddle the divide between testing & production.

One thing I didn't elaborate on is that the "don't respect this key from
the repo config" could be made more featureful. For example, your
core.allowDangerousHooks could just as easily be an environment
variable: $GIT_ALLOW_DANGEROUS_CONFIG. [1]

And then you could set that on your servers, and only set
uploadpack.packObjectsHook in the repositories you wanted, achieving
your goal.

This does still leave the pack-objects hook unlike the other hooks (in
that it leaves the command in the config rather than in a script), but I
actually like that flexibility. Being able to use "git -c" to set the
hook for a one-shot invocation is kind of nice (though you do have to do
tricks with "--upload-pack=" to get it to cross the remote boundary).

-Peff

[1] We also talked long ago (in the v1.6.x days, regarding a
post-upload-pack hook) of auto-enabling "dangerous" hooks when getuid()
matched the owner of the hook. We could do the same thing for the config
file (though TBH, it is confusing enough of a rule that I think I prefer
something like the explicit environment variable).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH/RFC 0/6] pack-objects hook for upload-pack

Jeff King
In reply to this post by Junio C Hamano
On Tue, May 24, 2016 at 05:59:15PM -0700, Junio C Hamano wrote:

> On Wed, May 18, 2016 at 3:37 PM, Jeff King <[hidden email]> wrote:
> > I've often wanted to intercept the call from upload-pack to
> > pack-objects. The final patch in this series goes into more detail, but
> > basically it's good for:
> >
> >   1. Capturing the output from pack-objects for debugging/inspection.
> >
> >   2. Capturing the input to pack-objects to replay for debugging or
> >      performance analysis.
> >
> >   3. Caching pack-objects output.
> >
> > It's pretty trivial to add a hook to run instead of pack-objects (and
> > the hook would just run pack-objects itself). But we don't want to run
> > hooks in upload-pack, because we don't necessarily trust the repository
> > we're running in.
>
> Although I'd need to study the final step a bit more carefully than I did,
> overall I think these are good changes. The way the callbacks learn
> about the origin of the configuration may have to be rethought in the
> long run, though. We already have been relying on a filename thing
> kept separately as a global variable, and the approach taken by this
> series extends it, so we are not making anything fundamentally worse,
> but at some point we may need to bite the bullet and pass kv-info as
> an extra callback parameter or something.

Yeah, I had the same thought while working on this, but just didn't want
to have to tweak every config callback. As you say, I don't think this
makes anything fundamentally worse, though. I'm inclined to go with this
strategy, especially with the extra die("BUG") safety added here. But I
can look into changing the callbacks if you feel strongly.

-Peff

PS Did you have any thoughts on the t/helper problem mentioned in:

     http://article.gmane.org/gmane.comp.version-control.git/295029

   I suspect it will bite you if you try merging/testing this.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4/6] config: return configset value for current_config_ functions

Duy Nguyen
In reply to this post by Jeff King
On Thu, May 19, 2016 at 7:08 AM, Jeff King <[hidden email]> wrote:
> On Wed, May 18, 2016 at 06:43:23PM -0400, Jeff King wrote:
>
>>  cache.h                |  1 +
>>  config.c               | 51 +++++++++++++++++++++++++++++++++++++++++---------
>>  t/helper/test-config.c | 20 ++++++++++++++++++++
>>  t/t1308-config-set.sh  | 23 +++++++++++++++++++++++
>> [...]
>> +test_expect_success 'iteration shows correct origins' '
>> +     echo "[alias]test-config = !test-config" >.gitconfig &&

How about using 'which' to get absolute path for test-config and put
it here? Then we don't rely on $PATH anymore.

>> [...]
>> +     git -c foo.bar=from-cmdline test-config iterate >actual &&
>
> While writing and testing this, I got bit by e6e7530 (test helpers: move
> test-* to t/helper/ subdirectory, 2016-04-13). I had an old test-config
> binary leftover in the root of my repository, and the new one was
> correctly built in t/helper/. Running "test-config" is fine, but inside
> the git alias, it sticks the repository root at the front of $PATH
> (because it's the exec-path). And so it ran the old version of
> test-config, which did not understand my new "iterate" option.
>
> Now I'll admit what I'm doing here is pretty funny (running test-* from
> an alias). I'm doing it because I want to see how the program operates
> with the "-c" config, and it's nicer to spell it as a user would,
> instead of munging $GIT_CONFIG_PARAMETERS directly.
>
> So I'm not sure if it's worth working around or not. The single tree
> state produced by this commit is fine, but it does behave badly if
> there's leftover cruft from a pre-e6e7530 build. A more robust version
> would look more like:
>
>   sq=\' ;# to ease quoting later
>   ...
>   GIT_CONFIG_PARAMETERS=${sq}foo.bar=from-cmdline${sq} test-config ...
>
> Which is ugly, but it's probably worth it to avoid the flakiness.
>
> The other option is to somehow make bin-wrappers more robust. E.g., it
> would be nice if we didn't actually point into the repository root
> directly, but rather somehow linked all of the git-* entries that
> _would_ be installed into the exec-path into a fake exec-path (or
> alternatively, actually build them directly into that fake exec-path).
>
> That's a much bigger change, though. Given how unlikely the sequence of
> steps in my test is, maybe it's better to just work around it in this
> one case.
>
> -Peff



--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4/6] config: return configset value for current_config_ functions

Junio C Hamano
Duy Nguyen <[hidden email]> writes:

> On Thu, May 19, 2016 at 7:08 AM, Jeff King <[hidden email]> wrote:
>> On Wed, May 18, 2016 at 06:43:23PM -0400, Jeff King wrote:
>>
>>>  cache.h                |  1 +
>>>  config.c               | 51 +++++++++++++++++++++++++++++++++++++++++---------
>>>  t/helper/test-config.c | 20 ++++++++++++++++++++
>>>  t/t1308-config-set.sh  | 23 +++++++++++++++++++++++
>>> [...]
>>> +test_expect_success 'iteration shows correct origins' '
>>> +     echo "[alias]test-config = !test-config" >.gitconfig &&
>
> How about using 'which' to get absolute path for test-config and put
> it here? Then we don't rely on $PATH anymore.

Don't use which, which is not portable.

Remind me why we end up running ./test-config instead of
./bin-wrappers/test-config?  Should our tests be running
bin-wrappers early in their $PATH, perhaps?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH/RFC 0/6] pack-objects hook for upload-pack

Junio C Hamano
In reply to this post by Jeff King
Jeff King <[hidden email]> writes:

> Yeah, I had the same thought while working on this, but just didn't want
> to have to tweak every config callback. As you say, I don't think this
> makes anything fundamentally worse, though. I'm inclined to go with this
> strategy, especially with the extra die("BUG") safety added here.

Fine by me.

> PS Did you have any thoughts on the t/helper problem mentioned in:
>
>      http://article.gmane.org/gmane.comp.version-control.git/295029
>
>    I suspect it will bite you if you try merging/testing this.

It already did but thanks to that message I didn't have to panic ;-).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4/6] config: return configset value for current_config_ functions

Jeff King
In reply to this post by Junio C Hamano
On Thu, May 26, 2016 at 09:42:48AM -0700, Junio C Hamano wrote:

> Duy Nguyen <[hidden email]> writes:
>
> > On Thu, May 19, 2016 at 7:08 AM, Jeff King <[hidden email]> wrote:
> >> On Wed, May 18, 2016 at 06:43:23PM -0400, Jeff King wrote:
> >>
> >>>  cache.h                |  1 +
> >>>  config.c               | 51 +++++++++++++++++++++++++++++++++++++++++---------
> >>>  t/helper/test-config.c | 20 ++++++++++++++++++++
> >>>  t/t1308-config-set.sh  | 23 +++++++++++++++++++++++
> >>> [...]
> >>> +test_expect_success 'iteration shows correct origins' '
> >>> +     echo "[alias]test-config = !test-config" >.gitconfig &&
> >
> > How about using 'which' to get absolute path for test-config and put
> > it here? Then we don't rely on $PATH anymore.
>
> Don't use which, which is not portable.
>
> Remind me why we end up running ./test-config instead of
> ./bin-wrappers/test-config?  Should our tests be running
> bin-wrappers early in their $PATH, perhaps?

The problem is running test-config inside of a git alias. The
bin-wrappers will set the exec-path to the root-level of git's build
directory, which the git binary will then stick at the front of the
$PATH.

So the simplest solution really is: don't do that. The only debate in my
mind is whether this is rare enough that it won't bite somebody again in
the future, or if we should look into a solution that makes this Just
Work.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4/6] config: return configset value for current_config_ functions

Junio C Hamano
Jeff King <[hidden email]> writes:

> The problem is running test-config inside of a git alias. The
> bin-wrappers will set the exec-path to the root-level of git's build
> directory, which the git binary will then stick at the front of the
> $PATH.

I was wondering why exec-path does not point at bin-wrappers in the
first place.

A wrapper script needs to know where the real thing lives in order
to "exec" (or "exec gdb") anyway, and it hardcodes the path to it.
It happens to use GIT_EXEC_PATH to shorten the hardcoded path it
uses when it does "exec", but it does not have to.

Wouldn't we want to see "git" use any of these wrapped ones when it
invokes a non-builtin subcommand when it does so normally, honoring
GIT_EXEC_PATH?  Pointing GIT_EXEC_PATH at the top-level means that
wrappers are bypassed for such an invocation (if what is run happens
to have executable at the top-level), and possibly a totally wrong
thing is run (when we start generating the binaries in different
directories, which is what we are seeing here).

> So the simplest solution really is: don't do that. The only debate
> in my mind is whether this is rare enough that it won't bite
> somebody again in the future, or if we should look into a solution
> that makes this Just Work.

I think it was you who alluded to revamping the test framework along
the lines of preparing a "test installation" (aka "make install"
with DESTDIR set to somewhere) and making bin-wrappers to point into
that installation (or if we are testing an installed Git that may be
different from what we have source for, that final installed
location).  An installed version of Git will not have test-* helpers
so they need to come from a freshly built source tree, not from
"test installation" or "installed Git".  There may be other details
that need to be worked out, but as a longer term direction that may
not be a bad idea.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loading...