[RFC/PATCH] Ordering of remotes for fetch --all

[RFC/PATCH] Ordering of remotes for fetch --all

Guido Martínez
Hi all,

I run a server with several git mirrors that are updated every hour. On
that same server, users clone those projects and work on them. We use
the local mirrors to reduce network load: the users can fetch from the
mirror first (to get most of the objects with zero network cost) and
then fetch the real remote (to make sure they're completely up to date).

I would like this to be configurable in each git working directory,
so users can configure the order they want once and then simply run
"git remote update".

I'm aware one can get this behavior by editing .git/config and
ordering the remotes as one wishes, but I find that very hacky and not
scripting-friendly.

This patch introduces a fetch priority for each remote, defaulting to
50 and configurable via git config; remotes with lower values are
fetched first. The new order only matters for "git fetch --all".

Do you think this is a useful feature? Hopefully you don't consider
this just noise :)

(As a side note: for ordering the remotes a stable sort would be best,
to have the least impact possible on current behavior. I believe
git_qsort is stable but a confirmation would be nice.)
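The C standard does not require qsort() to be stable, and the patch calls qsort() directly, so one way to avoid depending on the sort implementation is to break priority ties on each remote's original (config-file) position. The sketch below is illustrative only: `struct remote_entry`, `sort_item` and `sort_remotes_by_prio()` are hypothetical stand-ins, not git's real types. It also compares priorities without subtracting them, because `pp->fetch_prio - qq->fetch_prio` can overflow when extreme values are configured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's struct remote: only what the sketch needs. */
struct remote_entry {
	const char *name;
	int fetch_prio;
};

/* Pair each remote with its position in the original (config) order. */
struct sort_item {
	struct remote_entry *r;
	size_t index;
};

static int compare_prio_then_index(const void *a, const void *b)
{
	const struct sort_item *x = a;
	const struct sort_item *y = b;

	/* Compare instead of subtracting, so extreme values cannot overflow. */
	if (x->r->fetch_prio != y->r->fetch_prio)
		return x->r->fetch_prio < y->r->fetch_prio ? -1 : 1;
	/*
	 * Equal priority: fall back to config order. No two items compare
	 * equal, so the result is the same whether or not qsort() is stable.
	 */
	if (x->index != y->index)
		return x->index < y->index ? -1 : 1;
	return 0;
}

/*
 * Reorder 'remotes' in place by ascending fetch_prio, keeping config order
 * among equal priorities. Returns 0 on success, -1 if allocation fails.
 */
static int sort_remotes_by_prio(struct remote_entry **remotes, size_t nr)
{
	struct sort_item *items;
	size_t i;

	if (nr < 2)
		return 0;
	items = malloc(nr * sizeof(*items));
	if (!items)
		return -1;
	for (i = 0; i < nr; i++) {
		items[i].r = remotes[i];
		items[i].index = i;
	}
	qsort(items, nr, sizeof(*items), compare_prio_then_index);
	for (i = 0; i < nr; i++)
		remotes[i] = items[i].r;
	free(items);
	return 0;
}
```

With remotes "upstream" (50), "mirror" (10) and "other" (50) in that config order, this yields mirror, upstream, other: the mirror moves to the front and the two default-priority remotes keep their relative order, which matches the "least impact on current behavior" goal.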

Thanks!
Guido

Guido Martínez (1):
  remote: add a fetching priority to each remote

 Documentation/config.txt |  5 +++++
 builtin/fetch.c          |  2 +-
 remote.c                 | 43 +++++++++++++++++++++++++++++++++++++++----
 remote.h                 |  2 ++
 4 files changed, 47 insertions(+), 5 deletions(-)

--
2.8.1.281.g0994585

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC/PATCH] remote: add a fetching priority to each remote

Guido Martínez
Add an integer fetch priority to each remote, allowing users to set the
order in which remotes are fetched, mainly for the use case of "git
remote update", which updates every remote.

This way, users can put local mirrors of repositories first, to quickly
get most of the objects, and the upstream repository last, to make sure
they have the latest commits.

Signed-off-by: Guido Martínez <[hidden email]>
---
 Documentation/config.txt |  5 +++++
 builtin/fetch.c          |  2 +-
 remote.c                 | 43 +++++++++++++++++++++++++++++++++++++++----
 remote.h                 |  2 ++
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 42d2b50..5ca199c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2489,6 +2489,11 @@ remote.<name>.fetch::
  The default set of "refspec" for linkgit:git-fetch[1]. See
  linkgit:git-fetch[1].
 
+remote.<name>.fetchprio::
+ Set a priority for fetching this remote; remotes with lower values
+ are fetched first when doing "git fetch --all" (thus also when
+ running "git remote update"). Default value is 50.
+
 remote.<name>.push::
  The default set of "refspec" for linkgit:git-push[1]. See
  linkgit:git-push[1].
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f8455bd..44f42bf 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1187,7 +1187,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
  die(_("fetch --all does not take a repository argument"));
  else if (argc > 1)
  die(_("fetch --all does not make sense with refspecs"));
- (void) for_each_remote(get_one_remote_for_fetch, &list);
+ (void) for_each_sorted_remote(get_one_remote_for_fetch, &list);
  result = fetch_multiple(&list);
  } else if (argc == 0) {
  /* No arguments -- use default remote */
diff --git a/remote.c b/remote.c
index 28fd676..3b9f20a 100644
--- a/remote.c
+++ b/remote.c
@@ -168,6 +168,7 @@ static struct remote *make_remote(const char *name, int len)
  ALLOC_GROW(remotes, remotes_nr + 1, remotes_alloc);
  remotes[remotes_nr++] = ret;
  ret->name = xstrndup(name, len);
+ ret->fetch_prio = 50;
 
  hashmap_entry_init(ret, lookup_entry.hash);
  replaced = hashmap_put(&remotes_hash, ret);
@@ -375,6 +376,8 @@ static int handle_config(const char *key, const char *value, void *cb)
  remote->mirror = git_config_bool(key, value);
  else if (!strcmp(subkey, "skipdefaultupdate"))
  remote->skip_default_update = git_config_bool(key, value);
+ else if (!strcmp(subkey, "fetchprio"))
+ remote->fetch_prio = git_config_int(key, value);
  else if (!strcmp(subkey, "skipfetchall"))
  remote->skip_default_update = git_config_bool(key, value);
  else if (!strcmp(subkey, "prune"))
@@ -719,12 +722,12 @@ int remote_is_configured(struct remote *remote)
  return remote && remote->origin;
 }
 
-int for_each_remote(each_remote_fn fn, void *priv)
+static int for_each_remote_do(struct remote **rlist, int len,
+      each_remote_fn fn, void *priv)
 {
  int i, result = 0;
- read_config();
- for (i = 0; i < remotes_nr && !result; i++) {
- struct remote *r = remotes[i];
+ for (i = 0; i < len && !result; i++) {
+ struct remote *r = rlist[i];
  if (!r)
  continue;
  if (!r->fetch)
@@ -738,6 +741,38 @@ int for_each_remote(each_remote_fn fn, void *priv)
  return result;
 }
 
+int for_each_remote(each_remote_fn fn, void *priv)
+{
+ read_config();
+ return for_each_remote_do(remotes, remotes_nr, fn, priv);
+}
+
+int compare_fetch_prio(const void *p, const void *q)
+{
+ const struct remote *pp = *(struct remote**)p;
+ const struct remote *qq = *(struct remote**)q;
+
+ return pp->fetch_prio - qq->fetch_prio;
+}
+
+int for_each_sorted_remote(each_remote_fn fn, void *priv)
+{
+ struct remote **sr;
+ int i, rc;
+
+ read_config();
+
+ sr = xmalloc(sizeof (struct remote*) * remotes_nr);
+ for (i = 0; i < remotes_nr; i++)
+ sr[i] = remotes[i];
+
+ qsort(sr, remotes_nr, sizeof (struct remote*), compare_fetch_prio);
+
+ rc = for_each_remote_do(sr, remotes_nr, fn, priv);
+ free(sr);
+ return rc;
+}
+
 static void handle_duplicate(struct ref *ref1, struct ref *ref2)
 {
  if (strcmp(ref1->name, ref2->name)) {
diff --git a/remote.h b/remote.h
index c21fd37..09b48e4 100644
--- a/remote.h
+++ b/remote.h
@@ -47,6 +47,7 @@ struct remote {
  int skip_default_update;
  int mirror;
  int prune;
+ int fetch_prio;
 
  const char *receivepack;
  const char *uploadpack;
@@ -64,6 +65,7 @@ int remote_is_configured(struct remote *remote);
 
 typedef int each_remote_fn(struct remote *remote, void *priv);
 int for_each_remote(each_remote_fn fn, void *priv);
+int for_each_sorted_remote(each_remote_fn fn, void *priv);
 
 int remote_has_url(struct remote *remote, const char *url);
 
--
2.8.1.281.g0994585

Re: [RFC/PATCH] Ordering of remotes for fetch --all

Jeff King
On Mon, Apr 25, 2016 at 11:15:05PM +0200, Guido Martínez wrote:

> I run a server with several git mirrors, that are updated every hour. On
> that same server, users clone those projects and work on them. We use
> the local mirrors to reduce network load: the users can fetch from the
> mirror first (to get most of the objects with zero network cost) and
> then fetch the real remote (to make sure they're completely up to date).
>
> I would like this to be configurable in each git working directory,
> so users can just configure the order they want and then just do "git
> remote update".
>
> I'm aware one can get this behavior by editing .git/config and
> ordering the remotes as one wishes, but I find that very hacky and not
> scripting-friendly.

You can also define your own ordered groups, like:

  $ git config remotes.foo "one two three"
  $ git fetch foo 2>&1 | grep ^Fetching
  Fetching one
  Fetching two
  Fetching three

That's not _exactly_ the same, because you can't give a partial ordering
of one high-priority remote and then say "all the rest, in whatever
order you want", because there's no way to say "all the rest".

You _can_ say:

  git config remotes.foo "high-priority --all"

but the final "--all" will fetch from high-priority again. An
alternative feature would be to teach remotes.* groups to cull
duplicates, if that's not acceptable.
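The duplicate-culling idea can be sketched roughly as follows. This is not git's group-expansion code: `expand_group()` and `already_listed()` are hypothetical names, and the sketch simply treats the token "--all" as "every configured remote", as in the "high-priority --all" example, while skipping any remote that has already been listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if 'name' already appears in out[0..nr-1]. */
static int already_listed(const char **out, size_t nr, const char *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(out[i], name))
			return 1;
	return 0;
}

/*
 * Expand a group's member list into a fetch order, where "--all" stands
 * for every configured remote and duplicates are dropped (first occurrence
 * wins). 'out' needs room for n_members + n_all entries.
 * Returns the number of names written to 'out'.
 */
static size_t expand_group(const char **members, size_t n_members,
			   const char **all, size_t n_all,
			   const char **out)
{
	size_t i, j, nr = 0;

	for (i = 0; i < n_members; i++) {
		if (!strcmp(members[i], "--all")) {
			for (j = 0; j < n_all; j++)
				if (!already_listed(out, nr, all[j]))
					out[nr++] = all[j];
		} else if (!already_listed(out, nr, members[i])) {
			out[nr++] = members[i];
		}
	}
	return nr;
}
```

For a group "high-priority --all" with configured remotes origin, high-priority and other, this produces high-priority, origin, other, so the high-priority remote is fetched only once.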

I don't have a strong opinion against your approach, though. Just
exploring alternatives.

-Peff

Re: [RFC/PATCH] Ordering of remotes for fetch --all

Guido Martínez
Hi Jeff, thanks for your comments.

On Mon, Apr 25, 2016 at 11:37 PM, Jeff King <[hidden email]> wrote:

> On Mon, Apr 25, 2016 at 11:15:05PM +0200, Guido Martínez wrote:
>
>> I run a server with several git mirrors, that are updated every hour. On
>> that same server, users clone those projects and work on them. We use
>> the local mirrors to reduce network load: the users can fetch from the
>> mirror first (to get most of the objects with zero network cost) and
>> then fetch the real remote (to make sure they're completely up to date).
>>
>> I would like this to be configurable in each git working directory,
>> so users can just configure the order they want and then just do "git
>> remote update".
>>
>> I'm aware one can get this behavior by editing .git/config and
>> ordering the remotes as one wishes, but I find that very hacky and not
>> scripting-friendly.
>
> You can also define your own ordered groups, like:
>
>   $ git config remotes.foo "one two three"
>   $ git fetch foo 2>&1 | grep ^Fetching
>   Fetching one
>   Fetching two
>   Fetching three
>
> That's not _exactly_ the same, because you can't give a partial ordering
> of one high-priority remote and then say "all the rest, in whatever
> order you want", because there's no way to say "all the rest".
>
> You _can_ say:
>
>   git config remotes.foo "high-priority --all"
>
> but the final "--all" will fetch from high-priority again. An
> alternative feature would be to teach remotes.* groups to cull
> duplicates, if that's not acceptable.

These are good, but my main goal was to be able to just run "git remote
update" without any extra arguments. In your examples I need to run
"git remote update foo" instead. Also, as you mention, you either need
to edit foo when adding a remote, or duplicate the fetch from the
high-priority one.

Another approach would be to add a "fetchdep" option pointing to
another remote, and then do a topological sort for fetch --all. This
could also be used by "git pull", to fetch from the mirror first without
any extra command.
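A rough sketch of the "fetchdep" ordering, assuming each remote names at most one dependency: a depth-first walk emits a remote's dependency before the remote itself, keeps config order otherwise, and reports a cycle instead of looping forever. `struct dep_remote`, `visit()` and `order_by_fetchdep()` are made-up names for illustration; no such option exists in git, and how an unknown dependency should be handled (here it is ignored) is an open design question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical remote with an optional "fetchdep" naming another remote. */
struct dep_remote {
	const char *name;
	const char *fetchdep;	/* NULL if the remote depends on nothing */
	int state;		/* 0 = unvisited, 1 = in progress, 2 = emitted */
};

static struct dep_remote *find_remote(struct dep_remote *r, size_t nr,
				      const char *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(r[i].name, name))
			return &r[i];
	return NULL;
}

/* Emit 'cur' after everything it depends on; returns -1 on a cycle. */
static int visit(struct dep_remote *r, size_t nr, struct dep_remote *cur,
		 struct dep_remote **out, size_t *out_nr)
{
	if (cur->state == 2)
		return 0;
	if (cur->state == 1)
		return -1;	/* reached a remote still being visited: cycle */
	cur->state = 1;
	if (cur->fetchdep) {
		struct dep_remote *dep = find_remote(r, nr, cur->fetchdep);

		/* An unknown fetchdep is ignored in this sketch. */
		if (dep && visit(r, nr, dep, out, out_nr) < 0)
			return -1;
	}
	cur->state = 2;
	out[(*out_nr)++] = cur;
	return 0;
}

/*
 * Fill 'out' (room for nr entries) with a fetch order in which every
 * remote follows its fetchdep; config order is kept otherwise.
 * Returns 0, or -1 if the fetchdep links form a cycle.
 */
static int order_by_fetchdep(struct dep_remote *r, size_t nr,
			     struct dep_remote **out)
{
	size_t i, out_nr = 0;

	for (i = 0; i < nr; i++)
		r[i].state = 0;
	for (i = 0; i < nr; i++)
		if (visit(r, nr, &r[i], out, &out_nr) < 0)
			return -1;
	return 0;
}
```

If "upstream" is configured before "mirror" but declares mirror as its fetchdep, the resulting order is mirror, upstream, followed by any unrelated remotes in config order.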

Maybe it's not such a big deal, but I think it's a nice feature to
have. It allows for a stupidly simple mirroring/prefetch scheme,
without any proxy or anything fancy.

Not sure if it suits the needs of anyone else, though... Would there
be interest in me implementing the "fetchdep" alternative?

Thanks!
Guido

>
> I don't have a strong opinion against your approach, though. Just
> exploring alternatives.
>
> -Peff