[PATCH] pack-objects: warn on split packs disabling bitmaps

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] pack-objects: warn on split packs disabling bitmaps

Eric Wong
It can be tempting for a server admin to want a stable set of
long-lived packs for dumb clients; but also want to enable
bitmaps to serve smart clients more quickly.

Unfortunately, such a configuration is impossible;
so at least warn users of this incompatibility since
commit 21134714787a02a37da15424d72c0119b2b8ed71
("pack-objects: turn off bitmaps when we split packs").

Tested the warning by inspecting the output of:

        make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v

Signed-off-by: Eric Wong <[hidden email]>
---
 Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
 our ML archives, soonish.  I can serve terabytes of dumb HTTP
 traffic all day long without breaking a sweat; but smart
 packing of big repos worries me; especially when feeding
 slow clients and having to leave processes running
 (or buffering pack output to disk).  So perhaps I'll teach
 my HTTP server play dumb whenever CPU/memory usage is high.

 Documentation/config.txt           | 11 +++++++----
 Documentation/git-pack-objects.txt |  3 ++-
 Documentation/git-repack.txt       |  9 ++++++---
 builtin/pack-objects.c             |  9 ++++++++-
 4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 42d2b50..ec31170 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2156,8 +2156,10 @@ pack.packSizeLimit::
  The maximum size of a pack.  This setting only affects
  packing to a file when repacking, i.e. the git:// protocol
  is unaffected.  It can be overridden by the `--max-pack-size`
- option of linkgit:git-repack[1]. The minimum size allowed is
- limited to 1 MiB. The default is unlimited.
+ option of linkgit:git-repack[1].  Reaching this limit prevents
+ pack bitmaps from being written when `repack.writeBitmaps`
+ is configured.  The minimum size allowed is limited to 1 MiB.
+ The default is unlimited.
  Common unit suffixes of 'k', 'm', or 'g' are
  supported.
 
@@ -2557,8 +2559,9 @@ repack.writeBitmaps::
  objects to disk (e.g., when `git repack -a` is run).  This
  index can speed up the "counting objects" phase of subsequent
  packs created for clones and fetches, at the cost of some disk
- space and extra time spent on the initial repack.  Defaults to
- false.
+ space and extra time spent on the initial repack.  This has
+ no effect if `pack.packSizeLimit` is configured and reached.
+ Defaults to false.
 
 rerere.autoUpdate::
  When set to true, `git-rerere` updates the index with the
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index bbea529..19cdcd0 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -110,7 +110,8 @@ base-name::
 --max-pack-size=<n>::
  Maximum size of each output pack file. The size can be suffixed with
  "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
- If specified,  multiple packfiles may be created.
+ If specified, multiple packfiles may be created, which also
+ prevents the creation of a bitmap index.
  The default is unlimited, unless the config variable
  `pack.packSizeLimit` is set.
 
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index af230d0..2065812 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -106,7 +106,8 @@ other objects in that pack they already have locally.
 --max-pack-size=<n>::
  Maximum size of each output pack file. The size can be suffixed with
  "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
- If specified,  multiple packfiles may be created.
+ If specified, multiple packfiles may be created, causing
+ `--write-bitmap-index` and `repack.writeBitmaps` to be ignored.
  The default is unlimited, unless the config variable
  `pack.packSizeLimit` is set.
 
@@ -115,7 +116,9 @@ other objects in that pack they already have locally.
  Write a reachability bitmap index as part of the repack. This
  only makes sense when used with `-a` or `-A`, as the bitmaps
  must be able to refer to all reachable objects. This option
- overrides the setting of `pack.writeBitmaps`.
+ overrides the setting of `repack.writeBitmaps`.  This option
+ has no effect if a multiple packfiles are created due to
+ reaching `pack.packSizeLimit` or `--max-pack-size`.
 
 --pack-kept-objects::
  Include objects in `.keep` files when repacking.  Note that we
@@ -123,7 +126,7 @@ other objects in that pack they already have locally.
  This means that we may duplicate objects, but this makes the
  option safe to use when there are concurrent pushes or fetches.
  This option is generally only useful if you are writing bitmaps
- with `-b` or `pack.writeBitmaps`, as it ensures that the
+ with `-b` or `repack.writeBitmaps`, as it ensures that the
  bitmapped packfile has the necessary objects.
 
 Configuration
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a27de5b..b6664ce 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -759,6 +759,10 @@ static off_t write_reused_pack(struct sha1file *f)
  return reuse_packfile_offset - sizeof(struct pack_header);
 }
 
+static const char no_split_warning[] = N_(
+"disabling bitmap writing, packs are split due to pack.packSizeLimit"
+);
+
 static void write_pack_file(void)
 {
  uint32_t i = 0, j;
@@ -813,7 +817,10 @@ static void write_pack_file(void)
  fixup_pack_header_footer(fd, sha1, pack_tmp_name,
  nr_written, sha1, offset);
  close(fd);
- write_bitmap_index = 0;
+ if (write_bitmap_index) {
+ warning(_(no_split_warning));
+ write_bitmap_index = 0;
+ }
  }
 
  if (!pack_to_stdout) {
--
EW
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] pack-objects: warn on split packs disabling bitmaps

Junio C Hamano
Eric Wong <[hidden email]> writes:

> It can be tempting for a server admin to want a stable set of
> long-lived packs for dumb clients; but also want to enable
> bitmaps to serve smart clients more quickly.
>
> Unfortunately, such a configuration is impossible;
> so at least warn users of this incompatibility since
> commit 21134714787a02a37da15424d72c0119b2b8ed71
> ("pack-objects: turn off bitmaps when we split packs").
>
> Tested the warning by inspecting the output of:
>
> make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v

While I do not think the update in this patch is wrong, this makes
me wonder what happens to a server admin who wants a stable set of
long-lived objects in a pack, and other objects in another pack that
is frequently recreated, by first packing objects needed for the
latest released version into a single pack and marking it with .keep
and then running "git repack" to create the second pack for the
remainder.

There is no automatic split involved, so we would get a bitmap file
for each of these two packs.  Would that pose a problem?  The pack
with the newer part of the history would not satisfy "all of the
reachable objects are in the same pack" condition.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 42d2b50..ec31170 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2156,8 +2156,10 @@ pack.packSizeLimit::
>   The maximum size of a pack.  This setting only affects
>   packing to a file when repacking, i.e. the git:// protocol
>   is unaffected.  It can be overridden by the `--max-pack-size`
> - option of linkgit:git-repack[1]. The minimum size allowed is
> - limited to 1 MiB. The default is unlimited.
> + option of linkgit:git-repack[1].  Reaching this limit prevents
> + pack bitmaps from being written when `repack.writeBitmaps`
> + is configured.  The minimum size allowed is limited to 1 MiB.
> + The default is unlimited.

This is not wrong per-se, but I find the additional text overly
verbose.  The resulting text has at least three issues:

 - This sets maximum.  It does not say what happens when the maximum
   is reached ("packing fails" is a valid expectation).  We should
   spell out that when the maximum is reached, we will create
   multiple packfiles.

 - It is not "reaching this limit" that prevents bitmaps from being
   written.  It is "we will create multiple packfiles" that does.

 - Even when repack.writeBitmaps is not configured, but bitmap is
   requested via the command line option "repack -b", bitmap
   generation is disabled once we end up creating multiple
   packfiles.

> @@ -2557,8 +2559,9 @@ repack.writeBitmaps::
>   objects to disk (e.g., when `git repack -a` is run).  This
>   index can speed up the "counting objects" phase of subsequent
>   packs created for clones and fetches, at the cost of some disk
> - space and extra time spent on the initial repack.  Defaults to
> - false.
> + space and extra time spent on the initial repack.  This has
> + no effect if `pack.packSizeLimit` is configured and reached.
> + Defaults to false.

This has the opposite issue as the third point above.  We have to
ignore repack.writeBitmaps when we end up splitting a pack,
regardless of the reason why we split was pack.packSizeLimit or by
an explicit request from the command line.

Also to future-proof these two paragraphs (and those that are
touched by later parts of this patch), it may even be a good idea to
omit the mention of packsizelimit altogether.  We may invent a new
and different reason why we end up producing multiple packfiles,
other than packsizelimit.  What this patch wants to make readers
aware is that when multiple packs are generated, bitmap generation
is disabled.  Other details (e.g. how the user asks us to create
multiple packs, either via command line or configuration, or how the
use asks us to generate bitmaps) are of lessor importance.

> diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
> index bbea529..19cdcd0 100644
> --- a/Documentation/git-pack-objects.txt
> +++ b/Documentation/git-pack-objects.txt
> @@ -110,7 +110,8 @@ base-name::
>  --max-pack-size=<n>::
>   Maximum size of each output pack file. The size can be suffixed with
>   "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
> - If specified,  multiple packfiles may be created.
> + If specified, multiple packfiles may be created, which also
> + prevents the creation of a bitmap index.

This is a good update, judging with the yardstick I set in the
previous paragraph in this review.

>   The default is unlimited, unless the config variable
>   `pack.packSizeLimit` is set.
>  
> diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
> index af230d0..2065812 100644
> --- a/Documentation/git-repack.txt
> +++ b/Documentation/git-repack.txt
> @@ -106,7 +106,8 @@ other objects in that pack they already have locally.
>  --max-pack-size=<n>::
>   Maximum size of each output pack file. The size can be suffixed with
>   "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
> - If specified,  multiple packfiles may be created.
> + If specified, multiple packfiles may be created, causing
> + `--write-bitmap-index` and `repack.writeBitmaps` to be ignored.

And this is not.  Just say "bitmap index is not created".

> @@ -115,7 +116,9 @@ other objects in that pack they already have locally.
>   Write a reachability bitmap index as part of the repack. This
>   only makes sense when used with `-a` or `-A`, as the bitmaps
>   must be able to refer to all reachable objects. This option
> - overrides the setting of `pack.writeBitmaps`.
> + overrides the setting of `repack.writeBitmaps`.  This option
> + has no effect if a multiple packfiles are created due to
> + reaching `pack.packSizeLimit` or `--max-pack-size`.

Dropping "due to ..." makes it perfect.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] pack-objects: warn on split packs disabling bitmaps

Jeff King
On Wed, Apr 27, 2016 at 03:56:46PM -0700, Junio C Hamano wrote:

> Eric Wong <[hidden email]> writes:
>
> > It can be tempting for a server admin to want a stable set of
> > long-lived packs for dumb clients; but also want to enable
> > bitmaps to serve smart clients more quickly.
> >
> > Unfortunately, such a configuration is impossible;
> > so at least warn users of this incompatibility since
> > commit 21134714787a02a37da15424d72c0119b2b8ed71
> > ("pack-objects: turn off bitmaps when we split packs").
> >
> > Tested the warning by inspecting the output of:
> >
> > make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v
>
> While I do not think the update in this patch is wrong, this makes
> me wonder what happens to a server admin who wants a stable set of
> long-lived objects in a pack, and other objects in another pack that
> is frequently recreated, by first packing objects needed for the
> latest released version into a single pack and marking it with .keep
> and then running "git repack" to create the second pack for the
> remainder.
>
> There is no automatic split involved, so we would get a bitmap file
> for each of these two packs.  Would that pose a problem?  The pack
> with the newer part of the history would not satisfy "all of the
> reachable objects are in the same pack" condition.

You will not get two bitmaps in such a case. In add_object_entry(), if
we exclude an object via want_object_in_pack(), we will issue a warning
and turn off bitmaps.  That includes finding that one of the reachable
objects we would need is in a .keep pack.

You could in theory construct a broken non-closed bitmap by feeding an
arbitrary set of objects to pack-objects, but we turn off bitmap writing
entirely unless --all is used. So the test in add_object_entry() should
be sufficient for all cases there (it's actually too conservative; it's
possible that the object is not reachable from the other objects that
are going into the pack, but this is so impractical that it's not even
worth trying to catch).

The split-pack check from 211347147 had to come separately, because we
only find out about the split much later during the write phase (and
again, it is too conservative; it's _possible_ that the objects being
split aren't reachable from anything in the previous pack, but it's
exceedingly unlikely and not worth caring about).

Adding a warning as Eric's patch does is a definite improvement, and
something I should have done in the original (we _could_ just use the
same no_closure_warning, as that is technically what is going on, but I
think it is nice to mention pack-splitting, which is more likely to lead
the user in the right direction to fixing it).

> [discussion of doc fixes]

I do not mind overly if pack-splitting mentions disabling bitmaps. But I
think it may be simpler to keep the documentation in the bitmap section,
rather than trying to cross-reference in both directions.  It is the
bitmap code which is not prepared to work with pack-splits (and other
things, like .keep), not the other way around.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] pack-objects: warn on split packs disabling bitmaps

Jeff King
In reply to this post by Eric Wong
On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote:

> It can be tempting for a server admin to want a stable set of
> long-lived packs for dumb clients; but also want to enable
> bitmaps to serve smart clients more quickly.
>
> Unfortunately, such a configuration is impossible;
> so at least warn users of this incompatibility since
> commit 21134714787a02a37da15424d72c0119b2b8ed71
> ("pack-objects: turn off bitmaps when we split packs").
>
> Tested the warning by inspecting the output of:
>
> make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v

I think the intent and code in your patch is fine; looks like doc
specifics are being discussed elsewhere.

But I did want to mention one thing, which is that long-lived split
packs are a tradeoff, even for dumb clients. The pack format cannot do
deltas between packs, so the sum of your split packs is larger than a
single pack would be. That's a good thing for somebody who cloned
earlier, and wants to only a few small packs on top. But it's much worse
for somebody who wants to do a fresh clone, and has to grab all of the
packs either way.

>  Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
>  our ML archives, soonish.  I can serve terabytes of dumb HTTP
>  traffic all day long without breaking a sweat; but smart
>  packing of big repos worries me; especially when feeding
>  slow clients and having to leave processes running
>  (or buffering pack output to disk).  So perhaps I'll teach
>  my HTTP server play dumb whenever CPU/memory usage is high.

Yeah, CPU and memory load for serving large clones is a problem. Memory
especially scales with number of objects (because we keep the whole
packing list in memory for the entirety of the write). At GitHub, we
have some changes to try to serve things verbatim from the on-disk pack
without even creating an in-memory list of objects (it's just a bitmap
of which objects in the packfile to send), and that reduces CPU and
memory load quite a bit. Cleaning up and submitting those patches has
been on my todo list for a while, but I just haven't gotten to it. I'm
of course happy to share the messy state if you want to pick through it
yourself.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v2] pack-objects: warn on split packs disabling bitmaps

Eric Wong
In reply to this post by Junio C Hamano
Junio C Hamano <[hidden email]> wrote:

> Eric Wong <[hidden email]> writes:
> > +++ b/Documentation/git-pack-objects.txt
> > @@ -110,7 +110,8 @@ base-name::
> >  --max-pack-size=<n>::
> >   Maximum size of each output pack file. The size can be suffixed with
> >   "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
> > - If specified,  multiple packfiles may be created.
> > + If specified, multiple packfiles may be created, which also
> > + prevents the creation of a bitmap index.
>
> This is a good update, judging with the yardstick I set in the
> previous paragraph in this review.

Thanks for the review; made some adjustments and have v2 below.

> > --- a/Documentation/git-repack.txt
> > +++ b/Documentation/git-repack.txt
> > @@ -106,7 +106,8 @@ other objects in that pack they already have locally.
> >  --max-pack-size=<n>::
> >   Maximum size of each output pack file. The size can be suffixed with
> >   "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
> > - If specified,  multiple packfiles may be created.
> > + If specified, multiple packfiles may be created, causing
> > + `--write-bitmap-index` and `repack.writeBitmaps` to be ignored.
>
> And this is not.  Just say "bitmap index is not created".

Ah, I've now synced the same --max-pack-size doc from
git-pack-objects.txt you liked into v2 below.

I worded my original differently between pack-objects and repack
since I figured repack is more likely used by end users;
and perhaps warranted an explanation that didn't require
describing the actual problem...

But I suppose "repack" isn't commonly called anymore, either.

> > @@ -115,7 +116,9 @@ other objects in that pack they already have locally.
> >   Write a reachability bitmap index as part of the repack. This
> >   only makes sense when used with `-a` or `-A`, as the bitmaps
> >   must be able to refer to all reachable objects. This option
> > - overrides the setting of `pack.writeBitmaps`.
> > + overrides the setting of `repack.writeBitmaps`.  This option
> > + has no effect if a multiple packfiles are created due to
> > + reaching `pack.packSizeLimit` or `--max-pack-size`.
>
> Dropping "due to ..." makes it perfect.

Done, along with:

        s/effect if a multiple/effect when multiple/

"if a" was definitely a typo, "if" is probably alright, but
I suspect "when" is even better.

-------------8<-------------
Subject: [PATCH] pack-objects: warn on split packs disabling bitmaps

It can be tempting for a server admin to want a stable set of
long-lived packs for dumb clients; but also want to enable
bitmaps to serve smart clients more quickly.

Unfortunately, such a configuration is impossible;
so at least warn users of this incompatibility since
commit 21134714787a02a37da15424d72c0119b2b8ed71
("pack-objects: turn off bitmaps when we split packs").

Tested the warning by inspecting the output of:

        make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v

Signed-off-by: Eric Wong <[hidden email]>
---
 Documentation/config.txt           | 12 ++++++++----
 Documentation/git-pack-objects.txt |  3 ++-
 Documentation/git-repack.txt       |  8 +++++---
 builtin/pack-objects.c             |  9 ++++++++-
 4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3d8bc97..3ea3372 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2165,8 +2165,11 @@ pack.packSizeLimit::
  The maximum size of a pack.  This setting only affects
  packing to a file when repacking, i.e. the git:// protocol
  is unaffected.  It can be overridden by the `--max-pack-size`
- option of linkgit:git-repack[1]. The minimum size allowed is
- limited to 1 MiB. The default is unlimited.
+ option of linkgit:git-repack[1].  Reaching this limit results
+ in the creation of multiple packfiles; which in turn prevents
+ bitmaps from being created.
+ The minimum size allowed is limited to 1 MiB.
+ The default is unlimited.
  Common unit suffixes of 'k', 'm', or 'g' are
  supported.
 
@@ -2566,8 +2569,9 @@ repack.writeBitmaps::
  objects to disk (e.g., when `git repack -a` is run).  This
  index can speed up the "counting objects" phase of subsequent
  packs created for clones and fetches, at the cost of some disk
- space and extra time spent on the initial repack.  Defaults to
- false.
+ space and extra time spent on the initial repack.  This has
+ no effect if multiple packfiles are created.
+ Defaults to false.
 
 rerere.autoUpdate::
  When set to true, `git-rerere` updates the index with the
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index bbea529..19cdcd0 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -110,7 +110,8 @@ base-name::
 --max-pack-size=<n>::
  Maximum size of each output pack file. The size can be suffixed with
  "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
- If specified,  multiple packfiles may be created.
+ If specified, multiple packfiles may be created, which also
+ prevents the creation of a bitmap index.
  The default is unlimited, unless the config variable
  `pack.packSizeLimit` is set.
 
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index af230d0..b9c02ce 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -106,7 +106,8 @@ other objects in that pack they already have locally.
 --max-pack-size=<n>::
  Maximum size of each output pack file. The size can be suffixed with
  "k", "m", or "g". The minimum size allowed is limited to 1 MiB.
- If specified,  multiple packfiles may be created.
+ If specified, multiple packfiles may be created, which also
+ prevents the creation of a bitmap index.
  The default is unlimited, unless the config variable
  `pack.packSizeLimit` is set.
 
@@ -115,7 +116,8 @@ other objects in that pack they already have locally.
  Write a reachability bitmap index as part of the repack. This
  only makes sense when used with `-a` or `-A`, as the bitmaps
  must be able to refer to all reachable objects. This option
- overrides the setting of `pack.writeBitmaps`.
+ overrides the setting of `repack.writeBitmaps`.  This option
+ has no effect if multiple packfiles are created.
 
 --pack-kept-objects::
  Include objects in `.keep` files when repacking.  Note that we
@@ -123,7 +125,7 @@ other objects in that pack they already have locally.
  This means that we may duplicate objects, but this makes the
  option safe to use when there are concurrent pushes or fetches.
  This option is generally only useful if you are writing bitmaps
- with `-b` or `pack.writeBitmaps`, as it ensures that the
+ with `-b` or `repack.writeBitmaps`, as it ensures that the
  bitmapped packfile has the necessary objects.
 
 Configuration
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a27de5b..b6664ce 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -759,6 +759,10 @@ static off_t write_reused_pack(struct sha1file *f)
  return reuse_packfile_offset - sizeof(struct pack_header);
 }
 
+static const char no_split_warning[] = N_(
+"disabling bitmap writing, packs are split due to pack.packSizeLimit"
+);
+
 static void write_pack_file(void)
 {
  uint32_t i = 0, j;
@@ -813,7 +817,10 @@ static void write_pack_file(void)
  fixup_pack_header_footer(fd, sha1, pack_tmp_name,
  nr_written, sha1, offset);
  close(fd);
- write_bitmap_index = 0;
+ if (write_bitmap_index) {
+ warning(_(no_split_warning));
+ write_bitmap_index = 0;
+ }
  }
 
  if (!pack_to_stdout) {
--
EW
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] pack-objects: warn on split packs disabling bitmaps

Eric Wong
In reply to this post by Jeff King
Jeff King <[hidden email]> wrote:
> On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote:
>
> > It can be tempting for a server admin to want a stable set of
> > long-lived packs for dumb clients; but also want to enable
> > bitmaps to serve smart clients more quickly.

> But I did want to mention one thing, which is that long-lived split
> packs are a tradeoff, even for dumb clients. The pack format cannot do
> deltas between packs, so the sum of your split packs is larger than a
> single pack would be. That's a good thing for somebody who cloned
> earlier, and wants to only a few small packs on top. But it's much worse
> for somebody who wants to do a fresh clone, and has to grab all of the
> packs either way.

Definitely a trade off, but a fresh clone with packs might only
be (at worst) doubling or tripling bandwidth use on both sides?

However, the CPU/memory cost of packing is at least an order of
magnitude (more likely several orders of magnitude) more
expensive on the server.  The client most likely won't care
about CPU/memory usage, though.

> >  Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
> >  our ML archives, soonish.  I can serve terabytes of dumb HTTP
> >  traffic all day long without breaking a sweat; but smart
> >  packing of big repos worries me; especially when feeding
> >  slow clients and having to leave processes running
> >  (or buffering pack output to disk).  So perhaps I'll teach
> >  my HTTP server play dumb whenever CPU/memory usage is high.
>
> Yeah, CPU and memory load for serving large clones is a problem. Memory
> especially scales with number of objects (because we keep the whole
> packing list in memory for the entirety of the write). At GitHub, we
> have some changes to try to serve things verbatim from the on-disk pack
> without even creating an in-memory list of objects (it's just a bitmap
> of which objects in the packfile to send), and that reduces CPU and
> memory load quite a bit. Cleaning up and submitting those patches has
> been on my todo list for a while, but I just haven't gotten to it. I'm
> of course happy to share the messy state if you want to pick through it
> yourself.

Sure thing!  I can't promise I'll have time, either, but being
able to serve packs verbatim would be great; especially if you
could multiplex it with epoll/kqueue for folks on slow pipes
(and maybe use sendfile, but perhaps that's not worth the effort
 with TLS everywhere nowadays).

I was also wondering if fresh clones could be memoized entirely.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html