[PATCH] Optimize sha1_object_info for loose objects, not concurrent repacks

Steven Grimm
When dealing with a repository with lots of loose objects, sha1_object_info
would rescan the packs directory every time an unpacked object was referenced
before finally giving up and looking for the loose object. This caused many
unnecessary system calls during git pack-objects; the code was rereading the
entire pack directory once for each loose object file.

This patch looks for a loose object before falling back to rescanning the
pack directory, rather than the other way around.

Signed-off-by: Steven Grimm <[hidden email]>
---

        I discovered this by running strace on a pack-objects that was
        taking especially long to run; it was making more system calls
        to scan the pack directory than to do stuff with the loose
        objects, which didn't seem right.
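
To make the cost difference concrete, here is a toy model of the two lookup
orders. It is only a sketch under stated assumptions, not the code in
sha1_file.c: integer ids stand in for SHA-1s, a counter stands in for
reprepare_packed_git(), and every name in it (toy_store, contains,
lookup_old, lookup_new, count_rescans) is hypothetical. It also assumes no
concurrent repack, so a simulated rescan never finds anything new.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the object store; not git's data structures. */
struct toy_store {
	const int *packed;	/* ids visible in the cached pack list */
	size_t n_packed;
	const int *loose;	/* ids stored as loose objects */
	size_t n_loose;
	unsigned rescans;	/* simulated pack directory rescans */
};

static bool contains(const int *ids, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (ids[i] == id)
			return true;
	return false;
}

/* Old order: cached packs, rescan, packs again, and only then loose. */
static int lookup_old(struct toy_store *s, int id)
{
	if (contains(s->packed, s->n_packed, id))
		return 0;
	s->rescans++;	/* stands in for reprepare_packed_git() */
	if (contains(s->packed, s->n_packed, id))
		return 0;
	return contains(s->loose, s->n_loose, id) ? 0 : -1;
}

/* New order: cached packs, loose, and rescan only as a last resort. */
static int lookup_new(struct toy_store *s, int id)
{
	if (contains(s->packed, s->n_packed, id))
		return 0;
	if (contains(s->loose, s->n_loose, id))
		return 0;
	s->rescans++;	/* the object may have just been packed by someone else */
	return contains(s->packed, s->n_packed, id) ? 0 : -1;
}

/* Look up every loose object once and report how many rescans that cost. */
static unsigned count_rescans(int (*lookup)(struct toy_store *, int),
			      struct toy_store *s)
{
	s->rescans = 0;
	for (size_t i = 0; i < s->n_loose; i++)
		lookup(s, s->loose[i]);
	return s->rescans;
}
```

In this model the old order pays one rescan per loose object, while the new
order pays a rescan only when an object is found neither in the cached packs
nor as a loose file.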

 sha1_file.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index e281c14..32e4664 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1929,11 +1929,18 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size
 int sha1_object_info(const unsigned char *sha1, unsigned long *sizep)
 {
 	struct pack_entry e;
+	int status;
 
 	if (!find_pack_entry(sha1, &e, NULL)) {
+		/* Most likely it's a loose object. */
+		status = sha1_loose_object_info(sha1, sizep);
+		if (status >= 0)
+			return status;
+
+		/* Not a loose object; someone else may have just packed it. */
 		reprepare_packed_git();
 		if (!find_pack_entry(sha1, &e, NULL))
-			return sha1_loose_object_info(sha1, sizep);
+			return status;
 	}
 	return packed_object_info(e.p, e.offset, sizep);
 }
--
1.6.0.rc1.66.gc78d7

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Optimize sha1_object_info for loose objects, not concurrent repacks

Shawn Pearce
Steven Grimm <[hidden email]> wrote:

> When dealing with a repository with lots of loose objects, sha1_object_info
> would rescan the packs directory every time an unpacked object was referenced
> before finally giving up and looking for the loose object. This caused many
> unnecessary system calls during git pack-objects; the code was rereading the
> entire pack directory once for each loose object file.
>
> This patch looks for a loose object before falling back to rescanning the
> pack directory, rather than the other way around.
>
> Signed-off-by: Steven Grimm <[hidden email]>

Heh.  Cute bug.

ACK.

> diff --git a/sha1_file.c b/sha1_file.c
> index e281c14..32e4664 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1929,11 +1929,18 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size
>  int sha1_object_info(const unsigned char *sha1, unsigned long *sizep)
>  {
>  	struct pack_entry e;
> +	int status;
>  
>  	if (!find_pack_entry(sha1, &e, NULL)) {
> +		/* Most likely it's a loose object. */
> +		status = sha1_loose_object_info(sha1, sizep);
> +		if (status >= 0)
> +			return status;
> +
> +		/* Not a loose object; someone else may have just packed it. */
>  		reprepare_packed_git();
>  		if (!find_pack_entry(sha1, &e, NULL))
> -			return sha1_loose_object_info(sha1, sizep);
> +			return status;
>  	}
>  	return packed_object_info(e.p, e.offset, sizep);
>  }

--
Shawn.