propagating repo corruption across clone

classic Classic list List threaded Threaded
60 messages Options
123
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 9/9] clone: run check_everything_connected

Jeff King
On Tue, Mar 26, 2013 at 07:53:42AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Tue, Mar 26, 2013 at 3:26 AM, Jeff King <[hidden email]> wrote:
> >  static void update_remote_refs(const struct ref *refs,
> >                                const struct ref *mapped_refs,
> >                                const struct ref *remote_head_points_at,
> >                                const char *branch_top,
> >                                const char *msg)
> >  {
> > +       const struct ref *rm = mapped_refs;
> > +
> > +       if (check_everything_connected(iterate_ref_map, 0, &rm))
> > +               die(_("remote did not send all necessary objects"));
> > +
> >         if (refs) {
> >                 write_remote_refs(mapped_refs);
> >                 if (option_single_branch)
>
> Maybe move this after checkout, so that I can switch terminal and
> start working while it's verifying? And maybe an option not to
> check_everything_connected, instead print a big fat warning telling
> the user to fsck later?

I tried to follow the fetch process of not installing the refs until we
had verified that the objects were reasonable. It probably doesn't
matter that much for clone, since you would not have simultaneous users
expecting the repository to be in a reasonable state until after clone
completes, though.

We also would have to tweak check_everything_connected, which does
something like "--not --all" to avoid rechecking objects we already
have. But that is not too hard to do.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 10/9] clone: leave repo in place after checkout errors

Jonathan Nieder-2
In reply to this post by Jeff King
Jeff King wrote:

> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -377,10 +377,40 @@ static void remove_junk(void)
>  static const char *junk_work_tree;
>  static const char *junk_git_dir;
>  static pid_t junk_pid;
> +enum {
> + JUNK_LEAVE_NONE,
> + JUNK_LEAVE_REPO,
> + JUNK_LEAVE_ALL
> +} junk_mode = JUNK_LEAVE_NONE;

Neat.

> +
> +static const char junk_leave_repo_msg[] =
> +N_("The remote repository was cloned successfully, but there was\n"
> +   "an error checking out the HEAD branch. The repository has been left in\n"
> +   "place but the working tree may be in an inconsistent state. You can\n"
> +   "can inspect the contents with:\n"
> +   "\n"
> +   "    git status\n"
> +   "\n"
> +   "and retry the checkout with\n"
> +   "\n"
> +   "    git checkout -f HEAD\n"
> +   "\n");

Can this be made more precise?  I don't know what it means for the
working tree to be in an inconsistent state: do you mean that some files
might be partially checked out or not have been checked out at all yet?

        error: Clone succeeded, but checkout failed.
        hint: You can inspect what was checked out with "git status".
        hint: To retry the checkout, run "git checkout -f HEAD".

Aside from that, this looks very nice.

Thanks,
Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Rich Fromm
In reply to this post by Jeff King
Jeff King wrote
Fundamentally the problem is
that the --local transport is not safe from propagating corruption, and
should not be used if that's a requirement.
I've read Jeff Mitchell's blog post, his update, relevant parts of the git-clone(1) man page, and a decent chunk of this thread, and I'm still not clear on one thing.  Is the danger of `git clone --mirror` propagating corruption only true when using the --local option ?

Specifically, in my case, I'm using `git clone --mirror`, but I'm *not* using --local, nor am I using --no-hardlinks.  The host executing the clone command is different than the the host on which the remote repository lives, and I am using ssh as a transport protocol.  If there is corruption, can I or can I not expect the clone operation to fail and return a non-zero exit value?  If I can not expect this, is the workaround to run `git fsck` on the resulting clone?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 10/9] clone: leave repo in place after checkout errors

Jeff King
In reply to this post by Jonathan Nieder-2
On Tue, Mar 26, 2013 at 03:32:59PM -0700, Jonathan Nieder wrote:

> > +static const char junk_leave_repo_msg[] =
> > +N_("The remote repository was cloned successfully, but there was\n"
> > +   "an error checking out the HEAD branch. The repository has been left in\n"
> > +   "place but the working tree may be in an inconsistent state. You can\n"
> > +   "can inspect the contents with:\n"
> > +   "\n"
> > +   "    git status\n"
> > +   "\n"
> > +   "and retry the checkout with\n"
> > +   "\n"
> > +   "    git checkout -f HEAD\n"
> > +   "\n");
>
> Can this be made more precise?  I don't know what it means for the
> working tree to be in an inconsistent state: do you mean that some files
> might be partially checked out or not have been checked out at all yet?

It means that we died during the checkout procedure, and we don't have
any idea what was left. Maybe something, maybe nothing. Maybe an index,
maybe not.

> error: Clone succeeded, but checkout failed.
> hint: You can inspect what was checked out with "git status".
> hint: To retry the checkout, run "git checkout -f HEAD".

That is certainly more succint, if not more precise. I'd be fine with
it.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Jonathan Nieder-2
In reply to this post by Rich Fromm
Hi,

Rich Fromm wrote:

>                                                The host executing the clone
> command is different than the the host on which the remote repository lives,
> and I am using ssh as a transport protocol.  If there is corruption, can I
> or can I not expect the clone operation to fail and return a non-zero exit
> value?  If I can not expect this, is the workaround to run `git fsck` on the
> resulting clone?

Is the "[transfer] fsckObjects" configuration on the host executing the
clone set to true?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Junio C Hamano
In reply to this post by Rich Fromm
Rich Fromm <[hidden email]> writes:

> Jeff King wrote
>> Fundamentally the problem is
>> that the --local transport is not safe from propagating corruption, and
>> should not be used if that's a requirement.
>
> I've read Jeff Mitchell's blog post, his update, relevant parts of the
> git-clone(1) man page, and a decent chunk of this thread, and I'm still not
> clear on one thing.  Is the danger of `git clone --mirror` propagating
> corruption only true when using the --local option ?

If you use --local, that is equivalent to "cp -R".  Your corruption
in the source will faithfully be byte-for-byte copied to the
destination.  If you do not (and in your case you have two different
machines), unless you are using the long deprecated rsync transport
(which again is the same as "cp -R"), transport layer will notice
object corruption.  See Jeff's analysis earlier in the thread.

If you are lucky (or unlucky, depending on how you look at it), the
corruption you have in your object store may affect objects that are
needed to check out the version at the tip of the history, and "git
checkout" that happens as the last step of cloning may loudly
complain, but that just means you can immediately notice the
breakage in that case.  You may be unlucky and the corruption may
not affect objects that are needed to check out the tip. The initial
checkout will succeed as if nothing is wrong, but the corruption in
your object store is still there nevertheless.  "git log -p --all"
or "git fsck" will certainly be unhappy.

The difference between --mirror and no --mirror is a red herring.
You may want to ask Jeff Mitchell to remove the mention of it; it
only adds to the confusion without helping users.  If you made
byte-for-byte copy of corrupt repository, it wouldn't make any
difference if the first "checkout" notices it.

To be paranoid, you may want to set transfer.fsckObjects to true,
perhaps in your ~/.gitconfig.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Sitaram Chamarty
On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano <[hidden email]> wrote:

> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

do we have any numbers on the overhead of this?

Even a "guesstimate" will do...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Junio C Hamano
Sitaram Chamarty <[hidden email]> writes:

> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano <[hidden email]> wrote:
>
>> To be paranoid, you may want to set transfer.fsckObjects to true,
>> perhaps in your ~/.gitconfig.
>
> do we have any numbers on the overhead of this?
>
> Even a "guesstimate" will do...

On a reasonably slow machine:

$ cd /project/git/git.git && git repack -a -d
$ ls -hl .git/objects/pack/*.pack
-r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack

$ cd .. && git --bare init junk && cd junk
$ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
real    0m13.873s
user    0m21.345s
sys     0m2.248s

That's about 3.2 Mbps?

Compare that with the speed your other side feeds you (or your line
speed could be the limiting factor) and decide how much you value
your data.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Sitaram Chamarty
On Wed, Mar 27, 2013 at 8:33 PM, Junio C Hamano <[hidden email]> wrote:

> Sitaram Chamarty <[hidden email]> writes:
>
>> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano <[hidden email]> wrote:
>>
>>> To be paranoid, you may want to set transfer.fsckObjects to true,
>>> perhaps in your ~/.gitconfig.
>>
>> do we have any numbers on the overhead of this?
>>
>> Even a "guesstimate" will do...
>
> On a reasonably slow machine:
>
> $ cd /project/git/git.git && git repack -a -d
> $ ls -hl .git/objects/pack/*.pack
> -r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack
>
> $ cd .. && git --bare init junk && cd junk
> $ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
> real    0m13.873s
> user    0m21.345s
> sys     0m2.248s
>
> That's about 3.2 Mbps?
>
> Compare that with the speed your other side feeds you (or your line
> speed could be the limiting factor) and decide how much you value
> your data.

Thanks.  I was also interested in overhead on the server just as a %-age.

I have no idea why but when I did some tests a long time ago I got
upwards of 40% or so, but now when I try these tests for git.git

    cd <some empty dir>
    git init --bare
    # git config transfer.fsckobjects true
    git fetch file:///full/path/to/git.git refs/*:refs/*

then, the difference in elapsed time 18s -> 22s, so about 22%, and CPU
time is 31 -> 37, so about 20%.  I didn't measure disk access
increases, but I guess 20% is not too bad.

Is it likely to be linear in the size of the repo, by and large?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Rich Fromm
In reply to this post by Jonathan Nieder-2
Jonathan Nieder-2 wrote
Is the "[transfer] fsckObjects" configuration on the host executing the
clone set to true?
I hadn't been setting it at all, and according to git-config(1) it defaults to false, so the answer is no.  It looks like setting it might be a good idea.

But I'm still somewhat confused about what is and is not checked under what conditions.  Consider the three statements:

# 1
git clone --mirror myuser@myhost:my_repo

# 2
git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo

# 3
git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git-fsck

Are 2 and 3 equivalent?  Or is there an increasing level of checking that occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps this could be clearer in the man pages.

git-config(1) says that transfer.fsckObjects essentially (if fetch... and receive... are not explicitly set) "git-fetch-pack will check all fetched objects" and "git-receive-pack will check all received objects."  While git-fsck(1) says "git-fsck tests SHA1 and general object sanity, and it does full tracking of the resulting reachability and everything else."  The latter sounds like a stronger statement to me.  But if that's true, perhaps should the relevant section(s) of git-config(1) explicitly note that this is not equivalent to a full git-fsck ?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Rich Fromm
In reply to this post by Junio C Hamano
Junio C Hamano wrote
If you use --local, that is equivalent to "cp -R".  Your corruption
in the source will faithfully be byte-for-byte copied to the
destination.  If you do not
...
transport layer will notice
object corruption.
...
The difference between --mirror and no --mirror is a red herring.
You may want to ask Jeff Mitchell to remove the mention of it; it
only adds to the confusion without helping users.
Just to clarify, I don't know Jeff Mitchell personally, and I'm not affiliated with the KDE project.  I happened to have recently implemented a backup strategy for a different codebase, that relies on `git clone --mirror` to take the actual snapshots of the live repos, and I read about Jeff's experiences, and that's why I started following this discussion.  Apologies if my questions are considered slightly off topic -- I'm not positive if this is supposed to be a list for developers, and not users.

Nevertheless, I will try to contact Jeff and point him at this.  My initial reading of his blog posts definitely gave me the impression that this was a --mirror vs. not issue, but it really sounds like his main problem was using --local.

However, I think there may be room for some additional clarity in the docs.  The --local option in git-config(1) says "When the repository to clone from is on a local machine, this flag bypasses the normal "git aware" transport mechanism".  But there's no mention of the consequences of this transport bypass.  There's also no mention of this in the "GIT URLS" section that discusses transport protocols, and I also don't see anything noting it in either of these sections of the git book:

http://git-scm.com/book/en/Git-on-the-Server-The-Protocols
http://git-scm.com/book/en/Git-Internals-Transfer-Protocols
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Junio C Hamano
Rich Fromm <[hidden email]> writes:

> Apologies if my questions are considered slightly off topic -- I'm not
> positive if this is supposed to be a list for developers, and not users.

The list is both for users and developers.

> However, I think there may be room for some additional clarity in the docs.
> The --local option in git-config(1) says "When the repository to clone from
> is on a local machine, this flag bypasses the normal "git aware" transport
> mechanism".  But there's no mention of the consequences of this transport
> bypass.

Yeah, I would not mind a patch to update the documentation for
"clone --local" and rsync transport to say something about
byte-for-byte copying of broken repository.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Jeff King
In reply to this post by Rich Fromm
On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:

> But I'm still somewhat confused about what is and is not checked under what
> conditions.  Consider the three statements:
>
> # 1
> git clone --mirror myuser@myhost:my_repo
>
> # 2
> git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
>
> # 3
> git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git-fsck
>
> Are 2 and 3 equivalent?  Or is there an increasing level of checking that
> occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
> this could be clearer in the man pages.

2 and 3 are not exactly equivalent, in that they are implemented
slightly differently, but I do not know offhand of any case that would
pass 2 but not 3. We do check reachability with transfer.fsckObjects.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Jeff King
On Wed, Mar 27, 2013 at 03:49:38PM -0400, Jeff King wrote:

> On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:
>
> > But I'm still somewhat confused about what is and is not checked under what
> > conditions.  Consider the three statements:
> >
> > # 1
> > git clone --mirror myuser@myhost:my_repo
> >
> > # 2
> > git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
> >
> > # 3
> > git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git-fsck
> >
> > Are 2 and 3 equivalent?  Or is there an increasing level of checking that
> > occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
> > this could be clearer in the man pages.
>
> 2 and 3 are not exactly equivalent, in that they are implemented
> slightly differently, but I do not know offhand of any case that would
> pass 2 but not 3. We do check reachability with transfer.fsckObjects.

Oh, and in the case of #1, I think we would already find corruption, in
that index-pack will expand and check the sha1 of each object we
receive. The transfer.fsckObjects check adds some semantic checks as
well (e.g., making sure author identities are well-formed).

Clone will not currently detect missing objects and reachability
without transfer.fsckObjects set, but that is IMHO a bug; fetch will
notice it, and clone should behave the same way.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 5/9] add test for streaming corrupt blobs

Jeff King
In reply to this post by Jeff King
On Mon, Mar 25, 2013 at 04:21:34PM -0400, Jeff King wrote:

> +# Convert byte at offset "$2" of object "$1" into '\0'
> +corrupt_byte() {
> + obj_file=$(obj_to_file "$1") &&
> + chmod +w "$obj_file" &&
> + printf '\0' | dd of="$obj_file" bs=1 seek="$2"
> +}

Hmm, this last line should probably be:

diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index a84deb1..3f87051 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -12,7 +12,7 @@ corrupt_byte() {
 corrupt_byte() {
  obj_file=$(obj_to_file "$1") &&
  chmod +w "$obj_file" &&
- printf '\0' | dd of="$obj_file" bs=1 seek="$2"
+ printf '\0' | dd of="$obj_file" bs=1 seek="$2" conv=notrunc
 }
 
 test_expect_success 'setup corrupt repo' '

The intent was to change a single byte, not truncate the file (though on
the plus side, that truncation is what found the other bugs).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 5/9] add test for streaming corrupt blobs

Junio C Hamano
Jeff King <[hidden email]> writes:

> On Mon, Mar 25, 2013 at 04:21:34PM -0400, Jeff King wrote:
>
>> +# Convert byte at offset "$2" of object "$1" into '\0'
>> +corrupt_byte() {
>> + obj_file=$(obj_to_file "$1") &&
>> + chmod +w "$obj_file" &&
>> + printf '\0' | dd of="$obj_file" bs=1 seek="$2"
>> +}
>
> Hmm, this last line should probably be:
>
> diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
> index a84deb1..3f87051 100755
> --- a/t/t1060-object-corruption.sh
> +++ b/t/t1060-object-corruption.sh
> @@ -12,7 +12,7 @@ corrupt_byte() {
>  corrupt_byte() {
>   obj_file=$(obj_to_file "$1") &&
>   chmod +w "$obj_file" &&
> - printf '\0' | dd of="$obj_file" bs=1 seek="$2"
> + printf '\0' | dd of="$obj_file" bs=1 seek="$2" conv=notrunc
>  }
>  
>  test_expect_success 'setup corrupt repo' '
>
> The intent was to change a single byte, not truncate the file (though on
> the plus side, that truncation is what found the other bugs).

;-).  Thanks, I missed that.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 9/9] clone: run check_everything_connected

Duy Nguyen
In reply to this post by Jeff King
On Tue, Mar 26, 2013 at 3:26 AM, Jeff King <[hidden email]> wrote:
> The slowdown is really quite terrible if you try "git clone --bare
> linux-2.6.git". Even with this, the local-clone case already misses blob
> corruption. So it probably makes sense to restrict it to just the
> non-local clone case, which already has to do more work.
>
> Even still, it adds a non-trivial amount of work (linux-2.6 takes
> something like a minute to check). I don't like the idea of declaring
> "git clone" non-safe unless you turn on transfer.fsckObjects, though. It
> should have the same safety as "git fetch".

Maybe we could do it in index-pack to save some (wall) time. I haven't
tried but I think it might work. The problem is to make sure the pack
contains objects for all sha1 references in the pack. By that
description, we don't need to do standard DAG traversal. We could
extract sha-1 references in index-pack as we uncompress objects and
put all "want" sha-1 in a hash table. At the end of index-pack, we
check if any sha-1 in the hash table still points to non-existing
object.

This way, at least we don't need to uncompress all objects again in
rev-list. We could parse+hash in both phases in index-pack. The first
phase (parse_pack_objects) is usually I/O bound, we could hide some
cost there. The second phase is multithreaded, all the better.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Jeff Mitchell
In reply to this post by Junio C Hamano
On Tue, Mar 26, 2013 at 11:47 PM, Junio C Hamano <[hidden email]> wrote:
> The difference between --mirror and no --mirror is a red herring.
> You may want to ask Jeff Mitchell to remove the mention of it; it
> only adds to the confusion without helping users.  If you made
> byte-for-byte copy of corrupt repository, it wouldn't make any
> difference if the first "checkout" notices it.

Hi,

Several days ago I had actually already updated the post to indicate
that my testing methodology was incorrect as a result of mixing up
--no-hardlinks and --no-local, and pointed to this thread.

I will say that we did see corrupted repos on the downstream mirrors.
I don't have an explanation for it, but have not been able to
reproduce it either. My only potential guess (untested) is that
perhaps when the corruption was detected the clone aborted but left
the objects already transferred locally. Again, untested -- I mention
it only because it's my only potential explanation  :-)

> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

Interesting; I'd known about receive.fsckObjects but not
transfer/fetch. Thanks for the pointer.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: propagating repo corruption across clone

Jeff Mitchell
In reply to this post by Rich Fromm
On Wed, Mar 27, 2013 at 2:51 PM, Rich Fromm <[hidden email]> wrote:
> Nevertheless, I will try to contact Jeff and point him at this.  My initial
> reading of his blog posts definitely gave me the impression that this was a
> --mirror vs. not issue, but it really sounds like his main problem was using
> --local.

Actually, I wasn't using --local, I just wasn't using --no-local, as I
mixed up that and --no-hardlinks (which lies somewhere between, but is
still not the same as using file://).

It's entirely possible that you read the posts before I updated them.

Also, keep in mind, if you're evaluating what you're doing, that the
system we have was not set up to be a backup system. It was set up to
be a mirror system. With proper sanity checking, it would have acted
in a decent capacity as a backup system, but that wasn't why it was
set up, nor how it was set up.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 9/9] clone: run check_everything_connected

Duy Nguyen
In reply to this post by Duy Nguyen
On Thu, Mar 28, 2013 at 07:40:51AM +0700, Duy Nguyen wrote:

> Maybe we could do it in index-pack to save some (wall) time. I haven't
> tried but I think it might work. The problem is to make sure the pack
> contains objects for all sha1 references in the pack. By that
> description, we don't need to do standard DAG traversal. We could
> extract sha-1 references in index-pack as we uncompress objects and
> put all "want" sha-1 in a hash table. At the end of index-pack, we
> check if any sha-1 in the hash table still points to non-existing
> object.
>
> This way, at least we don't need to uncompress all objects again in
> rev-list. We could parse+hash in both phases in index-pack. The first
> phase (parse_pack_objects) is usually I/O bound, we could hide some
> cost there. The second phase is multithreaded, all the better.

It looks like what I describe above is exactly what index-pack
--strict does. Except that it holds the lock longer and has more
abstraction layers to slow things down. On linux-2.6 with 3 threads:

$ rev-list --all --objects --quiet (aka check_everything_connected)
34.26user 0.22system 0:34.56elapsed 99%CPU (0avgtext+0avgdata 2550528maxresident)k
0inputs+0outputs (0major+208569minor)pagefaults 0swaps

$ index-pack --stdin
214.57user 8.38system 1:31.82elapsed 242%CPU (0avgtext+0avgdata 1357328maxresident)k
8inputs+1421016outputs (0major+1222537minor)pagefaults 0swaps

$ index-pack --stdin --strict
297.36user 13.77system 2:11.82elapsed 236%CPU (0avgtext+0avgdata 1875040maxresident)k
0inputs+1421016outputs (0major+1308718minor)pagefaults 0swaps

$ index-pack --stdin --connectivity
231.09user 7.42system 1:37.39elapsed 244%CPU (0avgtext+0avgdata 2080816maxresident)k
0inputs+1421016outputs (0major+540069minor)pagefaults 0swaps

The last one does not hold locks by duplicating object hash table per
thread. As you can see the consumed memory is much higher than --stdin.
In return it only adds up 1/3 of rev-list time.

Maybe you should check which one is cheaper for clone case,
check_everything_connected() or index-pack --strict.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
123
Loading...