
False positives in git diff-index

False positives in git diff-index

Alexander Gladysh
Hi, list!

I wrote about the related issue earlier:

http://lists-archives.org/git/731516-false-positives-from-git-diff-index-when-used-with-git-dir.html

Now I have a case where I can reproduce this problem every time I try.

Unfortunately I cannot share it or create a minimal example: the
case is triggered by a complicated custom automated build process on a
private repository.

Anyway, I'm ready to debug this issue if someone can guide me.

Workflow:

<...change files in /path/dir1/...>
(cd /path && git add </path/dir1/>)
(cd /path && git commit -m <message1>)

... repeat change-add-commit several times for various directories
(can be the same directory or not) ...

<...generate file /path/dirN/foo...>
# By coincidence, the generated file is identical to the previous version

(cd /path && git add </path/dirN/>)
(cd /path && git status) # Refresh index
# The next command incorrectly reports that there are some changes:
(cd /path && git diff-index --exit-code --quiet HEAD -- /path/dirN)
# The commit then fails, saying that there is nothing to commit:
(cd /path && git commit -m <messageN>)

If I insert sleep 10 between git status and git diff-index, the
problem goes away.

Any help?
Alexander.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: False positives in git diff-index

Alexander Gladysh
Nobody is interested?

Is there a way I can get some help with this issue?

Thanks,
Alexander.

On Mon, Dec 27, 2010 at 11:49, Alexander Gladysh <[hidden email]> wrote:

> Hi, list!
>
> I wrote about the related issue earlier:
>
> http://lists-archives.org/git/731516-false-positives-from-git-diff-index-when-used-with-git-dir.html
>
> Now I've got a case when I can reproduce this problem each time I try to.
>
> Unfortunately I can not share it or create a minimal example — the
> case is triggered by a custom complicated automated build process on a
> private repository.
>
> Anyway, I'm ready to debug this issue if someone will guide me.
>
> Workflow:
>
> <...change files in /path/dir1/...>
> (cd /path && git add </path/dir1/>)
> (cd /path && git commit -m <message1>)
>
> ... repeat change-add-commit several times for various directories
> (can be the same directory or not) ...
>
> <...generate file /path/dirN/foo...>
> # Accidentally the file is generated the same as it was
>
> (cd /path && git add </path/dirN/>)
> (cd /path && git status) # Refresh index
> (cd /path && git diff-index --exit-code --quiet HEAD -- /path/dirN) #
> Incorrectly reports that there are some changes
> (cd /path && git commit -m <messageN>) # fails, saying that there is
> nothing to commit
>
> If I insert sleep 10 between git status and git diff-index, the
> problem goes away.
>
> Any help?
> Alexander.
>

Re: False positives in git diff-index

Zenaan Harkness
On Tue, Jan 4, 2011 at 20:45, Alexander Gladysh <[hidden email]> wrote:
> Nobody is interested?

It appears that you have a rather gnarly corner-case issue arising
from your custom build processes. Although git really is amazing, I
believe you may well be pushing git to its technological limits.

So your problem could be quite hard to debug, and its root causes
distinctly difficult to ascertain.

It also appears that your custom complicated build process is likely
protecting, or at least integral to, your high value corporate process
assets.

So _in this case_ you would be remiss to not find a suitable
consultant to provide professional and discreet assistance - perhaps
GitHub.com, as GitHub’s Tender provides both public and _private_
support issue posting, and customized and private training if you and/
or your colleagues require; you might contact GitHub direct (
https://github.com/contact ) as their Support page does not link
directly to support contract information; oh, and GitHub supports a
lot of community projects too: their support for our community ought
to be supported.

<disclaimer> I am _not_ affiliated with GitHub, I do work full time
with a human rights association in Australia.

Good luck
Zenaan


> Is there a way I can get some help with this issue?
>
> Thanks,
> Alexander.
>
> On Mon, Dec 27, 2010 at 11:49, Alexander Gladysh <[hidden email]> wrote:
>> Hi, list!
>>
>> I wrote about the related issue earlier:
>>
>> http://lists-archives.org/git/731516-false-positives-from-git-diff-index-when-used-with-git-dir.html
>>
>> Now I've got a case when I can reproduce this problem each time I try to.
>>
>> Unfortunately I can not share it or create a minimal example — the
>> case is triggered by a custom complicated automated build process on a
>> private repository.
...

Re: False positives in git diff-index

Alexander Gladysh
On Tue, Jan 4, 2011 at 14:47, Zenaan Harkness <[hidden email]> wrote:
> On Tue, Jan 4, 2011 at 20:45, Alexander Gladysh <[hidden email]> wrote:
>> Nobody is interested?

> Your problem set appears that you have a rather gnarly corner case
> issue, arising from your custom build processes. Although git really
> is amazing, I believe you may well be pushing git to its technological
> limits.

Committing a few megabytes of data several times per second is a
technological limit? I do not believe so.

> So your problem could be quite hard to debug, whilst being distinctly
> difficult to ascertain the root causes.

> It also appears that your custom complicated build process is likely
> protecting, or at least integral to, your high value corporate process
> assets.

> So _in this case_ you would be remiss to not find a suitable
> consultant to provide professional and discreet assistance - perhaps
> GitHub.com, as GitHub’s Tender provides both public and _private_
> support issue posting, and customized and private training if you and/
> or your colleagues require; you might contact GitHub direct (
> https://github.com/contact ) as their Support page does not link
> directly to support contract information; oh, and GitHub supports a
> lot of community projects too: their support for our community ought
> be supported.

> <disclaimer> I am _not_ affiliated with GitHub, I do work full time
> with a human rights association in Australia.

Thank you for your opinion.

I view this particular situation as follows:

1. I found a reproducible case for a hard to catch bug in Git. (This
is a bug in Git, not in my build process.) This bug in its
intermittent form annoyed me for quite some time — several months at
least — and is likely to annoy other users. (I'm not *that* unique!)

2. I can live happily with sleep(0.2) in my deployment code (while
this is not very satisfying, it is acceptable — certainly cheaper than
a paid consultant).

3. I'm willing to help Git developers catch this bug for mutual
benefit: I will get rid of an annoying issue and make my deployment
code more robust. Git will, well, be a bit more robust as well.

4. The sole reason I'm pinging back on this bug report is that I'm
afraid of accidentally losing the data snapshot (or something in the
environment) that makes the issue reproducible.

5. If no one is interested, well, that's open source :-) No hard feelings.

Alexander.

Re: False positives in git diff-index

Alexander Gladysh
On Tue, Jan 4, 2011 at 14:08, Jakub Narebski <[hidden email]> wrote:
> Alexander Gladysh <[hidden email]> writes:
>> On Tue, Jan 4, 2011 at 14:47, Zenaan Harkness <[hidden email]> wrote:
>> > On Tue, Jan 4, 2011 at 20:45, Alexander Gladysh <[hidden email]> wrote:

>> > So your problem could be quite hard to debug, whilst being distinctly
>> > difficult to ascertain the root causes.

>> 1. I found a reproducible case for a hard to catch bug in Git. (This
>> is a bug in Git, not in my build process.) This bug in its
>> intermittent form annoyed me for quite some time — several months at
>> least — and is likely to annoy other users. (I'm not *that* unique!)

> But it is reproducible for you: from what I understand, you didn't find
> a minimal example that reproduces this issue without needing access to
> your proprietary build process.

> AG> Unfortunately I can not share it or create a minimal example - the
> AG> case is triggered by a custom complicated automated build process on a
> AG> private repository.

Yes, that is true. Still, much, much better than intermittent.

>> 3. I'm willing to help Git developers with catching this bug for
>> mutual benefit — I will get rid of annoying issue and make my
>> deployment code more robust. Git will, well, be a bit more robust as
>> well.

> To debug it, if you cannot do it yourself, you would have to find git
> developer who is both knowledgeable about fairly deep part of git
> code, and can work with remote debugging with you at remote.

I understand that. But is the second part of the requirement such a
large problem?

Anyway, as I said, if no one will step up, no problem.

> P.S. Somewhere in the depths of the git mailing list archive
> (unfortunately it didn't make it to the "Interfaces, Frontends and
> tools" page on the git wiki) there is a tool/script for anonymizing a
> git repository, to allow debugging of bugs which occur in repositories
> that cannot be made public.  Perhaps something similar could be done
> for your build process (you need to reproduce only the stat + git part)?

I remember, somebody advised me to use this tool, when I reported some
bug some time (maybe a year) ago.

But, I'm afraid, I do not know how to separate my deployment tool
logic (which reproduces the bug) from the repository data. If I did
know, I would have come up with a minimal example already. Nothing
trivial along those lines that I have tried so far reproduces it.

Alexander.

Re: False positives in git diff-index

Jeff King
On Tue, Jan 04, 2011 at 12:45:33PM +0300, Alexander Gladysh wrote:

> > Anyway, I'm ready to debug this issue if someone will guide me.
> >
> > Workflow:
> >
> > <...change files in /path/dir1/...>
> > (cd /path && git add </path/dir1/>)
> > (cd /path && git commit -m <message1>)
> >
> > ... repeat change-add-commit several times for various directories
> > (can be the same directory or not) ...
> >
> > <...generate file /path/dirN/foo...>
> > # Accidentally the file is generated the same as it was
> >
> > (cd /path && git add </path/dirN/>)
> > (cd /path && git status) # Refresh index
> > (cd /path && git diff-index --exit-code --quiet HEAD -- /path/dirN) #
> > Incorrectly reports that there are some changes
> > (cd /path && git commit -m <messageN>) # fails, saying that there is
> > nothing to commit
> >
> > If I insert sleep 10 between git status and git diff-index, the
> > problem goes away.

If adding a sleep makes it work, that sounds like a race condition in
git. But from the description of your workflow, it should be easy to
make a minimal example:

-- >8 --
#!/bin/sh

random() {
  perl -e 'print int(rand(5))+1, "\n"'
}

rm -rf repo
mkdir repo && cd repo && git init

for i in 1 2 3 4 5; do
  mkdir dir$i
  echo initial >dir$i/file
done
git add .
git commit -m initial

while true; do
  for i in 1 2 3 4 5; do
    random >dir$i/file
    git add dir$i
    git update-index --refresh
    if ! git diff-index --exit-code --quiet HEAD -- dir$i; then
      if ! git commit -m foo; then
        echo breakage
        exit 1
      fi
    else
      echo not bothering to commit
    fi
  done
done
-- 8< --

Basically, we generate random data which has a 20% chance of
being the same as what's there. When it is, we should get "not bothering
to commit", but in your error case, we would try to commit (and get "no
changes").

But using that script, I can't replicate your problem. Can you try
running it on the same box you're having trouble with? That might at
least tell us if it's your environment or something more complex going
on.

-Peff

Re: False positives in git diff-index

Alexander Gladysh
On Wed, Jan 5, 2011 at 05:48, Jeff King <[hidden email]> wrote:
> On Tue, Jan 04, 2011 at 12:45:33PM +0300, Alexander Gladysh wrote:
>> > Anyway, I'm ready to debug this issue if someone will guide me.

> If adding a sleep makes it work, that sounds like a race condition in
> git. But from the description of your workflow, it should be easy to
> make a minimal example:

> -- 8< --

> Basically, we generate random data which has a 20% chance of
> being the same as what's there. When it is, we should get "not bothering
> to commit", but in your error case, we would try to commit (and get "no
> changes").

> But using that script, I can't replicate your problem. Can you try
> running it on the same box you're having trouble with? That might at
> least tell us if it's your environment or something more complex going
> on.

Thank you. I tried it, and, unfortunately, it does not reproduce the problem.

Alexander.

Re: False positives in git diff-index

Jeff King
On Wed, Jan 05, 2011 at 06:07:35AM +0000, Alexander Gladysh wrote:

> > Basically, we generate random data which has a 20% chance of
> > being the same as what's there. When it is, we should get "not bothering
> > to commit", but in your error case, we would try to commit (and get "no
> > changes").
>
> > But using that script, I can't replicate your problem. Can you try
> > running it on the same box you're having trouble with? That might at
> > least tell us if it's your environment or something more complex going
> > on.
>
> Thank you. I tried it, and, unfortunately, it does not reproduce the
> problem.

Oh well, thanks for trying.

Going back to your original reproduction recipe, can you change the
"diff-index" line to actually report on what it thinks is different?
That is, drop the "--quiet" and have it actually produce a patch?

It would be interesting to see what is different, and how that compares
with the "git status" you run just prior to it (and whether it matches
the file you "git add"ed just above).
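A rough sketch of that comparison, in case it helps. The helper name
show_diff_state and the throwaway repository are made up for
illustration and are not part of the original recipe; in practice you
would call the helper on the real directory right after the "git add".

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print what diff-index
# believes is different for one directory, alongside git status.
show_diff_state() {
  dir=$1
  echo "== git status =="
  git status --short -- "$dir"
  echo "== git diff-index (raw) =="
  git diff-index HEAD -- "$dir"
  echo "== git diff-index (patch) =="
  git diff-index -p HEAD -- "$dir"
}

# Demo in a throwaway repository with a real content change.
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir dir1 && echo one >dir1/file && git add dir1
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial
echo two >dir1/file
report=$(show_diff_state dir1)
echo "$report"
```

If the raw section lists a path but the patch section shows no hunks
for it, that would suggest only the stat data differs, not the content.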

You haven't told us much about your build process. Are you absolutely
sure that there couldn't be another process on the system manipulating
the files between the various runs?

Are you running on top of any special filesystem that might not meet the
consistency guarantees we expect? (Though in that case, I would assume my
trivial script would have reproduced the problem.)

-Peff

Re: False positives in git diff-index

Alexander Gladysh
On Wed, Jan 5, 2011 at 06:15, Jeff King <[hidden email]> wrote:
> On Wed, Jan 05, 2011 at 06:07:35AM +0000, Alexander Gladysh wrote:

>> > But using that script, I can't replicate your problem. Can you try
>> > running it on the same box you're having trouble with? That might at
>> > least tell us if it's your environment or something more complex going
>> > on.

>> Thank you. I tried it, and, unfortunately, it does not reproduce the
>> problem.

> Oh well, thanks for trying.

> Going back to your original reproduction recipe, can you change the
> "diff-index" line to actually report on what it thinks is different?
> That is, drop the "--quiet" and have it actually produce a patch?

----> Rebuilding manifest...
Making manifest for .
Generating index.html for .
:100644 100644 483a7292436daecc9bea0ab265ee19d587b14298 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/index.html
:100644 100644 fcb9ff896fd1a1bd15663fa9be19b250789d4a25 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/manifest

These are the two files, which are overwritten with identical content.
(See below, looks like I realized who to blame.)

If I read this correctly, Git tells me that the files are deleted. No?

Anyway, I checked: it looks like the files are overwritten (by
fopen("name", "w")), never explicitly deleted. If it is important, I
will check with strace.

Continuing with the script:

----> Comitting changed manifest...
2edcbfabc11f9bbab4fc8c059490cba9ae196d27
# On branch ag/git-debugging
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# typechange: cluster/localhost-ag/versions/versions-current.lua
#
no changes added to commit (use "git add" and/or "git commit -a")

Suddenly: no changes.

> It would be interesting to see what is different, and how that compares
> with the "git status" you run just prior to it (and whether it matches
> the file you "git add"ed just above).

Git status before:

$ git status
# On branch ag/git-debugging
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# typechange: cluster/localhost-ag/versions/versions-current.lua
#
no changes added to commit (use "git add" and/or "git commit -a")

> You haven't told us much about your build process. Are you absolutely
> sure that there couldn't be another process on the system manipulating
> the files between the various runs?

No other process. But see below.

> Are you running on top of any special filesystem that might not meet the
> consistency guarantees we expect (though in that case, I would assume my
> trivial script would have reproduced).

And here I have to say "Oops".

My apologies, I should have realized this before: my project is
mounted on VMWare's HGFS.

(That is: VMWare Fusion Ubuntu Guest -> HGFS -> OS X 10.6 Host files.)

The problem is not reproduced if I copy the project to the native fs
in the guest machine.

But the problem is also not reproduced if I execute your script on the HGFS.

So, does that mean that HGFS violates consistency guarantees?

Alexander.

Re: False positives in git diff-index

Jeff King
On Wed, Jan 05, 2011 at 07:46:19AM +0000, Alexander Gladysh wrote:

> ----> Rebuilding manifest...
> Making manifest for .
> Generating index.html for .
> :100644 100644 483a7292436daecc9bea0ab265ee19d587b14298 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/index.html
> :100644 100644 fcb9ff896fd1a1bd15663fa9be19b250789d4a25 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/manifest
>
> These are the two files, which are overridden with identical content.
> (See below, looks like I realized who to blame.)
>
> If I read this correctly, Git tells me that the files are deleted. No?

No, it just means that the files are stat-dirty with respect to the
index. For example:

  $ git init
  $ touch file && git add file && git commit -m one
  $ touch file
  $ git diff-files
  :100644 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0000000000000000000000000000000000000000 M      file
  $ git update-index --refresh
  $ git diff-files
  <no output>

But in your case, the stat information should be up to date, since you
just ran update-index. But see below.
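The same effect can be seen through diff-index exit codes. The script
below is only a sketch under assumptions: it runs in a throwaway
repository, and it uses "touch -t" with a fixed old date so that the
mtime is guaranteed to differ from what the index recorded. The
--cached run is there purely as a point of comparison: it compares HEAD
with the index and never looks at the working tree, so stat data cannot
affect it.

```shell
#!/bin/sh
# Sketch: a stat-dirty file makes plumbing diff-index report a change
# until the index is refreshed. Runs in a throwaway repository.
repo=$(mktemp -d) && cd "$repo" && git init -q
echo content >file && git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m one

# Change the timestamp only; the content stays identical.
touch -t 200101010000 file

# HEAD vs. working tree, using the (now stale) stat data in the index.
git diff-index --quiet HEAD -- ; before=$?
# HEAD vs. index only; the working tree is not consulted.
git diff-index --cached --quiet HEAD -- ; cached=$?
# Refresh the stat data, then ask again.
git update-index --refresh
git diff-index --quiet HEAD -- ; after=$?

echo "before=$before cached=$cached after=$after"
```

On an ordinary local filesystem I would expect "before=1 cached=0
after=0": the first check is the false positive, and the refresh makes
it go away.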

> > Are you running on top of any special filesystem that might not meet the
> > consistency guarantees we expect (though in that case, I would assume my
> > trivial script would have reproduced).
>
> And here I have to say "Oops".
>
> My apologies, I should have realized this before: my project is
> mounted on VMWare's HGFS.
>
> (That is: VMWare Fusion Ubuntu Guest -> HGFS -> OS X 10.6 Host files.)
>
> The problem is not reproduced if I copy the project to the native fs
> in the guest machine.
>
> But the problem is also not reproduced if I execute your script on the HGFS.
>
> So, does that mean that HGFS violates consistency guarantees?

Hmm. That could be the problem.  It may not violate traditional
consistency guarantees, but I wonder if it is returning slightly
different stat information between the program runs. That would mean
"git status" does an index refresh and puts some stat information in the
index, but the followup "git diff-index" might see different stat
information.
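One crude way to probe for that, independent of git, is to take two
snapshots of roughly the fields git compares and see whether they
agree. This is a sketch only: it assumes GNU stat (as on an Ubuntu
guest), the function name stat_snapshot is invented here, and the demo
uses a temporary file where you would substitute a file that lives on
the suspect filesystem.

```shell
#!/bin/sh
# Hypothetical check (GNU stat assumed): snapshot the stat fields that
# matter for index comparison twice and see whether they agree.
stat_snapshot() {
  # mtime, ctime, inode, size, uid, gid, raw mode (hex)
  stat -c '%Y %Z %i %s %u %g %f' -- "$1"
}

f=$(mktemp)          # replace with a file on the filesystem under test
echo data >"$f"
first=$(stat_snapshot "$f")
sleep 1
second=$(stat_snapshot "$f")

if [ "$first" = "$second" ]; then
  verdict=stable
else
  verdict=unstable
fi
echo "$verdict: $first / $second"
rm -f "$f"
```

An "unstable" result for a file nobody touched in between would point
at the filesystem; a "stable" result does not rule it out, since the
drift might only show up across separate processes or after a flush.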

That's just a theory, though. You might try the patch below:

diff --git a/read-cache.c b/read-cache.c
index 4f2e890..1b415a3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -283,6 +283,8 @@ int ie_match_stat(const struct index_state *istate,
 		return DATA_CHANGED | TYPE_CHANGED | MODE_CHANGED;
 
 	changed = ce_match_stat_basic(ce, st);
+	if (changed)
+		fprintf(stderr, "changed (%u): %s\n", changed, ce->name);
 
 	/*
 	 * Within 1 second of this sequence:

The number in parentheses is the bitwise-or of the things git found that
caused the stat information to be stale (the actual flags are the
*_CHANGED defines in cache.h, but I was too lazy to write a
pretty-printer). If you can get the output from diff-files for the error
case, we can at least see why git thinks the cache is stale.
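For reading that number, a small decoder along these lines may help.
It is a sketch, not part of the patch: the function name decode_changed
is made up, and the bit values are copied from the *_CHANGED defines in
cache.h, so check them against the tree you are actually building.

```shell
#!/bin/sh
# Decode the bitmask printed by the debugging patch into flag names.
# Bit values follow the *_CHANGED defines in git's cache.h; verify
# them against your own source tree before relying on the result.
decode_changed() {
  n=$1
  out=
  for pair in 1:MTIME 2:CTIME 4:OWNER 8:MODE 16:INODE 32:DATA 64:TYPE; do
    bit=${pair%%:*}    # numeric bit value
    name=${pair#*:}    # flag name without the _CHANGED suffix
    if [ $(( n & bit )) -ne 0 ]; then
      out="${out:+$out|}${name}_CHANGED"
    fi
  done
  echo "${out:-unchanged}"
}

decode_changed 3     # -> MTIME_CHANGED|CTIME_CHANGED
decode_changed 115   # -> MTIME_CHANGED|CTIME_CHANGED|INODE_CHANGED|DATA_CHANGED|TYPE_CHANGED
```

A value of 3 would mean only the two timestamps differ, which is what
one would expect if the content is identical and just the stat data
moved.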

-Peff

Re: False positives in git diff-index

Alexander Gladysh
Hi, Jeff,

Apologies for the delay. Apparently, after all, the problem does not
reproduce every time (more like 33% of the time, or there is some other
factor that I have not identified yet). And the build process is quite
lengthy, so I was not able to gather stats fast enough. But here they
are, below:

On Wed, Jan 5, 2011 at 08:08, Jeff King <[hidden email]> wrote:
> On Wed, Jan 05, 2011 at 07:46:19AM +0000, Alexander Gladysh wrote:

>> ----> Rebuilding manifest...
>> Making manifest for .
>> Generating index.html for .
>> :100644 100644 483a7292436daecc9bea0ab265ee19d587b14298 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/index.html
>> :100644 100644 fcb9ff896fd1a1bd15663fa9be19b250789d4a25 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/manifest

>> So, does that mean that HGFS violates consistency guarantees?
>
> Hmm. That could be the problem.  It may not violate traditional
> consistency guarantees, but I wonder if it is returning slightly
> different stat information between the program runs. That would mean
> "git status" does an index refresh and puts some stat information in the
> index, but the followup "git diff-index" might see different stat
> information.
>
> That's just a theory, though. You might try the patch below:

<...>


----> Rebuilding manifest...
Making manifest for .
Generating index.html for .
changed (3): cluster/localhost-ag/rocks/index.html
changed (3): cluster/localhost-ag/rocks/manifest
changed (3): cluster/localhost-ag/rocks/index.html
changed (3): cluster/localhost-ag/rocks/manifest
changed (115): cluster/localhost-ag/versions/versions-current.lua
changed (115): cluster/localhost-ag/versions/versions-current.lua
changed (3): cluster/localhost-ag/rocks/index.html
changed (3): cluster/localhost-ag/rocks/manifest
:100644 100644 483a7292436daecc9bea0ab265ee19d587b14298 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/index.html
:100644 100644 fcb9ff896fd1a1bd15663fa9be19b250789d4a25 0000000000000000000000000000000000000000 M	cluster/localhost-ag/rocks/manifest
----> Comitting changed manifest...
changed (3): cluster/localhost-ag/rocks/index.html
changed (3): cluster/localhost-ag/rocks/index.html
changed (3): cluster/localhost-ag/rocks/manifest
changed (3): cluster/localhost-ag/rocks/manifest
changed (115): cluster/localhost-ag/versions/versions-current.lua
changed (115): cluster/localhost-ag/versions/versions-current.lua
9760438c65f7b0293459e622153f235434436ad6
changed (115): cluster/localhost-ag/versions/versions-current.lua
# On branch ag/git-debugging
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# typechange: cluster/localhost-ag/versions/versions-current.lua
#
no changes added to commit (use "git add" and/or "git commit -a")

Hope this makes some sense,
Alexander.