git status core dump with bad sector!

git status core dump with bad sector!

Eric Chamberland
Hi,

I just cloned a repo and it checked out without any error (with git 2.2.0),
but I got some corrupted files (because I got some sdd failures).

Then I get a git core dump when running "git status" in the repo,
which sits on an sdd drive with a "bad sector" (encrypted partition).

I tried with git 2.2.0 AND git version 2.8.1.185.gdc0db2c.dirty (just
modified the Makefile to remove STRIP part)

In both cases, I have a  Bus error (core dumped)

Tried to make it more verbose:

GIT_TRACE=2 GIT_CURL_VERBOSE=2 GIT_TRACE_PERFORMANCE=2 \
GIT_TRACE_PACK_ACCESS=2 GIT_TRACE_PACKET=2 GIT_TRACE_PACKFILE=2 \
GIT_TRACE_SETUP=2 GIT_TRACE_SHALLOW=2 /opt/gitgit/bin/git status
10:54:30.644999 trace.c:318             setup: git_dir: .git
10:54:30.645094 trace.c:319             setup: git_common_dir: .git
10:54:30.645102 trace.c:320             setup: worktree: /pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/TestValidation_avec_erreur_disque_git_core_dump_dans_dev_Test.ExportationVTK_Avion
10:54:30.645112 trace.c:321             setup: cwd: /pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/TestValidation_avec_erreur_disque_git_core_dump_dans_dev_Test.ExportationVTK_Avion
10:54:30.645151 trace.c:322             setup: prefix: Ressources/dev/Test.ExportationVTK/
10:54:30.645181 git.c:350               trace: built-in: git 'status'
Bus error (core dumped)

started in gdb:

Program received signal SIGBUS, Bus error.
0x00007ffff7866d58 in ?? () from /lib64/libcrypto.so.1.0.0
(gdb) bt
#0  0x00007ffff7866d58 in ?? () from /lib64/libcrypto.so.1.0.0
#1  0x3334d90d8c20f3f0 in ?? ()
#2  0xe59b5a6cd844a601 in ?? ()
#3  0xc587a53f67985ae7 in ?? ()
#4  0x3ce81893e5541777 in ?? ()
#5  0xdeb18349a4b042ea in ?? ()
#6  0x8254de489067ec4b in ?? ()
#7  0x6fbef2439704c81b in ?? ()
#8  0xe0eee2bb385a96da in ?? ()
#9  0x00007ffff6e19ab3 in ?? ()
#10 0x00007fffffffc4d0 in ?? ()
#11 0x000000000000001d in ?? ()
#12 0x00007ffff7863f80 in SHA1_Update () from /lib64/libcrypto.so.1.0.0
#13 0x00000000005102c0 in write_sha1_file_prepare
(buf=buf@entry=0x7ffff6c81000, len=1673936, type=<optimized out>,
sha1=sha1@entry=0x7fffffffc750 "\340_~", hdr=hdr@entry=0x7fffffffc570
"blob 1673936",
     hdrlen=hdrlen@entry=0x7fffffffc56c) at sha1_file.c:2951
#14 0x000000000051567b in hash_sha1_file (buf=buf@entry=0x7ffff6c81000,
len=<optimized out>, type=<optimized out>,
sha1=sha1@entry=0x7fffffffc750 "\340_~") at sha1_file.c:3010
#15 0x00000000005159f8 in index_mem (sha1=sha1@entry=0x7fffffffc750
"\340_~", buf=buf@entry=0x7ffff6c81000, size=1673936,
type=type@entry=OBJ_BLOB,
     path=path@entry=0x80a818
"Ressources/dev/Test.ExportationVTK/Ressources.Avion/Avion.Quadratique.cont.vtu.etalon",
flags=flags@entry=0) at sha1_file.c:3305
#16 0x00000000005160ee in index_core (flags=0, path=0x80a818
"Ressources/dev/Test.ExportationVTK/Ressources.Avion/Avion.Quadratique.cont.vtu.etalon",
type=OBJ_BLOB, size=<optimized out>, fd=7,
     sha1=0x7fffffffc750 "\340_~") at sha1_file.c:3367
#17 index_fd (sha1=sha1@entry=0x7fffffffc750 "\340_~", fd=7,
st=st@entry=0x7fffffffc7c0, type=type@entry=OBJ_BLOB,
     path=path@entry=0x80a818
"Ressources/dev/Test.ExportationVTK/Ressources.Avion/Avion.Quadratique.cont.vtu.etalon",
flags=flags@entry=0) at sha1_file.c:3410
#18 0x00000000004eac66 in ce_compare_data (st=0x7fffffffc7c0,
ce=0x80a7c0) at read-cache.c:166
#19 ce_modified_check_fs (ce=0x80a7c0, st=0x7fffffffc7c0) at
read-cache.c:215
#20 0x00000000004ebb6d in ie_modified (istate=istate@entry=0x7e5fe0
<the_index>, ce=ce@entry=0x80a7c0, st=st@entry=0x7fffffffc7c0,
options=options@entry=16) at read-cache.c:395
#21 0x00000000004ebcfe in refresh_cache_ent
(istate=istate@entry=0x7e5fe0 <the_index>, ce=ce@entry=0x80a7c0,
options=options@entry=16, err=err@entry=0x7fffffffc908,
     changed_ret=changed_ret@entry=0x7fffffffc90c) at read-cache.c:1130
#22 0x00000000004ed59c in refresh_index (istate=0x7e5fe0 <the_index>,
flags=flags@entry=6, pathspec=pathspec@entry=0x7bb738 <s.25876+24>,
seen=seen@entry=0x0, header_msg=header_msg@entry=0x0)
     at read-cache.c:1221
#23 0x0000000000429e3b in cmd_status (argc=<optimized out>,
argv=0x7fffffffcca0, prefix=0x7e950f
"Ressources/dev/Test.ExportationVTK/") at builtin/commit.c:1376
#24 0x00000000004063b3 in run_builtin (argv=0x7fffffffcca0, argc=1,
p=0x7b4030 <commands+2352>) at git.c:352
#25 handle_builtin (argc=1, argv=0x7fffffffcca0) at git.c:539
#26 0x00000000004054a1 in run_argv (argv=0x7fffffffca80,
argcp=0x7fffffffca6c) at git.c:593
#27 main (argc=1, av=<optimized out>) at git.c:698

I would have expected git to give me an error first, when checking out
the files! Here is the log:

Checking out files:  99% (28645/28934)
Checking out files: 100% (28934/28934)
Checking out files: 100% (28934/28934), done.
Already on 'master'
Your branch is up-to-date with 'origin/master'.
     On valide le dépôt TestValidation avec la référence:
9b4a485202b2b52922377842c15bfd605d240667
HEAD is now at 9b4a485 BUG: On spécifie bash comme shell...

But at least 1 file is corrupted!

I am carefully keeping this faulty repo for further investigation with someone
who can help dig into the core dump and fix it...

I am available to recompile a new git to help...

Thanks,

Eric
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git status core dump with bad sector!

Jeff King
On Thu, Apr 14, 2016 at 10:59:57AM -0400, Eric Chamberland wrote:

> I just cloned a repo and it checked out without any error (with git 2.2.0), but
> I got some corrupted files (because I got some sdd failures).
>
> Then I get a git core dump when running "git status" in the repo, which
> sits on an sdd drive with a "bad sector" (encrypted partition).
>
> I tried with git 2.2.0 AND git version 2.8.1.185.gdc0db2c.dirty (just
> modified the Makefile to remove STRIP part)
>
> In both cases, I have a  Bus error (core dumped)

Interesting. There was a known issue with reading corrupted pack .idx
files, but it was fixed in v2.8.0. So this could be a new thing.

SIGBUS is somewhat rare, though (usually just accessing unmapped memory
should get us a SIGSEGV). What platform are you on? I seem to recall
that hardware like ARM that cares about memory alignment is more likely
to get a SIGBUS.

> Program received signal SIGBUS, Bus error.
> 0x00007ffff7866d58 in ?? () from /lib64/libcrypto.so.1.0.0
> (gdb) bt
> #0  0x00007ffff7866d58 in ?? () from /lib64/libcrypto.so.1.0.0
> #1  0x3334d90d8c20f3f0 in ?? ()
> #2  0xe59b5a6cd844a601 in ?? ()
> #3  0xc587a53f67985ae7 in ?? ()
> #4  0x3ce81893e5541777 in ?? ()
> #5  0xdeb18349a4b042ea in ?? ()
> #6  0x8254de489067ec4b in ?? ()
> #7  0x6fbef2439704c81b in ?? ()
> #8  0xe0eee2bb385a96da in ?? ()
> #9  0x00007ffff6e19ab3 in ?? ()
> #10 0x00007fffffffc4d0 in ?? ()
> #11 0x000000000000001d in ?? ()
> #12 0x00007ffff7863f80 in SHA1_Update () from /lib64/libcrypto.so.1.0.0
> #13 0x00000000005102c0 in write_sha1_file_prepare
> (buf=buf@entry=0x7ffff6c81000, len=1673936, type=<optimized out>,
> sha1=sha1@entry=0x7fffffffc750 "\340_~", hdr=hdr@entry=0x7fffffffc570 "blob
> 1673936",

So I'd assume here that the problem is in accessing the memory in "buf"
to actually compute the sha1. That is mmap'd data, but the process is
fairly bland (mmap however many bytes stat() tells us the file has, and
then compute the sha1). You mentioned a bad sector; could it be that the
filesystem is corrupted, and the OS is giving us SIGBUS when trying to
read unavailable bytes from an mmap'd file?

That would explain the SIGBUS versus SIGSEGV.
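
As a rough illustration of the process described above (this is not git's actual C code), the following Python sketch maps however many bytes stat() reports and feeds them to SHA-1 behind a "blob <size>" header, the same header that shows up in the backtrace. The function name git_blob_sha1_mmap is hypothetical. On a file with an unreadable sector, the place where the mapped pages are touched inside update() is where one would expect the kernel to deliver SIGBUS.

```python
import hashlib
import mmap
import os


def git_blob_sha1_mmap(path):
    """Hash a file the way git hashes a blob, reading the content via mmap."""
    size = os.stat(path).st_size
    digest = hashlib.sha1()
    # Object header: type, space, decimal size, NUL byte.
    digest.update(b"blob %d\0" % size)
    if size == 0:
        # An empty file cannot be mapped; there is nothing more to hash.
        return digest.hexdigest()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as mapped:
            # The mapped pages are first touched here. An I/O error on a
            # page cannot come back as a return code, only as a signal.
            digest.update(mapped)
    return digest.hexdigest()
```

For a readable file the result should match what "git hash-object <file>" prints.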

What happens if you "cat" the file in question:

> #15 0x00000000005159f8 in index_mem (sha1=sha1@entry=0x7fffffffc750
> "\340_~", buf=buf@entry=0x7ffff6c81000, size=1673936,
> type=type@entry=OBJ_BLOB,
>     path=path@entry=0x80a818 "Ressources/dev/Test.ExportationVTK/Ressources.Avion/Avion.Quadratique.cont.vtu.etalon",
> flags=flags@entry=0) at sha1_file.c:3305

Can it show all of the bytes? I guess from the "size" field it's too big
to manually verify, but "cat >/dev/null" should be enough to see if we
can read the whole thing.
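
The same check can be scripted. The sketch below is only an illustration of the "cat >/dev/null" idea: it reads the file with plain read() calls and reports roughly where the first I/O error occurs. The function name first_unreadable_offset and the 4096-byte chunk size are assumptions made for the example.

```python
import errno


def first_unreadable_offset(path, chunk_size=4096):
    """Return the offset of the first chunk whose read() fails with EIO.

    Returns None when the whole file can be read.
    """
    offset = 0
    # buffering=0 gives unbuffered reads, so each call maps to one read().
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                data = f.read(chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    return offset
                raise  # some other problem, not a media error
            if not data:
                return None  # reached end of file without an error
            offset += len(data)
```

Unlike the mmap path, read() can return an error to the caller, which is why "cat" can print a message instead of being killed by a signal.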

> I would have expected git to give me an error first, when checking out the
> files! Here is the log:
>
> Checking out files:  99% (28645/28934)
> Checking out files: 100% (28934/28934)
> Checking out files: 100% (28934/28934), done.
> Already on 'master'
> Your branch is up-to-date with 'origin/master'.
>     On valide le dépôt TestValidation avec la référence:
> 9b4a485202b2b52922377842c15bfd605d240667
> HEAD is now at 9b4a485 BUG: On spécifie bash comme shell...
>
> But at least 1 file is corrupted!
>
> I am carefully keeping this faulty repo for further investigation with someone who
> can help dig into the core dump and fix it...

So _if_ my guess is right that you have filesystem corruption, git may
not even know about it. It wrote the file, and the OS said "OK,
success", not knowing it had been partially corrupted.

And if that guess is right, it also means there's no git bug to fix.
SIGBUS is the natural way for the OS to tell the process that mmap'd
data isn't available.
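
A bad sector cannot be produced on demand, but the signal itself is easy to observe. In the sketch below, which assumes Linux, truncating the file under an existing mapping stands in for the lost data; that substitution is an assumption made for the demonstration, not what happened on the failing disk. The access runs in a child process so that the parent survives, and the names CHILD and run_child_touching_lost_page are hypothetical.

```python
import signal
import subprocess
import sys

# Code for the child process: map one page, drop the backing data,
# then touch the mapped page.
CHILD = r"""
import mmap, os, tempfile
with tempfile.TemporaryFile() as f:
    f.write(b"x" * mmap.PAGESIZE)
    f.flush()
    mapped = mmap.mmap(f.fileno(), mmap.PAGESIZE, access=mmap.ACCESS_READ)
    os.ftruncate(f.fileno(), 0)   # the backing data disappears
    mapped[0]                     # touching the page now faults
"""


def run_child_touching_lost_page():
    """Run the child and return its exit status as seen by the parent."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode
```

On Linux the return code is expected to equal -signal.SIGBUS, meaning the child was killed by SIGBUS rather than exiting with an error it could have handled.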

-Peff

Re: git status core dump with bad sector!

Eric Chamberland
Hi,

sorry for the delay...

On 22/04/16 01:11 AM, Jeff King wrote:

> On Thu, Apr 14, 2016 at 10:59:57AM -0400, Eric Chamberland wrote:
>
>> I just cloned a repo and it checked out without any error (with git 2.2.0), but
>> I got some corrupted files (because I got some sdd failures).
>>
>> Then I get a git core dump when running "git status" in the repo, which
>> sits on an sdd drive with a "bad sector" (encrypted partition).
>>
>> I tried with git 2.2.0 AND git version 2.8.1.185.gdc0db2c.dirty (just
>> modified the Makefile to remove STRIP part)
>>
>> In both cases, I have a  Bus error (core dumped)
>
> Interesting. There was a known issue with reading corrupted pack .idx
> files, but it was fixed in v2.8.0. So this could be a new thing.
>
> SIGBUS is somewhat rare, though (usually just accessing unmapped memory
> should get us a SIGSEGV). What platform are you on? I seem to recall
> that hardware like ARM that cares about memory alignment is more likely
> to get a SIGBUS.
>
Linux ... 3.7.10-1.45-desktop #1 SMP PREEMPT Tue Dec 16 20:27:58 UTC 2014 (4c885a1) x86_64 x86_64 x86_64 GNU/Linux
df .
Filesystem                                     1K-blocks      Used Available Use% Mounted on
/dev/mapper/cr_ata-ST31000524AS_6VPCXHSW-part1 961430856 699476812 213116108  77% /pmi

model name      : Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz

>> Program received signal SIGBUS, Bus error.
>> 0x00007ffff7866d58 in ?? () from /lib64/libcrypto.so.1.0.0
>> (gdb) bt
>> #0  0x00007ffff7866d58 in ?? () from /lib64/libcrypto.so.1.0.0
>> #1  0x3334d90d8c20f3f0 in ?? ()
>> #2  0xe59b5a6cd844a601 in ?? ()
>> #3  0xc587a53f67985ae7 in ?? ()
>> #4  0x3ce81893e5541777 in ?? ()
>> #5  0xdeb18349a4b042ea in ?? ()
>> #6  0x8254de489067ec4b in ?? ()
>> #7  0x6fbef2439704c81b in ?? ()
>> #8  0xe0eee2bb385a96da in ?? ()
>> #9  0x00007ffff6e19ab3 in ?? ()
>> #10 0x00007fffffffc4d0 in ?? ()
>> #11 0x000000000000001d in ?? ()
>> #12 0x00007ffff7863f80 in SHA1_Update () from /lib64/libcrypto.so.1.0.0
>> #13 0x00000000005102c0 in write_sha1_file_prepare
>> (buf=buf@entry=0x7ffff6c81000, len=1673936, type=<optimized out>,
>> sha1=sha1@entry=0x7fffffffc750 "\340_~", hdr=hdr@entry=0x7fffffffc570 "blob
>> 1673936",
>
> So I'd assume here that the problem is in accessing the memory in "buf"
> to actually compute the sha1. That is mmap'd data, but the process is
> fairly bland (mmap however many bytes stat() tells us the file has, and
> then compute the sha1). You mentioned a bad sector; could it be that the
> filesystem is corrupted, and the OS is giving us SIGBUS when trying to
> read unavailable bytes from an mmap'd file?

Yes it could be that...

>
> That would explain the SIGBUS versus SIGSEGV.
>
> What happens if you "cat" the file in question:

hmmm, it shows the beginning of the file, then ends with:

cat: Avion.Quadratique.cont.vtu.etalon: Input/output error

Also, this appears in /var/log/messages:

2016-05-04T16:33:19.243595-04:00 melkor kernel: [1096660.854161] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
2016-05-04T16:33:19.243609-04:00 melkor kernel: [1096660.854165] ata4.00: irq_stat 0x40000008
2016-05-04T16:33:19.243610-04:00 melkor kernel: [1096660.854168] ata4.00: failed command: READ FPDMA QUEUED
2016-05-04T16:33:19.243611-04:00 melkor kernel: [1096660.854175] ata4.00: cmd 60/08:00:70:30:c6/00:00:53:00:00/40 tag 0 ncq 4096 in
2016-05-04T16:33:19.243612-04:00 melkor kernel: [1096660.854175]          res 41/40:08:71:30:c6/00:00:53:00:00/00 Emask 0x409 (media error) <F>
2016-05-04T16:33:19.243613-04:00 melkor kernel: [1096660.854178] ata4.00: status: { DRDY ERR }
2016-05-04T16:33:19.243614-04:00 melkor kernel: [1096660.854180] ata4.00: error: { UNC }
2016-05-04T16:33:19.340479-04:00 melkor kernel: [1096660.950794] ata4.00: configured for UDMA/133
2016-05-04T16:33:19.340484-04:00 melkor kernel: [1096660.950806] sd 3:0:0:0: [sdb] Unhandled sense code
2016-05-04T16:33:19.340485-04:00 melkor kernel: [1096660.950809] sd 3:0:0:0: [sdb]
2016-05-04T16:33:19.340485-04:00 melkor kernel: [1096660.950811] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2016-05-04T16:33:19.340486-04:00 melkor kernel: [1096660.950814] sd 3:0:0:0: [sdb]
2016-05-04T16:33:19.340486-04:00 melkor kernel: [1096660.950815] Sense Key : Medium Error [current] [descriptor]
2016-05-04T16:33:19.340486-04:00 melkor kernel: [1096660.950819] Descriptor sense data with sense descriptors (in hex):
2016-05-04T16:33:19.340487-04:00 melkor kernel: [1096660.950820]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
2016-05-04T16:33:19.340487-04:00 melkor kernel: [1096660.950829]         53 c6 30 71
2016-05-04T16:33:19.340488-04:00 melkor kernel: [1096660.950834] sd 3:0:0:0: [sdb]
2016-05-04T16:33:19.340488-04:00 melkor kernel: [1096660.950836] Add. Sense: Unrecovered read error - auto reallocate failed
2016-05-04T16:33:19.340489-04:00 melkor kernel: [1096660.950839] sd 3:0:0:0: [sdb] CDB:
2016-05-04T16:33:19.340489-04:00 melkor kernel: [1096660.950840] Read(10): 28 00 53 c6 30 70 00 00 08 00
2016-05-04T16:33:19.340489-04:00 melkor kernel: [1096660.950848] end_request: I/O error, dev sdb, sector 1405497457
2016-05-04T16:33:19.340490-04:00 melkor kernel: [1096660.950870] ata4: EH complete
2016-05-04T16:33:22.157550-04:00 melkor kernel: [1096663.764515] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
2016-05-04T16:33:22.157561-04:00 melkor kernel: [1096663.764519] ata4.00: irq_stat 0x40000008
2016-05-04T16:33:22.157563-04:00 melkor kernel: [1096663.764522] ata4.00: failed command: READ FPDMA QUEUED
2016-05-04T16:33:22.157564-04:00 melkor kernel: [1096663.764529] ata4.00: cmd 60/08:00:70:30:c6/00:00:53:00:00/40 tag 0 ncq 4096 in
2016-05-04T16:33:22.157565-04:00 melkor kernel: [1096663.764529]          res 41/40:08:71:30:c6/00:00:53:00:00/00 Emask 0x409 (media error) <F>
2016-05-04T16:33:22.157566-04:00 melkor kernel: [1096663.764532] ata4.00: status: { DRDY ERR }
2016-05-04T16:33:22.157567-04:00 melkor kernel: [1096663.764534] ata4.00: error: { UNC }
2016-05-04T16:33:22.180479-04:00 melkor kernel: [1096663.787215] ata4.00: configured for UDMA/133
2016-05-04T16:33:22.180485-04:00 melkor kernel: [1096663.787225] sd 3:0:0:0: [sdb] Unhandled sense code
2016-05-04T16:33:22.180486-04:00 melkor kernel: [1096663.787228] sd 3:0:0:0: [sdb]
2016-05-04T16:33:22.180486-04:00 melkor kernel: [1096663.787230] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2016-05-04T16:33:22.180487-04:00 melkor kernel: [1096663.787232] sd 3:0:0:0: [sdb]
2016-05-04T16:33:22.180487-04:00 melkor kernel: [1096663.787234] Sense Key : Medium Error [current] [descriptor]
2016-05-04T16:33:22.180487-04:00 melkor kernel: [1096663.787237] Descriptor sense data with sense descriptors (in hex):
2016-05-04T16:33:22.180488-04:00 melkor kernel: [1096663.787238]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
2016-05-04T16:33:22.180488-04:00 melkor kernel: [1096663.787247]         53 c6 30 71
2016-05-04T16:33:22.180489-04:00 melkor kernel: [1096663.787252] sd 3:0:0:0: [sdb]
2016-05-04T16:33:22.180489-04:00 melkor kernel: [1096663.787254] Add. Sense: Unrecovered read error - auto reallocate failed
2016-05-04T16:33:22.180490-04:00 melkor kernel: [1096663.787256] sd 3:0:0:0: [sdb] CDB:
2016-05-04T16:33:22.180490-04:00 melkor kernel: [1096663.787258] Read(10): 28 00 53 c6 30 70 00 00 08 00
2016-05-04T16:33:22.180490-04:00 melkor kernel: [1096663.787266] end_request: I/O error, dev sdb, sector 1405497457
2016-05-04T16:33:22.180491-04:00 melkor kernel: [1096663.787280] ata4: EH complete


>
>> #15 0x00000000005159f8 in index_mem (sha1=sha1@entry=0x7fffffffc750
>> "\340_~", buf=buf@entry=0x7ffff6c81000, size=1673936,
>> type=type@entry=OBJ_BLOB,
>>      path=path@entry=0x80a818 "Ressources/dev/Test.ExportationVTK/Ressources.Avion/Avion.Quadratique.cont.vtu.etalon",
>> flags=flags@entry=0) at sha1_file.c:3305
>
> Can it show all of the bytes? I guess from the "size" field it's too big
> to manually verify, but "cat >/dev/null" should be enough to see if we
> can read the whole thing.
>
>> I would have expected git to give me an error first, when checking out the
>> files! Here is the log:
>>
>> Checking out files:  99% (28645/28934)
>> Checking out files: 100% (28934/28934)
>> Checking out files: 100% (28934/28934), done.
>> Already on 'master'
>> Your branch is up-to-date with 'origin/master'.
>>      On valide le dépôt TestValidation avec la référence:
>> 9b4a485202b2b52922377842c15bfd605d240667
>> HEAD is now at 9b4a485 BUG: On spécifie bash comme shell...
>>
>> But at least 1 file is corrupted!
>>
>> I am carefully keeping this faulty repo for further investigation with someone who
>> can help dig into the core dump and fix it...
>
> So _if_ my guess is right that you have filesystem corruption, git may
> not even know about it. It wrote the file, and the OS said "OK,
> success", not knowing it had been partially corrupted.

ok, I see...
>
> And if that guess is right, it also means there's no git bug to fix.
> SIGBUS is the natural way for the OS to tell the process that mmap'd
> data isn't available.

doh... then forget about this...

Thanks for the enlightenment! :)

Eric
>
> -Peff
>
