[PATCH] fast-import: do not truncate exported marks file

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] fast-import: do not truncate exported marks file

Felipe Contreras
Certain lines of the marks file might be corrupted (or the objects
missing due to a garbage collection), but that's no reason to truncate
the file and essentially destroy the rest of it.

Ideally missing objects should not cause a crash, we could just skip
them, but that's another patch.

Signed-off-by: Felipe Contreras <[hidden email]>
---
 fast-import.c          |  7 +++++--
 t/t9300-fast-import.sh | 15 +++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 9fc7093..a975c34 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -329,6 +329,7 @@ static const char *export_marks_file;
 static const char *import_marks_file;
 static int import_marks_file_from_stream;
 static int import_marks_file_ignore_missing;
+static int import_marks_file_done;
 static int relative_marks_paths;
 
 /* Our last blob */
@@ -1802,7 +1803,7 @@ static void dump_marks(void)
  static struct lock_file mark_lock;
  FILE *f;
 
- if (!export_marks_file)
+ if (!export_marks_file || (import_marks_file && !import_marks_file_done))
  return;
 
  if (hold_lock_file_for_update(&mark_lock, export_marks_file, 0) < 0) {
@@ -1835,7 +1836,7 @@ static void read_marks(void)
  if (f)
  ;
  else if (import_marks_file_ignore_missing && errno == ENOENT)
- return; /* Marks file does not exist */
+ goto done; /* Marks file does not exist */
  else
  die_errno("cannot read '%s'", import_marks_file);
  while (fgets(line, sizeof(line), f)) {
@@ -1865,6 +1866,8 @@ static void read_marks(void)
  insert_mark(mark, e);
  }
  fclose(f);
+done:
+ import_marks_file_done = 1;
 }
 
 
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 25bb60b..4bca35c 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2650,6 +2650,21 @@ test_expect_success 'R: ignore non-git options' '
  git fast-import <input
 '
 
+test_expect_success 'R: corrupt lines do not mess marks file' '
+ rm -f io.marks &&
+ blob=$(echo hi | git hash-object --stdin) &&
+ cat >expect <<-EOF &&
+ :3 0000000000000000000000000000000000000000
+ :1 $blob
+ :2 $blob
+ EOF
+ cp expect io.marks &&
+ test_must_fail git fast-import --import-marks=io.marks --export-marks=io.marks <<-\EOF &&
+
+ EOF
+ test_cmp expect io.marks
+'
+
 ##
 ## R: very large blobs
 ##
--
2.8.2.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fast-import: do not truncate exported marks file

Junio C Hamano
Felipe Contreras <[hidden email]> writes:

> Certain lines of the marks file might be corrupted (or the objects
> missing due to a garbage collection), but that's no reason to truncate
> the file and essentially destroy the rest of it.

Hmm, so the issue is:

 - we use die_nicely() that calls dump_marks() after writing a crash
   report as our die_routine().

 - when dump_marks() is called, and export_marks_file names an
   existing file, it tries to write marks in it.  If we let it go
   through, we would end up writing a new marks file based on an
   incomplete set of marks we have only half-read in the earlier
   step, which is bad, as the resulting one is incomplete, and the
   original one that this replaced may have been a good one.

Is that what this change addresses?

I am just wondering if a solution to preserve both files is more
desirable.

This change looks a bit over-eager to discard the dump die_nicely()
is trying to create in one scenario, and a bit less careful at the
same time in another scenario.

 - Even if we are reading from somewhere, export_marks_file can
   point at a completely new file that is different from
   import_marks file, in which case, we are not really losing any
   information by freshly creating a new marks file, no?

 - Even if we did not read from any existing marks file, if we are
   given export_marks_file that names an existing file, wouldn't we
   want to avoid corrupting it with a dump from this aborted run?

In other words, I understand the intent of "import-marks-file-done",
but I am not sure if that is so special a case.

If the patch were to tell dump_marks() if the caller is die_nicely()
or not, and stop it from overwriting an existing file, whether we
read from any import-marks file, I would agree with the change a lot
more strongly.  An obvious extension of that line of thought would
be to export to an alternative marks file, perhaps to
strbuf_addf("%s.crash", exports_marks_file), if exports_marks_file
already exists if we are called while dying.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fast-import: do not truncate exported marks file

Felipe Contreras
On Tue, May 17, 2016 at 5:22 PM, Junio C Hamano <[hidden email]> wrote:

> Felipe Contreras <[hidden email]> writes:
>
>> Certain lines of the marks file might be corrupted (or the objects
>> missing due to a garbage collection), but that's no reason to truncate
>> the file and essentially destroy the rest of it.
>
> Hmm, so the issue is:
>
>  - we use die_nicely() that calls dump_marks() after writing a crash
>    report as our die_routine().
>
>  - when dump_marks() is called, and export_marks_file names an
>    existing file, it tries to write marks in it.  If we let it go
>    through, we would end up writing a new marks file based on an
>    incomplete set of marks we have only half-read in the earlier
>    step, which is bad, as the resulting one is incomplete, and the
>    original one that this replaced may have been a good one.
>
> Is that what this change addresses?

Yes. As I said; the marks file gets truncated.

> I am just wondering if a solution to preserve both files is more
> desirable.
>
> This change looks a bit over-eager to discard the dump die_nicely()
> is trying to create in one scenario, and a bit less careful at the
> same time in another scenario.
>
>  - Even if we are reading from somewhere, export_marks_file can
>    point at a completely new file that is different from
>    import_marks file, in which case, we are not really losing any
>    information by freshly creating a new marks file, no?

Right, we are not losing any information, but we are not gaining much
either: it's a truncated version of the import marks.

>  - Even if we did not read from any existing marks file, if we are
>    given export_marks_file that names an existing file, wouldn't we
>    want to avoid corrupting it with a dump from this aborted run?

If we don't run from an existing marks file, this patch has no effect.

--
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fast-import: do not truncate exported marks file

Junio C Hamano
On Tue, May 17, 2016 at 8:31 PM, Felipe Contreras
<[hidden email]> wrote:

> On Tue, May 17, 2016 at 5:22 PM, Junio C Hamano <[hidden email]> wrote:
>> Felipe Contreras <[hidden email]> writes:
>>
>>  - Even if we are reading from somewhere, export_marks_file can
>>    point at a completely new file that is different from
>>    import_marks file, in which case, we are not really losing any
>>    information by freshly creating a new marks file, no?
>
> Right, we are not losing any information, but we are not gaining much
> either: it's a truncated version of the import marks.

Ah, if we finished reading from the marks file and then die elsewhere,
the new conditional that says "don't clobber if we haven't finished reading"
would not have any effect, right. We'll see the imported ones plus some
of the new ones.

And if we didn't read anything but have export defined, then we can overwrite
the existing marks file, but that is nothing new.

>>  - Even if we did not read from any existing marks file, if we are
>>    given export_marks_file that names an existing file, wouldn't we
>>    want to avoid corrupting it with a dump from this aborted run?
>
> If we don't run from an existing marks file, this patch has no effect.

Yes, that is exactly what I was wondering if we may want to improve
while at it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fast-import: do not truncate exported marks file

Felipe Contreras
On Tue, May 17, 2016 at 10:59 PM, Junio C Hamano <[hidden email]> wrote:
> On Tue, May 17, 2016 at 8:31 PM, Felipe Contreras
> <[hidden email]> wrote:
>> On Tue, May 17, 2016 at 5:22 PM, Junio C Hamano <[hidden email]> wrote:

>>>  - Even if we did not read from any existing marks file, if we are
>>>    given export_marks_file that names an existing file, wouldn't we
>>>    want to avoid corrupting it with a dump from this aborted run?
>>
>> If we don't run from an existing marks file, this patch has no effect.
>
> Yes, that is exactly what I was wondering if we may want to improve
> while at it.

This doesn't make much sense. Corrupted from where? This patch is
tackling the issue where the imported marks file is "corrupted".

--
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] fast-import: do not truncate exported marks file

Junio C Hamano
Felipe Contreras <[hidden email]> writes:

> On Tue, May 17, 2016 at 10:59 PM, Junio C Hamano <[hidden email]> wrote:
>> On Tue, May 17, 2016 at 8:31 PM, Felipe Contreras
>> <[hidden email]> wrote:
>>> On Tue, May 17, 2016 at 5:22 PM, Junio C Hamano <[hidden email]> wrote:
>
>>>>  - Even if we did not read from any existing marks file, if we are
>>>>    given export_marks_file that names an existing file, wouldn't we
>>>>    want to avoid corrupting it with a dump from this aborted run?
>>>
>>> If we don't run from an existing marks file, this patch has no effect.
>>
>> Yes, that is exactly what I was wondering if we may want to improve
>> while at it.
>
> This doesn't make much sense. Corrupted from where? This patch is
> tackling the issue where the imported marks file is "corrupted".

s/corrupted/overwritten by garbage/ is what I really meant, but in
any case, I do not think there is any valid use case that relied on
the current behaviour that will be broken by this update.  I do not
think a script that used separate import and export marks files (and
used mv to rename the exported one to the import used by the next
round, only after seeing fast-import successfully exits) would be
harmed, as a non-zero status is still a non-exit status and it
wouldn't have moved a corrupt export file to the next import, for
example.  Other people and real users of fast-import might find
cases I couldn't think of that this change negatively affects, but
that's expected for any change and that's why we cook any changes in
'pu' and then 'next'.

So let's queue this as a strict improvement.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html