Bug report: Duplicate CRLF rewrite warnings on commit

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Bug report: Duplicate CRLF rewrite warnings on commit

Adam Dinwoodie
If you use .gitattributes to enable CRLF->LF rewriting, then commit a
file that would have its line endings rewritten, the "CRLF will be
replaced by LF" warning is printed several times over; I'd expect it to
be printed only once.

There's a test case in t0020 -- "safecrlf: print warning only once" --
that checks the warning is only printed once when using `git add`, but
nothing that seems to check the same thing on `git commit`.  The
unnecessary multiple warnings seem to have existed since at least v1.6.0
(the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
couldn't get that to build on my current box), and I'm seeing them on my
Cygwin box's build of the current next branch (d10caa2) and on my CentOS
box's v2.8.1 release.

Example:

    $ git init
    Initialized empty Git repository in /home/Adam/test/.git/

    $ echo '* text' >.gitattributes

    $ echo 'lf-terminated line' >text

    $ git add .gitattributes text && git commit -m 'Initial commit'
    [master (root-commit) 9a18d39] Initial commit
     2 files changed, 2 insertions(+)
     create mode 100644 .gitattributes
     create mode 100644 text

    $ echo 'crlf-terminated line' | unix2dos >text

    $ git add text  # Single CRLF warning as expected
    warning: CRLF will be replaced by LF in text.
    The file will have its original line endings in your working directory.

    $ git commit -m 'CRLF'  # Should have one CRLF warning, actually get two
    warning: CRLF will be replaced by LF in text.
    The file will have its original line endings in your working directory.
    [master 4a8b1cb] CRLF
    warning: CRLF will be replaced by LF in text.
    The file will have its original line endings in your working directory.
     1 file changed, 1 insertion(+), 1 deletion(-)

(Tangentially: what's the accepted practice for submitting failing test
scripts?  I've written a short test case to add to t0020 that shows this
bugged behaviour, but I've got the vague impression from past emails
that leading with the patch email adding the failing test case is not
the expected way to do things on this list...)

Cheers,

Adam
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Bug report: Duplicate CRLF rewrite warnings on commit

Junio C Hamano
Adam Dinwoodie <[hidden email]> writes:

> If you use .gitattributes to enable CRLF->LF rewriting, then commit a
> file that would have its line endings rewritten, the "CRLF will be
> replaced by LF" warning is printed several times over; I'd expect it to
> be printed only once.
>
> There's a test case in t0020 -- "safecrlf: print warning only once" --
> that checks the warning is only printed once when using `git add`, but
> nothing that seems to check the same thing on `git commit`.  The
> unnecessary multiple warnings seem to have existed since at least v1.6.0
> (the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
> couldn't get that to build on my current box), and I'm seeing them on my
> Cygwin box's build of the current next branch (d10caa2) and on my CentOS
> box's v2.8.1 release.

Torsten, I know you are not heavily invested in "commit" codepath,
but since you've been actively touching the CRLF area, I wonder
perhaps you might be interested in taking a look?

>
> Example:
>
>     $ git init
>     Initialized empty Git repository in /home/Adam/test/.git/
>
>     $ echo '* text' >.gitattributes
>
>     $ echo 'lf-terminated line' >text
>
>     $ git add .gitattributes text && git commit -m 'Initial commit'
>     [master (root-commit) 9a18d39] Initial commit
>      2 files changed, 2 insertions(+)
>      create mode 100644 .gitattributes
>      create mode 100644 text
>
>     $ echo 'crlf-terminated line' | unix2dos >text
>
>     $ git add text  # Single CRLF warning as expected
>     warning: CRLF will be replaced by LF in text.
>     The file will have its original line endings in your working directory.
>
>     $ git commit -m 'CRLF'  # Should have one CRLF warning, actually get two
>     warning: CRLF will be replaced by LF in text.
>     The file will have its original line endings in your working directory.
>     [master 4a8b1cb] CRLF
>     warning: CRLF will be replaced by LF in text.
>     The file will have its original line endings in your working directory.
>      1 file changed, 1 insertion(+), 1 deletion(-)
>
> (Tangentially: what's the accepted practice for submitting failing test
> scripts?  I've written a short test case to add to t0020 that shows this
> bugged behaviour, but I've got the vague impression from past emails
> that leading with the patch email adding the failing test case is not
> the expected way to do things on this list...)
>
> Cheers,
>
> Adam
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Bug report: Duplicate CRLF rewrite warnings on commit

Jeff King
In reply to this post by Adam Dinwoodie
On Fri, May 13, 2016 at 02:49:53PM +0100, Adam Dinwoodie wrote:

> (Tangentially: what's the accepted practice for submitting failing test
> scripts?  I've written a short test case to add to t0020 that shows this
> bugged behaviour, but I've got the vague impression from past emails
> that leading with the patch email adding the failing test case is not
> the expected way to do things on this list...)

We don't want commits that fail the test suite, since it makes bisection
more difficult. But you can mark known bugs like:

   test_expect_failure 'git-foo should output bar' '
        ...
   '

I think it's OK to submit a patch like that. Hopefully somebody picks
that up and combines it with their fix patch, but if not, then it at
least documents the failure for later generations.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Bug report: Duplicate CRLF rewrite warnings on commit

Junio C Hamano
Jeff King <[hidden email]> writes:

> On Fri, May 13, 2016 at 02:49:53PM +0100, Adam Dinwoodie wrote:
>
>> (Tangentially: what's the accepted practice for submitting failing test
>> scripts?  I've written a short test case to add to t0020 that shows this
>> bugged behaviour, but I've got the vague impression from past emails
>> that leading with the patch email adding the failing test case is not
>> the expected way to do things on this list...)
>
> We don't want commits that fail the test suite, since it makes bisection
> more difficult. But you can mark known bugs like:
>
>    test_expect_failure 'git-foo should output bar' '
> ...
>    '
>
> I think it's OK to submit a patch like that. Hopefully somebody picks
> that up and combines it with their fix patch, but if not, then it at
> least documents the failure for later generations.

... but we do not want to overdo this.

An expect_failure helps when somebody is actually starting to work
on a breakage by setting a goal, but otherwise nobody would look at
them.  I do not think we know if 120+ instances of expect_failure in
t/ we already have are still all valid; some of them even might be
requesting a behaviour that we decided is a wrong expectation later.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Bug report: Duplicate CRLF rewrite warnings on commit

Jeff King
On Fri, May 13, 2016 at 12:46:57PM -0700, Junio C Hamano wrote:

> > We don't want commits that fail the test suite, since it makes bisection
> > more difficult. But you can mark known bugs like:
> >
> >    test_expect_failure 'git-foo should output bar' '
> > ...
> >    '
> >
> > I think it's OK to submit a patch like that. Hopefully somebody picks
> > that up and combines it with their fix patch, but if not, then it at
> > least documents the failure for later generations.
>
> ... but we do not want to overdo this.
>
> An expect_failure helps when somebody is actually starting to work
> on a breakage by setting a goal, but otherwise nobody would look at
> them.  I do not think we know if 120+ instances of expect_failure in
> t/ we already have are still all valid; some of them even might be
> requesting a behaviour that we decided is a wrong expectation later.

True. I almost wrote more on this, but held back. But since you seem
interested... :)

I think that it's a really nice thing to write up your test as
test_expect_failure and post it to the list as a patch that is ready to
be applied, _even if we don't apply it_. Because that format makes it
super-easy for somebody who does want to work on it to just "git am"
your patch and focus on fixing it.

And your patch can either go at the front of their series, or they
can squash it in and acknowledge you in the commit message. But either
way it reduces the amount of work they have to do (whether they work on
the bug soon, or much later).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Bug report: Duplicate CRLF rewrite warnings on commit

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
On 13.05.16 18:43, Junio C Hamano wrote:

> Adam Dinwoodie <[hidden email]> writes:
>
>> If you use .gitattributes to enable CRLF->LF rewriting, then commit a
>> file that would have its line endings rewritten, the "CRLF will be
>> replaced by LF" warning is printed several times over; I'd expect it to
>> be printed only once.
>>
>> There's a test case in t0020 -- "safecrlf: print warning only once" --
>> that checks the warning is only printed once when using `git add`, but
>> nothing that seems to check the same thing on `git commit`.  The
>> unnecessary multiple warnings seem to have existed since at least v1.6.0
>> (the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
>> couldn't get that to build on my current box), and I'm seeing them on my
>> Cygwin box's build of the current next branch (d10caa2) and on my CentOS
>> box's v2.8.1 release.
>
> Torsten, I know you are not heavily invested in "commit" codepath,
> but since you've been actively touching the CRLF area, I wonder
> perhaps you might be interested in taking a look?
>
I have seen this double warning as well, but never digged further,
as it is most probably in the code outside convert.c

I can put it on the TODO list, and I think it would make sense to add
a TC in t0020, which is marked as test_expect_failure.
 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Bug report: Duplicate CRLF rewrite warnings on commit

Adam Dinwoodie
On Sat, May 14, 2016 at 07:40:11AM +0200, Torsten Bögershausen wrote:

> On 13.05.16 18:43, Junio C Hamano wrote:
> > Adam Dinwoodie <[hidden email]> writes:
> >
> >> If you use .gitattributes to enable CRLF->LF rewriting, then commit a
> >> file that would have its line endings rewritten, the "CRLF will be
> >> replaced by LF" warning is printed several times over; I'd expect it to
> >> be printed only once.
> >>
> >> There's a test case in t0020 -- "safecrlf: print warning only once" --
> >> that checks the warning is only printed once when using `git add`, but
> >> nothing that seems to check the same thing on `git commit`.  The
> >> unnecessary multiple warnings seem to have existed since at least v1.6.0
> >> (the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
> >> couldn't get that to build on my current box), and I'm seeing them on my
> >> Cygwin box's build of the current next branch (d10caa2) and on my CentOS
> >> box's v2.8.1 release.
> >
> > Torsten, I know you are not heavily invested in "commit" codepath,
> > but since you've been actively touching the CRLF area, I wonder
> > perhaps you might be interested in taking a look?
> >
> I have seen this double warning as well, but never digged further,
> as it is most probably in the code outside convert.c
>
> I can put it on the TODO list, and I think it would make sense to add
> a TC in t0020, which is marked as test_expect_failure.

Grand.  I'll submit the test patch now.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH/RFC v1 1/1] No duplicate CRLF rewrite warnings on commit

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

If .gitattributes are used to enable CRLF->LF rewriting,
then commiting a file that would have its line endings rewritten,
the "CRLF will be replaced by LF" warning is printed 2 times.
A user expects it to be printed only once.
The automatic rename detection by Git runs the conversion twice,
suppress the warning in the second run.

Reported-By: Adam Dinwoodie <[hidden email]>
Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 diff.c           |  2 ++
 diffcore-break.c |  6 ++++--
 diffcore.h       |  1 +
 t/t0020-crlf.sh  | 14 ++++++++++++++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index d3734d3..c965afc 100644
--- a/diff.c
+++ b/diff.c
@@ -2749,6 +2749,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
  enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL
     ? SAFE_CRLF_WARN
     : safe_crlf);
+ if (flags & CHECK_IGNORE_CRLF_WARNING)
+ crlf_warn = SAFE_CRLF_FALSE;
 
  if (!DIFF_FILE_VALID(s))
  die("internal error: asking to populate invalid file.");
diff --git a/diffcore-break.c b/diffcore-break.c
index 5473493..4a75681 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -47,6 +47,7 @@ static int should_break(struct diff_filespec *src,
  */
  unsigned long delta_size, max_size;
  unsigned long src_copied, literal_added, src_removed;
+ unsigned flags2 = 0;
 
  *merge_score_p = 0; /* assume no deletion --- "do not break"
      * is the default.
@@ -61,9 +62,10 @@ static int should_break(struct diff_filespec *src,
     !hashcmp(src->sha1, dst->sha1))
  return 0; /* they are the same */
 
- if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
+ if (!strcmp(src->path, dst->path))
+ flags2 = CHECK_IGNORE_CRLF_WARNING;
+ if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, flags2))
  return 0; /* error but caught downstream */
-
  max_size = ((src->size > dst->size) ? src->size : dst->size);
  if (max_size < MINIMUM_BREAK_SIZE)
  return 0; /* we do not break too small filepair */
diff --git a/diffcore.h b/diffcore.h
index 33ea2de..91464aa 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -57,6 +57,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 
 #define CHECK_SIZE_ONLY 1
 #define CHECK_BINARY    2
+#define CHECK_IGNORE_CRLF_WARNING    4
 extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index f94120a..f7d2f74 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -86,6 +86,20 @@ test_expect_success 'safecrlf: print warning only once' '
  test $(git add doublewarn 2>&1 | grep "CRLF will be replaced by LF" | wc -l) = 1
 '
 
+test_expect_success 'safecrlf: print warning only once on commit' '
+
+ git config core.autocrlf input &&
+ git config core.safecrlf warn &&
+
+ for w in I am all LF; do echo $w; done >doublewarn2 &&
+ git add doublewarn2 &&
+ git commit -m "nowarn" &&
+ for w in Oh here is CRLFQ in text; do echo $w; done | q_to_cr >doublewarn2 &&
+ git add doublewarn2 2>&1 &&
+ git commit -m Message 2>&1 | grep "CRLF will be replaced by LF" >actual &&
+ echo "warning: CRLF will be replaced by LF in doublewarn2." >expected &&
+ test_cmp expected actual
+'
 
 test_expect_success 'safecrlf: git diff demotes safecrlf=true to warn' '
  git config core.autocrlf input &&
--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH/RFC v1 0/1] Quickfix ?No duplicate CRLF rewrite warnings on commit

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

It may be that this patch only covers over a sympton, rather
than fixing the root cause.

Torsten Bögershausen (1):
  No duplicate CRLF rewrite warnings on commit

 diff.c           |  2 ++
 diffcore-break.c |  6 ++++--
 diffcore.h       |  1 +
 t/t0020-crlf.sh  | 14 ++++++++++++++
 4 files changed, 21 insertions(+), 2 deletions(-)

--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH/RFC v1 1/1] No duplicate CRLF rewrite warnings on commit

Eric Sunshine
In reply to this post by Torsten Bögershausen-2
On Sun, May 15, 2016 at 2:08 AM,  <[hidden email]> wrote:

> If .gitattributes are used to enable CRLF->LF rewriting,
> then commiting a file that would have its line endings rewritten,
> the "CRLF will be replaced by LF" warning is printed 2 times.
> A user expects it to be printed only once.
> The automatic rename detection by Git runs the conversion twice,
> suppress the warning in the second run.
>
> Reported-By: Adam Dinwoodie <[hidden email]>
> Signed-off-by: Torsten Bögershausen <[hidden email]>
> ---
> diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
> @@ -86,6 +86,20 @@ test_expect_success 'safecrlf: print warning only once' '
> +test_expect_success 'safecrlf: print warning only once on commit' '
> +
> +       git config core.autocrlf input &&
> +       git config core.safecrlf warn &&
> +
> +       for w in I am all LF; do echo $w; done >doublewarn2 &&

I would typically say something about how you could instead use:

    test_write_lines I am all LF >doublewarn2 &&

but since you're just mimicking existing style in this script, I won't
mention it.

> +       git add doublewarn2 &&
> +       git commit -m "nowarn" &&
> +       for w in Oh here is CRLFQ in text; do echo $w; done | q_to_cr >doublewarn2 &&

Likewise; note my silence.

> +       git add doublewarn2 2>&1 &&
> +       git commit -m Message 2>&1 | grep "CRLF will be replaced by LF" >actual &&
> +       echo "warning: CRLF will be replaced by LF in doublewarn2." >expected &&
> +       test_cmp expected actual
> +'
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 0/3] CRLF-Handling: bug fix around ce_compare_data()

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

Break up the old 10/10 series about CLRF handling into smaller
series.

1/10..4/10 are now found in tb/core-eol-fix

This series includes 3 from the 10/10 series:
09/10 t6038; use crlf on all platforms                    #now 1/3
05/10 read-cache: factor out get_sha1_from_index() helper #now 2/3
10/10 Fix for ce_compare_data(), done right.              #now 3/3

The most important patch is 3/3

The reminding 3 patches,
stream-and-early-out,
convert-unify-the-auto-handling-of-CRLF
more-safer-crlf-handling-with-text-attr
Need to be re-done, re-sorted and re-written, thanks for all the comments.

Torsten Bögershausen (3):
  t6038; use crlf on all platforms
  read-cache: factor out get_sha1_from_index() helper
  convert: ce_compare_data() checks for a sha1 of a path

 cache.h                    |  4 ++++
 convert.c                  | 33 ++++++++++++++++++++++-----------
 convert.h                  | 23 +++++++++++++++++++----
 read-cache.c               | 33 +++++++++++++++++++++------------
 sha1_file.c                | 17 +++++++++++++----
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 6 files changed, 90 insertions(+), 57 deletions(-)

--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 1/3] t6038; use crlf on all platforms

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

t6038 uses different code, dependig if NATIVE_CRLF is set ot not.
When the native line endings are LF, merge.renormalize is not tested very well.
Change the test to always use CRLF by setting core.eol=crlf.
After doing so, the test fails:

rm '.gitattributes'
rm 'control_file'
rm 'file'
rm 'inert_file'
HEAD is now at 0d9ffb6 add line from b
error: addinfo_cache failed for path 'file'
file: unmerged (cbd69ec7cd12dd0989e853923867d94c8519aa52)
file: unmerged (ad55e240aeb42e0d9a0e18d6d8b02dd82ee3e527)
file: unmerged (99b633103c15c20cebebf821133ab526b0ff90b2)
fatal: git write-tree failed to write a tree
Merging:
0d9ffb6 add line from b
virtual a
found 1 common ancestor:
1c56df1 Initial
Auto-merging file
not ok 4 - Merge addition of text=auto

This will be addressed in the next commit.
---
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 85c10b0..4dc8c1a 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
 
 test_expect_success setup '
  git config core.autocrlf false &&
+ git config core.eol crlf &&
 
  echo first line | append_cr >file &&
  echo first line >control_file &&
@@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
  same line
  EOF
 
- if test_have_prereq NATIVE_CRLF; then
- append_cr <expected >expected.temp &&
- mv expected.temp expected
- fi &&
+ append_cr <expected >expected.temp &&
+ mv expected.temp expected &&
  git config merge.renormalize true &&
  git rm -fr . &&
  rm -f .gitattributes &&
@@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
  same line
  EOF
 
- if test_have_prereq NATIVE_CRLF; then
- append_cr <expected >expected.temp &&
- mv expected.temp expected
- fi &&
+ append_cr <expected >expected.temp &&
+ mv expected.temp expected &&
  git config merge.renormalize true &&
  git rm -fr . &&
  rm -f .gitattributes &&
@@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
 
 test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
  echo "<<<<<<<" >expected &&
- if test_have_prereq NATIVE_CRLF; then
- echo first line | append_cr >>expected &&
- echo same line | append_cr >>expected &&
- echo ======= | append_cr >>expected
- else
- echo first line >>expected &&
- echo same line >>expected &&
- echo ======= >>expected
- fi &&
+ echo first line | append_cr >>expected &&
+ echo same line | append_cr >>expected &&
+ echo ======= | append_cr >>expected &&
  echo first line | append_cr >>expected &&
  echo same line | append_cr >>expected &&
  echo ">>>>>>>" >>expected &&
@@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
  echo "<<<<<<<" >expected &&
  echo first line | append_cr >>expected &&
  echo same line | append_cr >>expected &&
- if test_have_prereq NATIVE_CRLF; then
- echo ======= | append_cr >>expected &&
- echo first line | append_cr >>expected &&
- echo same line | append_cr >>expected
- else
- echo ======= >>expected &&
- echo first line >>expected &&
- echo same line >>expected
- fi &&
+ echo ======= | append_cr >>expected &&
+ echo first line | append_cr >>expected &&
+ echo same line | append_cr >>expected &&
  echo ">>>>>>>" >>expected &&
  git config merge.renormalize false &&
  rm -f .gitattributes &&
--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
  return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
- int pos, len;
+ const unsigned char *sha1;
  unsigned long sz;
  enum object_type type;
  void *data;
 
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
  if (pos < 0) {
  /*
  * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
  }
  if (pos < 0)
  return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 cache.h      |  1 +
 convert.c    | 33 ++++++++++++++++++++++-----------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c |  4 +++-
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/cache.h b/cache.h
index 15a2a10..a5136c0 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1  4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/convert.c b/convert.c
index f524b8d..5a0cd03 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,29 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
  }
 }
 
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
 {
  unsigned long sz;
  void *data;
  int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
- if (!data)
+ enum object_type type;
+ if (!sha1)
+ sha1 = get_sha1_from_cache(path);
+ if (!sha1)
+ return 0;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
  return 0;
+ }
+
  has_cr = memchr(data, '\r', sz) != NULL;
  free(data);
  return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+       const char *src, size_t len,
        struct strbuf *buf,
        enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +268,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
  * If the file in the index has any CR in it, do not convert.
  * This is the new safer autocrlf handling.
  */
- if (has_cr_in_index(path))
+ if (has_cr_in_index(path, sha1))
  return 0;
  }
  }
@@ -852,8 +860,9 @@ const char *get_convert_attr_ascii(const char *path)
  return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+   const char *src, size_t len,
+   struct strbuf *dst, enum safe_crlf checksafe)
 {
  int ret = 0;
  const char *filter = NULL;
@@ -874,7 +883,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
  src = dst->buf;
  len = dst->len;
  }
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
  if (ret && dst) {
  src = dst->buf;
  len = dst->len;
@@ -882,7 +891,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
  return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+      const unsigned char *sha1,
+      int fd, struct strbuf *dst,
       enum safe_crlf checksafe)
 {
  struct conv_attrs ca;
@@ -894,7 +905,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
  if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
  die("%s: clean filter '%s' failed", path, ca.drv->name);
 
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
  ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+  const char *src, size_t len,
+  struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
+{
+ return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
 extern int convert_to_working_tree(const char *path, const char *src,
    size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
 {
  return convert_to_git(path, NULL, 0, NULL, 0);
 }
+static inline int would_convert_to_git_ce_sha1(const char *path,
+       const unsigned char *sha1)
+{
+ return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+     const unsigned char *sha1,
+     int fd, struct strbuf *dst,
      enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/read-cache.c b/read-cache.c
index a3ef967..0ebc237 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
  if (fd >= 0) {
  unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_CE_HAS_SHA1;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
  match = hashcmp(sha1, ce->sha1);
  /* index_fd() closed the file descriptor already */
  }
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
  int ret, re_allocated = 0;
  int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 
  if (!type)
  type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
  */
  if ((type == OBJ_BLOB) && path) {
  struct strbuf nbuf = STRBUF_INIT;
- if (convert_to_git(path, buf, size, &nbuf,
-   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ if (convert_to_git_ce_sha1(path,
+   valid_sha1 ? sha1 : NULL,
+   buf, size, &nbuf,
+   write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
  buf = strbuf_detach(&nbuf, &size);
  re_allocated = 1;
  }
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
  int ret;
  const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
  struct strbuf sbuf = STRBUF_INIT;
 
  assert(path);
  assert(would_convert_to_git_filter_fd(path));
 
- convert_to_git_filter_fd(path, fd, &sbuf,
+ convert_to_git_filter_fd(path,
+ valid_sha1 ? sha1 : NULL,
+ fd, &sbuf,
  write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
  if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
      enum object_type type, const char *path, unsigned flags)
 {
  int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
 
  /*
  * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
  else if (!S_ISREG(st->st_mode))
  ret = index_pipe(sha1, fd, type, path, flags);
  else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
  ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
  flags);
  else
--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v1 1/3] t6038; use crlf on all platforms

Eric Sunshine
In reply to this post by Torsten Bögershausen-2
On Sun, May 15, 2016 at 2:38 AM,  <[hidden email]> wrote:
> t6038 uses different code, dependig if NATIVE_CRLF is set ot not.

s/dependig/depending/
s/ot/or/

> When the native line endings are LF, merge.renormalize is not tested very well.
> Change the test to always use CRLF by setting core.eol=crlf.
> After doing so, the test fails:
> [...snip...]
> This will be addressed in the next commit.

Does this mean that the below tests now fail? If so, they should be
switched to use test_expect_failure here, and then swapped back to
test_expect_success in the patch which fixes the problem.

> ---
> diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
> index 85c10b0..4dc8c1a 100755
> --- a/t/t6038-merge-text-auto.sh
> +++ b/t/t6038-merge-text-auto.sh
> @@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
>
>  test_expect_success setup '
>         git config core.autocrlf false &&
> +       git config core.eol crlf &&
>
>         echo first line | append_cr >file &&
>         echo first line >control_file &&
> @@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
>         same line
>         EOF
>
> -       if test_have_prereq NATIVE_CRLF; then
> -               append_cr <expected >expected.temp &&
> -               mv expected.temp expected
> -       fi &&
> +       append_cr <expected >expected.temp &&
> +       mv expected.temp expected &&
>         git config merge.renormalize true &&
>         git rm -fr . &&
>         rm -f .gitattributes &&
> @@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
>         same line
>         EOF
>
> -       if test_have_prereq NATIVE_CRLF; then
> -               append_cr <expected >expected.temp &&
> -               mv expected.temp expected
> -       fi &&
> +       append_cr <expected >expected.temp &&
> +       mv expected.temp expected &&
>         git config merge.renormalize true &&
>         git rm -fr . &&
>         rm -f .gitattributes &&
> @@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
>
>  test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
>         echo "<<<<<<<" >expected &&
> -       if test_have_prereq NATIVE_CRLF; then
> -               echo first line | append_cr >>expected &&
> -               echo same line | append_cr >>expected &&
> -               echo ======= | append_cr >>expected
> -       else
> -               echo first line >>expected &&
> -               echo same line >>expected &&
> -               echo ======= >>expected
> -       fi &&
> +       echo first line | append_cr >>expected &&
> +       echo same line | append_cr >>expected &&
> +       echo ======= | append_cr >>expected &&
>         echo first line | append_cr >>expected &&
>         echo same line | append_cr >>expected &&
>         echo ">>>>>>>" >>expected &&
> @@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
>         echo "<<<<<<<" >expected &&
>         echo first line | append_cr >>expected &&
>         echo same line | append_cr >>expected &&
> -       if test_have_prereq NATIVE_CRLF; then
> -               echo ======= | append_cr >>expected &&
> -               echo first line | append_cr >>expected &&
> -               echo same line | append_cr >>expected
> -       else
> -               echo ======= >>expected &&
> -               echo first line >>expected &&
> -               echo same line >>expected
> -       fi &&
> +       echo ======= | append_cr >>expected &&
> +       echo first line | append_cr >>expected &&
> +       echo same line | append_cr >>expected &&
>         echo ">>>>>>>" >>expected &&
>         git config merge.renormalize false &&
>         rm -f .gitattributes &&
> --
> 2.0.0.rc1.6318.g0c2c796
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper

Eric Sunshine
In reply to this post by Torsten Bögershausen-2
On Sun, May 15, 2016 at 2:38 AM,  <[hidden email]> wrote:

> Factor out the retrieval of the sha1 for a given path in
> read_blob_data_from_index() into the function get_sha1_from_index().
>
> This will be used in the next commit, when convert.c can do the
> analyze for "text=auto" without slurping the whole blob into memory
> at once.
>
> Add a wrapper definition get_sha1_from_cache().
>
> Signed-off-by: Torsten Bögershausen <[hidden email]>
> ---
> diff --git a/cache.h b/cache.h
> @@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
> +#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))

s/\s+/ /
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path

Eric Sunshine
In reply to this post by Torsten Bögershausen-2
On Sun, May 15, 2016 at 2:38 AM,  <[hidden email]> wrote:

> To compare a file in working tree with the index, convert_to_git() is used,
> the the result is hashed and the hash value compared with ce->sha1.
>
> Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
> are converted or not: When a CRLF had been in the index before, CRLF in
> the working tree are not converted.
>
> While in a merge, a file name in the working tree has different blobs
> in the index with different hash values.
> Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
> the would_convert_crlf_at_commit() looks at the appropriate blob.
>
> Signed-off-by: Torsten Bögershausen <[hidden email]>
> ---
> diff --git a/convert.c b/convert.c
> @@ -217,21 +217,29 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
> -static int has_cr_in_index(const char *path)
> +static int has_cr_in_index(const char *path, const unsigned char *sha1)
>  {
>         unsigned long sz;
>         void *data;
>         int has_cr;
> -
> -       data = read_blob_data_from_cache(path, &sz);
> -       if (!data)
> +       enum object_type type;
> +       if (!sha1)
> +               sha1 = get_sha1_from_cache(path);
> +       if (!sha1)
> +               return 0;
> +       data = read_sha1_file(sha1, &type, &sz);
> +       if (!data || type != OBJ_BLOB) {
> +               free(data);
>                 return 0;
> +       }
> +
>         has_cr = memchr(data, '\r', sz) != NULL;
>         free(data);
>         return has_cr;
>  }

Possible rewrite which would make it harder to forget to free 'data':

    int has_cr = 0;
    ...
    data = read_sha1_file(...);
    if (data && type == OBJ_BLOB)
        has_cr = memchr(...);
    free(data);
    return has_cr;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v2 0/3] CRLF-Handling: bug fix around ce_compare_data()

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

Changes since v1:
  - Re-order the patches
  - t6038; use crlf on all platforms
    This did actually not break anything (without old 7/10)
    Adapt the commit message
  - ce_compare_data():
    Simplify the logic around free() in has_cr_in_index(),
    Thanks Eric Sunshine

Break up the old 10/10 series about CLRF handling into smaller
series.

1/10..4/10 are now found in tb/core-eol-fix

This series includes 3 from the 10/10 series:
05/10 read-cache: factor out get_sha1_from_index() helper #now 1/3
10/10 Fix for ce_compare_data(), done right.              #now 2/3
09/10 t6038; use crlf on all platforms                    #now 3/3

The most important patch is 2/3

The reminding 3 patches,
stream-and-early-out,
convert-unify-the-auto-handling-of-CRLF
more-safer-crlf-handling-with-text-attr
Need to be re-done, re-sorted and re-written, thanks for all the comments.

Torsten Bögershausen (3):
  read-cache: factor out get_sha1_from_index() helper
  convert: ce_compare_data() checks for a sha1 of a path
  t6038; use crlf on all platforms

 cache.h                    |  4 ++++
 convert.c                  | 34 ++++++++++++++++++++++------------
 convert.h                  | 23 +++++++++++++++++++----
 read-cache.c               | 33 +++++++++++++++++++++------------
 sha1_file.c                | 17 +++++++++++++----
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 6 files changed, 90 insertions(+), 58 deletions(-)

--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v2 1/3] read-cache: factor out get_sha1_from_index() helper

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
  return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
- int pos, len;
+ const unsigned char *sha1;
  unsigned long sz;
  enum object_type type;
  void *data;
 
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
  if (pos < 0) {
  /*
  * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
  }
  if (pos < 0)
  return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v2 2/3] convert: ce_compare_data() checks for a sha1 of a path

Torsten Bögershausen-2
In reply to this post by Adam Dinwoodie
From: Torsten Bögershausen <[hidden email]>

To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 cache.h      |  1 +
 convert.c    | 34 ++++++++++++++++++++++------------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c |  4 +++-
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 15a2a10..a5136c0 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1  4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/convert.c b/convert.c
index f524b8d..c972d14 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
  }
 }
 
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
 {
  unsigned long sz;
  void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
+ int has_cr = 0;
+ enum object_type type;
+ if (!sha1)
+ sha1 = get_sha1_from_cache(path);
+ if (!sha1)
+ return 0;
+ data = read_sha1_file(sha1, &type, &sz);
  if (!data)
  return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
+ if (type == OBJ_BLOB)
+ has_cr = memchr(data, '\r', sz) != NULL;
+
  free(data);
  return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+       const char *src, size_t len,
        struct strbuf *buf,
        enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
  * If the file in the index has any CR in it, do not convert.
  * This is the new safer autocrlf handling.
  */
- if (has_cr_in_index(path))
+ if (has_cr_in_index(path, sha1))
  return 0;
  }
  }
@@ -852,8 +859,9 @@ const char *get_convert_attr_ascii(const char *path)
  return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+   const char *src, size_t len,
+   struct strbuf *dst, enum safe_crlf checksafe)
 {
  int ret = 0;
  const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
  src = dst->buf;
  len = dst->len;
  }
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
  if (ret && dst) {
  src = dst->buf;
  len = dst->len;
@@ -882,7 +890,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
  return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+      const unsigned char *sha1,
+      int fd, struct strbuf *dst,
       enum safe_crlf checksafe)
 {
  struct conv_attrs ca;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
  if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
  die("%s: clean filter '%s' failed", path, ca.drv->name);
 
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
  ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+  const char *src, size_t len,
+  struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
+{
+ return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
 extern int convert_to_working_tree(const char *path, const char *src,
    size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
 {
  return convert_to_git(path, NULL, 0, NULL, 0);
 }
+static inline int would_convert_to_git_ce_sha1(const char *path,
+       const unsigned char *sha1)
+{
+ return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+     const unsigned char *sha1,
+     int fd, struct strbuf *dst,
      enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/read-cache.c b/read-cache.c
index a3ef967..0ebc237 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
  if (fd >= 0) {
  unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_CE_HAS_SHA1;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
  match = hashcmp(sha1, ce->sha1);
  /* index_fd() closed the file descriptor already */
  }
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
  int ret, re_allocated = 0;
  int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 
  if (!type)
  type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
  */
  if ((type == OBJ_BLOB) && path) {
  struct strbuf nbuf = STRBUF_INIT;
- if (convert_to_git(path, buf, size, &nbuf,
-   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ if (convert_to_git_ce_sha1(path,
+   valid_sha1 ? sha1 : NULL,
+   buf, size, &nbuf,
+   write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
  buf = strbuf_detach(&nbuf, &size);
  re_allocated = 1;
  }
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
  int ret;
  const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
  struct strbuf sbuf = STRBUF_INIT;
 
  assert(path);
  assert(would_convert_to_git_filter_fd(path));
 
- convert_to_git_filter_fd(path, fd, &sbuf,
+ convert_to_git_filter_fd(path,
+ valid_sha1 ? sha1 : NULL,
+ fd, &sbuf,
  write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
  if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
      enum object_type type, const char *path, unsigned flags)
 {
  int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
 
  /*
  * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
  else if (!S_ISREG(st->st_mode))
  ret = index_pipe(sha1, fd, type, path, flags);
  else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
  ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
  flags);
  else
--
2.0.0.rc1.6318.g0c2c796

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
123