[PATCH 1/3] git reset --hard gives clean working tree

classic Classic list List threaded Threaded
120 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/3] git reset --hard gives clean working tree

Torsten Bögershausen-2
From: Torsten Bögershausen <[hidden email]>

We define the working tree file is clean if either:

  * the result of running convert_to_git() on the working tree
    contents matches what is in the index (because that would mean
    doing another "git add" on the path is a no-op); OR

  * the result of running convert_to_working_tree() on the content
    in the index matches what is in the working tree (because that
    would mean doing another "git checkout -f" on the path is a
    no-op).

Add an extra check in ce_compare_data() in read_cache.c, and adjust
the test cases in t0025:
When a file has CRLF in the index, and is checked out into the working tree,
but left unchabged, it is not normalized at the next commit.
Whenever the file is changed in the working tree, a line is added/deleted
or dos2unix is run, it may be normalized at the next commit,
depending on .gitattributes.

This patch is a result of a longer discussion on the mailing list,
how to fix the flaky t0025.

Suggested-by: Junio C Hamano <[hidden email]>
Signed-off-by: Torsten Bögershausen <[hidden email]>
---
This is my attempt to help to fix flaky t0025:
 - steal the code from Junio
 - Add test case, change the existing if needed.
 - Spice with some optimizations
The commit messages may be in the state "improvable".

 read-cache.c         | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/t0025-crlf-auto.sh | 57 +++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 394ce14..2948b8f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -156,17 +156,75 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
  ce_mark_uptodate(ce);
 }
 
+/*
+ * Compare the data in buf with the data in the file pointed by fd and
+ * return 0 if they are identical, and non-zero if they differ.
+ */
+static int compare_with_fd(const char *input, ssize_t len, int fd)
+{
+ for (;;) {
+ char buf[1024 * 16];
+ ssize_t chunk_len, read_len;
+
+ chunk_len = sizeof(buf) < len ? sizeof(buf) : len;
+ read_len = xread(fd, buf, chunk_len ? chunk_len : 1);
+
+ if (!read_len)
+ /* EOF on the working tree file */
+ return !len ? 0 : -1;
+
+ if (!len)
+ /* we expected there is nothing left */
+ return -1;
+
+ if (memcmp(buf, input, read_len))
+ return -1;
+ input += read_len;
+ len -= read_len;
+ }
+}
+
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
  int match = -1;
  int fd = open(ce->name, O_RDONLY);
 
+ /*
+ * Would another "git add" on the path change what is in the
+ * index for the path?
+ */
  if (fd >= 0) {
  unsigned char sha1[20];
  if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
  match = hashcmp(sha1, ce->sha1);
  /* index_fd() closed the file descriptor already */
  }
+ if (!match)
+ return match;
+
+ /*
+ * Would another "git checkout -f" out of the index change
+ * what is in the working tree file?
+ */
+ fd = open(ce->name, O_RDONLY);
+ if (fd >= 0) {
+ enum object_type type;
+ unsigned long size;
+ void *data = read_sha1_file(ce->sha1, &type, &size);
+
+ if (type == OBJ_BLOB) {
+ struct strbuf worktree = STRBUF_INIT;
+ if (convert_to_working_tree(ce->name, data, size,
+ &worktree)) {
+ free(data);
+ data = strbuf_detach(&worktree, &size);
+ }
+ if (!compare_with_fd(data, size, fd))
+ match = 0;
+ }
+ free(data);
+ close(fd);
+ }
  return match;
 }
 
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index c164b46..4a7e5c0 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -14,6 +14,8 @@ test_expect_success setup '
 
  for w in Hello world how are you; do echo $w; done >LFonly &&
  for w in I am very very fine thank you; do echo ${w}Q; done | q_to_cr >CRLFonly &&
+ cp CRLFonly CRLFonly1 &&
+ cp CRLFonly CRLFonly2 &&
  for w in Oh here is a QNUL byte how alarming; do echo ${w}; done | q_to_nul >LFwithNUL &&
  git add . &&
 
@@ -23,6 +25,28 @@ test_expect_success setup '
  CRLFonly=$(git rev-parse HEAD:CRLFonly) &&
  LFwithNUL=$(git rev-parse HEAD:LFwithNUL) &&
 
+ cp CRLFonly CRLFonly.orig &&
+ for w in I am very very fine thank YOU; do echo ${w}Q; done | q_to_cr >CRLFonly.changed &&
+ cat >expnorm  <<-EOF &&
+ MIQ
+ MamQ
+ MveryQ
+ MveryQ
+ MfineQ
+ MthankQ
+ MyouQ
+ PI
+ Pam
+ Pvery
+ Pvery
+ Pfine
+ Pthank
+ PYOU
+ EOF
+ cat >expunor  <<-EOF &&
+ MyouQ
+ PYOUQ
+ EOF
  echo happy.
 '
 
@@ -39,7 +63,7 @@ test_expect_success 'default settings cause no changes' '
  test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'crlf=true causes a CRLF file to be normalized' '
+test_expect_success 'crlf=true causes an unchanged CRLF file not to be normalized' '
 
  # Backwards compatibility check
  rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
@@ -49,10 +73,10 @@ test_expect_success 'crlf=true causes a CRLF file to be normalized' '
  # Note, "normalized" means that git will normalize it if added
  has_cr CRLFonly &&
  CRLFonlydiff=$(git diff CRLFonly) &&
- test -n "$CRLFonlydiff"
+ test -z "$CRLFonlydiff"
 '
 
-test_expect_success 'text=true causes a CRLF file to be normalized' '
+test_expect_success 'text=true causes an unchanged CRLF file not to be normalized' '
 
  rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
  echo "CRLFonly text" > .gitattributes &&
@@ -61,7 +85,7 @@ test_expect_success 'text=true causes a CRLF file to be normalized' '
  # Note, "normalized" means that git will normalize it if added
  has_cr CRLFonly &&
  CRLFonlydiff=$(git diff CRLFonly) &&
- test -n "$CRLFonlydiff"
+ test -z "$CRLFonlydiff"
 '
 
 test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=false' '
@@ -114,7 +138,7 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
  test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'text=auto, autocrlf=true  causes an unchanged CRLF file not to be normalized' '
 
  rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
  git config core.autocrlf true &&
@@ -126,7 +150,7 @@ test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
  LFonlydiff=$(git diff LFonly) &&
  CRLFonlydiff=$(git diff CRLFonly) &&
  LFwithNULdiff=$(git diff LFwithNUL) &&
- test -z "$LFonlydiff" -a -n "$CRLFonlydiff" -a -z "$LFwithNULdiff"
+ test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
 test_expect_success 'text=auto, autocrlf=true does not normalize binary files' '
@@ -152,4 +176,25 @@ test_expect_success 'eol=crlf _does_ normalize binary files' '
  test -z "$LFwithNULdiff"
 '
 
+test_expect_success 'crlf=true causes a changed CRLF file to be normalized' '
+ git config core.autocrlf false &&
+ echo "CRLFonly1 crlf" > .gitattributes &&
+ # Note, "normalized" means that git will normalize it if added
+ cp CRLFonly.changed CRLFonly1 &&
+ git add CRLFonly1 &&
+ git diff --cached CRLFonly1 |
+ tr "\015" Q | sed -ne "/^[+-][a-zA-Z]/p" | tr "+-" PM >actual1 &&
+ test_cmp expnorm actual1
+'
+
+test_expect_success 'autocrlf=true causes a changed CRLF file not to be normalized' '
+ git config core.autocrlf true &&
+ echo > .gitattributes &&
+ # Note, "normalized" means that git will normalize it if added
+ cp CRLFonly.changed CRLFonly2 &&
+ git add CRLFonly2 &&
+ git diff --cached CRLFonly2 |
+ tr "\015" Q  | sed -ne "/^[+-][a-zA-Z]/p" | tr "+-" PM >actual2 &&
+ test_cmp expunor actual2
+'
 test_done
--
2.7.0.303.g2c4f448.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Junio C Hamano
[hidden email] writes:

> From: Torsten Bögershausen <[hidden email]>
>
> We define the working tree file is clean if either:
>
>   * the result of running convert_to_git() on the working tree
>     contents matches what is in the index (because that would mean
>     doing another "git add" on the path is a no-op); OR
>
>   * the result of running convert_to_working_tree() on the content
>     in the index matches what is in the working tree (because that
>     would mean doing another "git checkout -f" on the path is a
>     no-op).
>
> Add an extra check in ce_compare_data() in read_cache.c, and adjust
> the test cases in t0025:
> When a file has CRLF in the index, and is checked out into the working tree,
> but left unchabged, it is not normalized at the next commit.
> Whenever the file is changed in the working tree, a line is added/deleted
> or dos2unix is run, it may be normalized at the next commit,
> depending on .gitattributes.
>
> This patch is a result of a longer discussion on the mailing list,
> how to fix the flaky t0025.

Currently, the codepaths that want to stop if it would lose
information from the index and/or the working tree for the path run
an equivalent of "diff-files path" to see there is any difference.
This indeed is overly strict for a path that has contents in the
index that wouldn't have been created by "clean" conversion (I am
using this word to mean anything convert_to_git() does, not limited
to the "clean" filter).

And it is sensible to allow them to proceed if the contents in the
working tree file for the path match what would be created by
"smudge" conversion of the contents in the index.

But breaking "diff-files" is not the right way to do so.  Teaching
"Am I safe to proceed" callers that paths that do not pass
"diff-files" test may still be safe to work on is.

I did not continue the approach I illustrated because I realized and
finally convinced myself that touching ce_compare_data() is a wrong
solution--it breaks "diff-files".

Imagine if you have contents in the index that wouldn't have been
left by a "clean" conversion of what is in the working tree.  You
then run "git checkout -f".  Now the contents in the working tree
will still not convert back to what is in the index with another
"clean" conversion, but it should pass the "Am I safe to proceed"
check, namely, it matches what convert_to_worktree() would give.

But imagine further what would happen when you add an extra blank
line at the end of the file in the working tree (i.e. "echo >>file")
and then run "diff-files -p".

The illustration patch I gave broke "diff-files" in such a way that
before such an addition of an extra blank line, it would have said
"No changes".  And if you run "diff-files" after adding that extra
blank line, you will see whole bunch of changes, not just the extra
blank line at the end.

This is sufficient to convince me that the approach is broken.

The stolen code from me to do the reverse conversion and comparison
may still be reusable, but it should not go to ce_compare_data().
Such a path that has contents in the index that does not match what
"clean"-ing the working tree contents would give us _must_ remain
dirty and return DATA_CHANGED to ie_modified().

I think the right approach to solve this (assuming that this is
worth solving, which I am yet not sure) would go like this:

 * Allocate an extra bit in options passed to ie_match_stat().  When
   this bit is set, allow it to perform the "Smudge the contents in
   the index and compare it with the working tree contents" check
   and declare "You can proceed, nothing will be lost by touching
   the working tree", but never mark the cache entry up-to-date
   while doing so.

 * Pass the flag from callers that want this semantics,
   e.g. verify_uptodate() in unpack-trees.c, but not from
   "diff-files".

It probably is necessary to enable the new behaviour _only_ when a
new command line option explicitly asks for it, because

 - this is rather expensive;

 - this will have to be done for pretty-much all dirty paths, not
   just the ones that the additional reverse test would declare
   clean;

 - most normal people would not have such a broken data in the
   index;

 - those who has such a broken data would want to "fix" the data in
   the index as a tree-wide flag day correction event, after which
   they do not want to pay the penalty of doing this reverse
   comparison check all the time; and

 - requiring an explicit command line option is not a burden when
   the user _does_ want to do such a one-time "fix".



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Torsten Bögershausen-2


On 11.02.16 19:49, Junio C Hamano wrote:

> [hidden email] writes:
>
>> From: Torsten Bögershausen <[hidden email]>
>>
>> We define the working tree file is clean if either:
>>
>>   * the result of running convert_to_git() on the working tree
>>     contents matches what is in the index (because that would mean
>>     doing another "git add" on the path is a no-op); OR
>>
>>   * the result of running convert_to_working_tree() on the content
>>     in the index matches what is in the working tree (because that
>>     would mean doing another "git checkout -f" on the path is a
>>     no-op).
>>
>> Add an extra check in ce_compare_data() in read_cache.c, and adjust
>> the test cases in t0025:
>> When a file has CRLF in the index, and is checked out into the working tree,
>> but left unchabged, it is not normalized at the next commit.
>> Whenever the file is changed in the working tree, a line is added/deleted
>> or dos2unix is run, it may be normalized at the next commit,
>> depending on .gitattributes.
>>
>> This patch is a result of a longer discussion on the mailing list,
>> how to fix the flaky t0025.
> Currently, the codepaths that want to stop if it would lose
> information from the index and/or the working tree for the path run
> an equivalent of "diff-files path" to see there is any difference.
> This indeed is overly strict for a path that has contents in the
> index that wouldn't have been created by "clean" conversion (I am
> using this word to mean anything convert_to_git() does, not limited
> to the "clean" filter).
>
> And it is sensible to allow them to proceed if the contents in the
> working tree file for the path match what would be created by
> "smudge" conversion of the contents in the index.
>
> But breaking "diff-files" is not the right way to do so.  Teaching
> "Am I safe to proceed" callers that paths that do not pass
> "diff-files" test may still be safe to work on is.
>
> I did not continue the approach I illustrated because I realized and
> finally convinced myself that touching ce_compare_data() is a wrong
> solution--it breaks "diff-files".
>
> Imagine if you have contents in the index that wouldn't have been
> left by a "clean" conversion of what is in the working tree.  You
> then run "git checkout -f".  Now the contents in the working tree
> will still not convert back to what is in the index with another
> "clean" conversion, but it should pass the "Am I safe to proceed"
> check, namely, it matches what convert_to_worktree() would give.
>
> But imagine further what would happen when you add an extra blank
> line at the end of the file in the working tree (i.e. "echo >>file")
> and then run "diff-files -p".
>
> The illustration patch I gave broke "diff-files" in such a way that
> before such an addition of an extra blank line, it would have said
> "No changes".  And if you run "diff-files" after adding that extra
> blank line, you will see whole bunch of changes, not just the extra
> blank line at the end.
>
> This is sufficient to convince me that the approach is broken.
[]
Would something like this make sense?
(The main part is in diff.c, the rest needs some polishing)

commit e494c31fd2f0f8a638ff14d1b8ae3f3a6d56a107
Author: Torsten Bögershausen <[hidden email]>
Date:   Sat Mar 5 07:51:08 2016 +0100

    Text files needs to be normalized before diffing
   
    Whenever a text file is stored with CRLF in the index, Git changes
    CRLF into LF at the next commit.
    (text file means the attribute "text" or "crlf" of "eol" is set).
   
    "git diff" reports that all lines with CRLF in the index as changed to LF.
    After cloning a repo, the work tree is not considered clean by Git, even if
    the user didn't change a single line.
   
    Avoid to report lines as changed by converting the content from the index into
    LF before running diff.

diff --git a/convert.c b/convert.c
index f524b8d..af8248d 100644
--- a/convert.c
+++ b/convert.c
@@ -231,9 +231,9 @@ static int has_cr_in_index(const char *path)
     return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
-               struct strbuf *buf,
-               enum crlf_action crlf_action, enum safe_crlf checksafe)
+static int crlf_to_git_internal(const char *path, const char *src, size_t len,
+                struct strbuf *buf,
+                enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
     struct text_stat stats;
     char *dst;
@@ -852,7 +852,17 @@ const char *get_convert_attr_ascii(const char *path)
     return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
+int convert_crlf_to_git(const char *path, const char *src, size_t len,
+            struct strbuf *buf,
+            enum safe_crlf checksafe)
+{
+    struct conv_attrs ca;
+    convert_attrs(&ca, path);
+    return crlf_to_git_internal(path, src, len, buf,
+                    ca.crlf_action, checksafe);
+
+}
+    int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
     int ret = 0;
@@ -874,7 +884,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
         src = dst->buf;
         len = dst->len;
     }
-    ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+    ret |= crlf_to_git_internal(path, src, len, dst, ca.crlf_action, checksafe);
     if (ret && dst) {
         src = dst->buf;
         len = dst->len;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
     if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
         die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-    crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+    crlf_to_git_internal(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
     ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..11eff02 100644
--- a/convert.h
+++ b/convert.h
@@ -52,6 +52,9 @@ extern void convert_to_git_filter_fd(const char *path, int fd,
                      struct strbuf *dst,
                      enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
+extern int convert_crlf_to_git(const char *path, const char *src, size_t len,
+                   struct strbuf *buf,
+                   enum safe_crlf checksafe);
 
 /*****************************************************************
  *
diff --git a/diff.c b/diff.c
index 059123c..8710a36 100644
--- a/diff.c
+++ b/diff.c
@@ -2730,6 +2730,7 @@ static int diff_populate_gitlink(struct diff_filespec *s, int size_only)
  */
 int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 {
+    struct strbuf buf = STRBUF_INIT;
     int size_only = flags & CHECK_SIZE_ONLY;
     int err = 0;
     /*
@@ -2756,7 +2757,6 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 
     if (!s->sha1_valid ||
         reuse_worktree_file(s->path, s->sha1, 0)) {
-        struct strbuf buf = STRBUF_INIT;
         struct stat st;
         int fd;
 
@@ -2826,6 +2826,12 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
         if (!s->data)
             die("unable to read %s", sha1_to_hex(s->sha1));
         s->should_free = 1;
+        if (convert_crlf_to_git(s->path, s->data, s->size,
+                    &buf, SAFE_CRLF_FALSE)) {
+            size_t size = 0;
+            s->data = strbuf_detach(&buf, &size);
+            s->size = size;
+        }
     }
     return 0;
 }
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index c164b46..dd1645b 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -39,7 +39,7 @@ test_expect_success 'default settings cause no changes' '
     test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'crlf=true causes a CRLF file to be normalized' '
+test_expect_success 'crlf=true on a CRLF file git diff is empty' '
 
     # Backwards compatibility check
     rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
@@ -49,19 +49,18 @@ test_expect_success 'crlf=true causes a CRLF file to be normalized' '
     # Note, "normalized" means that git will normalize it if added
     has_cr CRLFonly &&
     CRLFonlydiff=$(git diff CRLFonly) &&
-    test -n "$CRLFonlydiff"
+    test -z "$CRLFonlydiff"
 '
 
-test_expect_success 'text=true causes a CRLF file to be normalized' '
+test_expect_success 'eol=crlf gives CRLF with no diff' '
 
     rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
-    echo "CRLFonly text" > .gitattributes &&
+    echo "CRLFonly text eol=crlf" > .gitattributes &&
     git read-tree --reset -u HEAD &&
-
-    # Note, "normalized" means that git will normalize it if added
-    has_cr CRLFonly &&
-    CRLFonlydiff=$(git diff CRLFonly) &&
-    test -n "$CRLFonlydiff"
+    >expect &&
+    git diff CRLFonly | tr "\015" Q >actual &&
+    test_cmp expect actual &&
+    has_cr CRLFonly
 '
 
 test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=false' '
@@ -114,7 +113,7 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
     test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'text=auto, autocrlf=true git diff is empty' '
 
     rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
     git config core.autocrlf true &&
@@ -126,7 +125,7 @@ test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
     LFonlydiff=$(git diff LFonly) &&
     CRLFonlydiff=$(git diff CRLFonly) &&
     LFwithNULdiff=$(git diff LFwithNUL) &&
-    test -z "$LFonlydiff" -a -n "$CRLFonlydiff" -a -z "$LFwithNULdiff"
+    test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
 test_expect_success 'text=auto, autocrlf=true does not normalize binary files' '

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Junio C Hamano
Torsten Bögershausen <[hidden email]> writes:

> On 11.02.16 19:49, Junio C Hamano wrote:
>
>> I did not continue the approach I illustrated because I realized and
>> finally convinced myself that touching ce_compare_data() is a wrong
>> solution--it breaks "diff-files".
>>
>> Imagine if you have contents in the index that wouldn't have been
>> left by a "clean" conversion of what is in the working tree.  You
>> then run "git checkout -f".  Now the contents in the working tree
>> will still not convert back to what is in the index with another
>> "clean" conversion, but it should pass the "Am I safe to proceed"
>> check, namely, it matches what convert_to_worktree() would give.
>>
>> But imagine further what would happen when you add an extra blank
>> line at the end of the file in the working tree (i.e. "echo >>file")
>> and then run "diff-files -p".
>>
>> The illustration patch I gave broke "diff-files" in such a way that
>> before such an addition of an extra blank line, it would have said
>> "No changes".  And if you run "diff-files" after adding that extra
>> blank line, you will see whole bunch of changes, not just the extra
>> blank line at the end.
>>
>> This is sufficient to convince me that the approach is broken.
> []
> Would something like this make sense?
> (The main part is in diff.c, the rest needs some polishing)
>
> commit e494c31fd2f0f8a638ff14d1b8ae3f3a6d56a107
> Author: Torsten Bögershausen <[hidden email]>
> Date:   Sat Mar 5 07:51:08 2016 +0100
>
>     Text files needs to be normalized before diffing
>    
>     Whenever a text file is stored with CRLF in the index, Git changes
>     CRLF into LF at the next commit.
>     (text file means the attribute "text" or "crlf" of "eol" is set).

I do not think I can endorse this approach, as I cannot explain why
it could possibly make sense to treat as if CRLF conversion is
somehow special among all the convert_to_git()/convert_to_worktree()
conversions, which your patch does to the diff code.

Comparisons between contents from two tree objects and comparisons
between contents from a tree object and the index must happen
without conversion, and comparisons between contents from the tree
in the HEAD commit and contents from the working tree must be done
in line with the HEAD vs index comparison to serve as a preview of
what would happen after adding the contents taken from the working
tree to the index, which means we should compare what is in the
index (without conversion) and the result of running the whole
convert_to_git() conversion on what is in the working tree.  It
feels fundamentally wrong to apply only CRLF conversion without any
other conversion, whether the direction of the conversion is from
worktree to git or the other way around, when comparing two things.

When the user has CRLF data in the index and the user tell the
attribute system so that the next "add" would result in "fixing" the
indexed lines to be terminated with LF, "diff-files" _should_ show
that correction as a change, I think.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Torsten Bögershausen-2

> I do not think I can endorse this approach, as I cannot explain why
> it could possibly make sense to treat as if CRLF conversion is
> somehow special among all the convert_to_git()/convert_to_worktree()
> conversions, which your patch does to the diff code.
>
> Comparisons between contents from two tree objects and comparisons
> between contents from a tree object and the index must happen
> without conversion, and comparisons between contents from the tree
> in the HEAD commit and contents from the working tree must be done
> in line with the HEAD vs index comparison to serve as a preview of
> what would happen after adding the contents taken from the working
> tree to the index, which means we should compare what is in the
> index (without conversion) and the result of running the whole
> convert_to_git() conversion on what is in the working tree.  It
> feels fundamentally wrong to apply only CRLF conversion without any
> other conversion, whether the direction of the conversion is from
> worktree to git or the other way around, when comparing two things.
>
> When the user has CRLF data in the index and the user tell the
> attribute system so that the next "add" would result in "fixing" the
> indexed lines to be terminated with LF, "diff-files" _should_ show
> that correction as a change, I think.
Fair enough.
There are 2 users here:
User 1 commits files with CRLF into the index, and later decides
to set the "text eol=crlf" attribute on it, without normalizing the repo.

User 2 does a simple "git clone", which includes checkout.
Running "git diff" tells user 2, that his work tree is dirty.

My conclusion is, that we could suppress the normalization for text files,
(as we do it for core.autocrlf with the new safer CRLF handling)
meaning that "git diff" and "git status" is clean and that files stay with CRLF
in the index.
Does this make sense ?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Junio C Hamano
Torsten Bögershausen <[hidden email]> writes:

>> When the user has CRLF data in the index and the user tell the
>> attribute system so that the next "add" would result in "fixing" the
>> indexed lines to be terminated with LF, "diff-files" _should_ show
>> that correction as a change, I think.
> Fair enough.
> There are 2 users here:
> User 1 commits files with CRLF into the index, and later decides
> to set the "text eol=crlf" attribute on it, without normalizing the repo.
>
> User 2 does a simple "git clone", which includes checkout.
> Running "git diff" tells user 2, that his work tree is dirty.
>
> My conclusion is, that we could suppress the normalization for text files,
> (as we do it for core.autocrlf with the new safer CRLF handling)
> meaning that "git diff" and "git status" is clean and that files stay with CRLF
> in the index.
> Does this make sense ?

Your example is for these two users to have conflicting settings on
the line ending, but if user 1 commits files in latin-1 to a project
where in-project data is expected to be UTF-8 and working tree files
can be in latin-1, with necessary conversion done via clean/smudge
filter, the user 2 would see a very similar symptom, wouldn't s/he?

So I am not sure how your example supports a hack that treats CRLF
conversion as something special among other conversions, without
doing anything about clean/smudge filter.

Besides, it is OK if your status and diff says your worktree is
dirty immediately after cloning in such a broken situation, I would
think.  In fact, it may even be preferable to do so, in order to
indicate that there is something unusual going on.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> Besides, it is OK if your status and diff says your worktree is
> dirty immediately after cloning in such a broken situation, I would
> think.  In fact, it may even be preferable to do so, in order to
> indicate that there is something unusual going on.

The above needs a bit of clarifying follow-up.

Some operations (e.g. "apply --index" and "checkout another-branch")
want to make sure that the path in the working tree "matches" what
is in the index before proceeding.  The reason why they require a
match is because they are going to update what is in the index and
then update what is in the working tree to match the result by
checking the updated index entry out to the working tree--if the
working tree and the index are different before they start their
operation, that means you have some changes you made in the working
tree since you checked it out of the index, and their operation will
lose such changes.

Normally, this verification is done by ce_match_stat() and friends,
whose correct operation relies on an earlier refresh_index(), which
in turn makes sure that the result of running the contents in the
working tree through convert_to_git() matches what is in the index.

When your convert_to_working_tree() and convert_to_git() do not
round-trip, however, the result of convert_to_git() on the working
tree contents would not match what is in the index.  That is
inconvenient, and it is something you may want to relax to help such
a broken situation.  Immediately after you "git checkout" (or "git
reset --hard"), you haven't made any changes, and you should be able
to "git checkout another" to go to another branch.

For this reason, I am perfectly OK with an approach to teach the
callers that currently use ce_uptodate() as the only way to make
sure that there is no modification to a given path (and refuse to
work on it if ce_uptodate() says the path is modified) that it is
also OK to clobber a path that does not pass the ce_uptodate() check
as long as the result of running convert_to_working_tree() on the
indexed contents matches what is in the working tree.  These callers
are currently overly strict and you will be relaxing their overly
strict check to help this broken situation.

Perhaps we can introduce a new function can_clobber() that has the
same function signature as ce_uptodate() and update the callers in
apply and unpack-trees (there may be others) to call it instead when
they want to see if they can clobber the working tree file that
corresponds to the cache entry.

For implementing the can_clobber() function, you can use something
along the lines of compare_with_fd() helper function I introduced in
[1] and do something like this, perhaps.

When I send an illustration patch and say "totally untested", I
usually start from the real source file and send "git diff" output
after making changes to the source file, and I may even have at
least compiled the modified result.  The following however is typed
directly into my mail program without touching any existing source
file, so it is truly untested--caveat emptor.

/*
 * We are about to do some operation to the index entry, and
 * write the result out to the working tree.  Would we lose
 * some local change that exist only in the working tree by
 * doing so?  Return 1 if we can safely clobber the working
 * tree file (i.e. no changes) and return 0 if we can't (i.e.
 * there are some changes).
 */
int can_clobber(struct cache_entry *ce)
{
        int fd, match = 0;
        enum object_type type;
        unsigned long size;
        void *data;

        /*
         * Does another "git add -f" of the path result in the
         * identical blob in the index?  If so, the working tree
         * file is expendable.
         */
        if (ce_uptodate(ce))
        return 1;
        fd = open(ce->name, O_RDONLY);
        if (fd < 0)
                return 0;

        data = read_sha1_file(ce->sha1, &type, &size);
        if (type == OBJ_BLOB) {
                struct strbuf worktree = STRBUF_INIT;
                /*
                 * Does another "git checkout -- path"
                 * recreate what we see in the working tree?
                 * If so, the working tree file is expendable.
                 */
                if (convert_to_working_tree(ce->name, data, size,
                                            &worktree)) {
                        free(data);
                        data = strbuf_detach(&worktree, &size);
                }
                if (!compare_with_fd(data, size, fd))
                        match = 1;
        }
        free(data);
        close(fd);

        return match;
}


[Reference]

*1* http://thread.gmane.org/gmane.comp.version-control.git/284352/focus=285341
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> Perhaps we can introduce a new function can_clobber() that has the
> same function signature as ce_uptodate() and update the callers in
> apply and unpack-trees (there may be others) to call it instead when
> they want to see if they can clobber the working tree file that
> corresponds to the cache entry.

By the way, I do not want see ie_match_stat() modified to work like
the can_clobber() I outlined in the previous message, which in turn
means that immediately after "git reset --hard" or "git checkout"
when your convert_to_git() and convert_to_working_tree() do not
roundtrip, you _must_ see differences in "git diff".

This is for (at least) two reasons.

 * "git diff" (compare between the index and the working tree) is
   meant as a preview of how the indexed contents will be modified
   if you did "git add" with what you currently have in your working
   tree at the path.  In a "conversions do not roundtrip" situation,
   your "git add" will be modifying the contents in the index, so we
   should actively be showing what modification we will be making.

   One way of "fixing" the situation without changing either the
   working tree contents or the indexed contents is to fix your
   convert-to-git settings to make the conversions round-trip, and
   then you would stop seeing the changes you would make when you do
   "git add".  Not showing any diff when can_clobber() is true but
   ce_uptodate() is false would make "git diff" less useful when the
   user makes this correction.

 * "git add" of a path can legitimately optimize itself by not
   adding a path that is ce_uptodate().  Mixing ie_match_stat()
   with can_clobber() logic would mark such a "conversions do not
   roundtrip" path as ce_uptodate(), and prevent the user from
   "fixing" the incorrect index entry by running "git add".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Torsten Bögershausen-2
On 03/07/2016 09:51 AM, Junio C Hamano wrote:

> Junio C Hamano <[hidden email]> writes:
>
>> Perhaps we can introduce a new function can_clobber() that has the
>> same function signature as ce_uptodate() and update the callers in
>> apply and unpack-trees (there may be others) to call it instead when
>> they want to see if they can clobber the working tree file that
>> corresponds to the cache entry.
> By the way, I do not want see ie_match_stat() modified to work like
> the can_clobber() I outlined in the previous message, which in turn
> means that immediately after "git reset --hard" or "git checkout"
> when your convert_to_git() and convert_to_working_tree() do not
> roundtrip, you _must_ see differences in "git diff".
>
> This is for (at least) two reasons.
>
>   * "git diff" (compare between the index and the working tree) is
>     meant as a preview of how the indexed contents will be modified
>     if you did "git add" with what you currently have in your working
>     tree at the path.  In a "conversions do not roundtrip" situation,
>     your "git add" will be modifying the contents in the index, so we
>     should actively be showing what modification we will be making.
>
>     One way of "fixing" the situation without changing either the
>     working tree contents or the indexed contents is to fix your
>     convert-to-git settings to make the conversions round-trip, and
>     then you would stop seeing the changes you would make when you do
>     "git add".  Not showing any diff when can_clobber() is true but
>     ce_uptodate() is false would make "git diff" less useful when the
>     user makes this correction.
>
>   * "git add" of a path can legitimately optimize itself by not
>     adding a path that is ce_uptodate().  Mixing ie_match_stat()
>     with can_clobber() logic would mark such a "conversions do not
>     roundtrip" path as ce_uptodate(), and prevent the user from
>     "fixing" the incorrect index entry by running "git add".
>
Thanks.
OK about "git add" and "git diff".
The major question, at least on my side, is where to hook in
"can_clobber()" ?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/3] git reset --hard gives clean working tree

Junio C Hamano
Torsten Bögershausen <[hidden email]> writes:

> The major question, at least on my side, is where to hook in
> "can_clobber()" ?

There are different ways the existing commands ensure that they do
not lose local modifications.

 * Some (like unpack-trees code that is used by "merge") do
   refresh_cache() upfront and then ask ce_uptodate() if the
   contents in the working tree match the indexed contents.
   unpack-trees.c::verify_uptodate_1() has a call to ce_uptodate()
   and returns early when true (i.e. if "git add" would result in
   the same index entry), but then does a double-check with
   ie_match_stat(), which essentially asks the "does an add result
   in the same index entry?" again.

 * Others (like apply) do not do the tree-wide refresh_cache(), but
   asks "does an add result in the same index entry" by calling
   ce_match_stat(), which is a thin wrapper to ie_match_stat(), in
   builtin/apply.c::verify_index_match().

These places need to learn that there is a case where
ie_match_stat() says "'git add' would change the index, i.e. working
tree file is different" but the working tree file can still be
clobbered because 'checkout' would produce the same contents in the
working tree.

But stepping back a bit, I no longer am sure if such a loosening is
desirable.  Imagine that you implemented such loosening and allowed
a patch to be applied to the index (and the result checked out to
the working tree).  What would the result of "diff --cached" be
after doing so?  Would it contain only the change described in the
patch you just accepted?  If that is the case, it would be OK, but
if the change from the patch gets mixed with the "fix incorrectly
recorded data in the index" change that you would have recorded if
you did "git add" from the working tree without applying the patch,
then that would not be a desirable outcome, I would suspect.  You
would rather want to first commit the "fix incorrectly recorded data
in the index" as a separate preparatory step and then want to apply
the patch.  So...





--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 1/7] Make it possible to get sha1 for a path from the index

Torsten Bögershausen-2
From: Torsten Bögershausen <[hidden email]>

Factor out the retrival of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the analyze
for "text=auto" without slurping the whole blob into memory at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index b829410..bd1210a 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1008,6 +1009,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
  return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
- int pos, len;
+ const unsigned char *sha1;
  unsigned long sz;
  enum object_type type;
  void *data;
 
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
  if (pos < 0) {
  /*
  * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
  }
  if (pos < 0)
  return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
--
2.8.0.rc2.6.g3847ccb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 2/7] convert.c: stream and early out

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
From: Torsten Bögershausen <[hidden email]>

When statistics are done for the autocrlf handling, the search in
the content can be stopped, if e.g
- a search for binary is done, and a NUL character is found
- a search for CRLF is done, and the first CRLF is found.

Similar when statistics for binary vs non-binary are gathered:
Whenever a lone CR or NUL is found, the search can be aborted.

When checking out files in "auto" mode, any file that has a "lone CR"
or a CRLF will not be converted, so the search can be aborted early.

Add the new bit, CONVERT_STAT_BITS_ANY_CR,
which is set for either lone CR or CRLF.

Many binary files have a NUL very early (within the first few bytes,
latest within the first 1..2K).
It is often not necessary to load the whole content of a file or blob
into memory.

Use a streaming handling for blobs and files in the worktree.

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 convert.c | 162 ++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 106 insertions(+), 56 deletions(-)

diff --git a/convert.c b/convert.c
index f524b8d..b6da114 100644
--- a/convert.c
+++ b/convert.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 #include "quote.h"
 #include "sigchain.h"
+#include "streaming.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -13,10 +14,10 @@
  * translation when the "text" attribute or "auto_crlf" option is set.
  */
 
-/* Stat bits: When BIN is set, the txt bits are unset */
 #define CONVERT_STAT_BITS_TXT_LF    0x1
 #define CONVERT_STAT_BITS_TXT_CRLF  0x2
 #define CONVERT_STAT_BITS_BIN       0x4
+#define CONVERT_STAT_BITS_ANY_CR    0x8
 
 enum crlf_action {
  CRLF_UNDEFINED,
@@ -31,30 +32,36 @@ enum crlf_action {
 
 struct text_stat {
  /* NUL, CR, LF and CRLF counts */
- unsigned nul, lonecr, lonelf, crlf;
+ unsigned stat_bits, lonecr, lonelf, crlf;
 
  /* These are just approximations! */
  unsigned printable, nonprintable;
 };
 
-static void gather_stats(const char *buf, unsigned long size, struct text_stat *stats)
+static void do_gather_stats(const char *buf, unsigned long size,
+    struct text_stat *stats, unsigned earlyout)
 {
  unsigned long i;
 
- memset(stats, 0, sizeof(*stats));
-
+ if (!buf || !size)
+ return;
  for (i = 0; i < size; i++) {
  unsigned char c = buf[i];
  if (c == '\r') {
+ stats->stat_bits |= CONVERT_STAT_BITS_ANY_CR;
  if (i+1 < size && buf[i+1] == '\n') {
  stats->crlf++;
  i++;
- } else
+ stats->stat_bits |= CONVERT_STAT_BITS_TXT_CRLF;
+ } else {
  stats->lonecr++;
+ stats->stat_bits |= CONVERT_STAT_BITS_BIN;
+ }
  continue;
  }
  if (c == '\n') {
  stats->lonelf++;
+ stats->stat_bits |= CONVERT_STAT_BITS_TXT_LF;
  continue;
  }
  if (c == 127)
@@ -67,7 +74,7 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
  stats->printable++;
  break;
  case 0:
- stats->nul++;
+ stats->stat_bits |= CONVERT_STAT_BITS_BIN;
  /* fall through */
  default:
  stats->nonprintable++;
@@ -75,6 +82,8 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
  }
  else
  stats->printable++;
+ if (stats->stat_bits & earlyout)
+ break; /* We found what we have been searching for */
  }
 
  /* If file ends with EOF then don't count this EOF as non-printable. */
@@ -86,41 +95,63 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
  * The same heuristics as diff.c::mmfile_is_binary()
  * We treat files with bare CR as binary
  */
-static int convert_is_binary(unsigned long size, const struct text_stat *stats)
+static void convert_nonprintable(struct text_stat *stats)
 {
- if (stats->lonecr)
- return 1;
- if (stats->nul)
- return 1;
  if ((stats->printable >> 7) < stats->nonprintable)
- return 1;
- return 0;
+ stats->stat_bits |= CONVERT_STAT_BITS_BIN;
+}
+
+static void gather_stats(const char *buf, unsigned long size,
+ struct text_stat *stats, unsigned earlyout)
+{
+ memset(stats, 0, sizeof(*stats));
+ do_gather_stats(buf, size, stats, earlyout);
+ convert_nonprintable(stats);
 }
 
-static unsigned int gather_convert_stats(const char *data, unsigned long size)
+
+static unsigned get_convert_stats_sha1(const char *path,
+       unsigned const char *sha1,
+       unsigned earlyout)
 {
+ struct git_istream *st;
  struct text_stat stats;
- int ret = 0;
- if (!data || !size)
- return 0;
- gather_stats(data, size, &stats);
- if (convert_is_binary(size, &stats))
- ret |= CONVERT_STAT_BITS_BIN;
- if (stats.crlf)
- ret |= CONVERT_STAT_BITS_TXT_CRLF;
- if (stats.lonelf)
- ret |=  CONVERT_STAT_BITS_TXT_LF;
+ enum object_type type;
+ unsigned long sz;
 
- return ret;
+ if (!sha1)
+ return 0;
+ memset(&stats, 0, sizeof(stats));
+ st = open_istream(sha1, &type, &sz, NULL);
+ if (!st) {
+ return 0;
+ }
+ if (type != OBJ_BLOB)
+ goto close_and_exit_i;
+ for (;;) {
+ char buf[1024];
+ ssize_t readlen = read_istream(st, buf, sizeof(buf));
+ if (readlen < 0)
+ break;
+ if (!readlen)
+ break;
+ do_gather_stats(buf, (unsigned long)readlen, &stats, earlyout);
+ if (stats.stat_bits & earlyout)
+ break; /* We found what we have been searching for */
+ }
+close_and_exit_i:
+ close_istream(st);
+ convert_nonprintable(&stats);
+ return stats.stat_bits;
 }
 
-static const char *gather_convert_stats_ascii(const char *data, unsigned long size)
+static const char *convert_stats_ascii(unsigned convert_stats)
 {
- unsigned int convert_stats = gather_convert_stats(data, size);
-
+ unsigned mask = CONVERT_STAT_BITS_TXT_LF |
+ CONVERT_STAT_BITS_TXT_CRLF;
  if (convert_stats & CONVERT_STAT_BITS_BIN)
  return "-text";
- switch (convert_stats) {
+ switch (convert_stats & mask) {
  case CONVERT_STAT_BITS_TXT_LF:
  return "lf";
  case CONVERT_STAT_BITS_TXT_CRLF:
@@ -132,24 +163,46 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si
  }
 }
 
+static unsigned get_convert_stats_wt(const char *path)
+{
+ struct text_stat stats;
+ unsigned earlyout = CONVERT_STAT_BITS_BIN;
+ int fd;
+ memset(&stats, 0, sizeof(stats));
+ fd = open(path, O_RDONLY);
+ if (fd < 0)
+ return 0;
+ for (;;) {
+ char buf[1024];
+ ssize_t readlen = read(fd, buf, sizeof(buf));
+ if (readlen < 0)
+ break;
+ if (!readlen)
+ break;
+ do_gather_stats(buf, (unsigned long)readlen, &stats, earlyout);
+ if (stats.stat_bits & earlyout)
+ break; /* We found what we have been searching for */
+ }
+ close(fd);
+ convert_nonprintable(&stats);
+ return stats.stat_bits;
+}
+
 const char *get_cached_convert_stats_ascii(const char *path)
 {
- const char *ret;
- unsigned long sz;
- void *data = read_blob_data_from_cache(path, &sz);
- ret = gather_convert_stats_ascii(data, sz);
- free(data);
- return ret;
+ unsigned convert_stats;
+ unsigned earlyout = CONVERT_STAT_BITS_BIN;
+ convert_stats = get_convert_stats_sha1(path,
+       get_sha1_from_cache(path),
+       earlyout);
+ return convert_stats_ascii(convert_stats);
 }
 
 const char *get_wt_convert_stats_ascii(const char *path)
 {
- const char *ret = "";
- struct strbuf sb = STRBUF_INIT;
- if (strbuf_read_file(&sb, path, 0) >= 0)
- ret = gather_convert_stats_ascii(sb.buf, sb.len);
- strbuf_release(&sb);
- return ret;
+ unsigned convert_stats;
+ convert_stats = get_convert_stats_wt(path);
+ return convert_stats_ascii(convert_stats);
 }
 
 static int text_eol_is_crlf(void)
@@ -219,16 +272,11 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 
 static int has_cr_in_index(const char *path)
 {
- unsigned long sz;
- void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
- if (!data)
- return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
- free(data);
- return has_cr;
+ unsigned convert_stats;
+ convert_stats = get_convert_stats_sha1(path,
+       get_sha1_from_cache(path),
+       CONVERT_STAT_BITS_ANY_CR);
+ return convert_stats & CONVERT_STAT_BITS_ANY_CR;
 }
 
 static int crlf_to_git(const char *path, const char *src, size_t len,
@@ -249,10 +297,10 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
  if (!buf && !src)
  return 1;
 
- gather_stats(src, len, &stats);
+ gather_stats(src, len, &stats, CONVERT_STAT_BITS_BIN);
 
  if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- if (convert_is_binary(len, &stats))
+ if (stats.stat_bits & CONVERT_STAT_BITS_BIN)
  return 0;
 
  if (crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
@@ -309,11 +357,13 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 {
  char *to_free = NULL;
  struct text_stat stats;
+ unsigned earlyout = CONVERT_STAT_BITS_TXT_CRLF | CONVERT_STAT_BITS_BIN;
+
 
  if (!len || output_eol(crlf_action) != EOL_CRLF)
  return 0;
 
- gather_stats(src, len, &stats);
+ gather_stats(src, len, &stats, earlyout);
 
  /* No "naked" LF? Nothing to convert, regardless. */
  if (!stats.lonelf)
@@ -327,7 +377,7 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
  return 0;
  }
 
- if (convert_is_binary(len, &stats))
+ if (stats.stat_bits & CONVERT_STAT_BITS_BIN)
  return 0;
  }
 
--
2.8.0.rc2.6.g3847ccb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 3/7] Allow core.autocrlf=input and core.eol=crlf

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
From: Torsten Bögershausen <[hidden email]>

Commit  942e77 "Add "core.eol" config variable" adds a condition to
the config parser: core.autocrlf=input is not allowed together with
core.eol=crlf.

This may lead to unnecessary problems, when config files are parsed:
A global config file sets core.autocrlf=input,
the repo local config file sets is own configuration: core.eol=crlf

There is no need to prevent combinations in config.c, in any case
core.autocrlf overrides core.eol.
Allow all combinations of core.crlf and core.eol and document
that core.autocrlf overrides core.eol.

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 Documentation/config.txt | 6 +++---
 config.c                 | 4 ----
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2cd6bdd..4a27ad4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -337,9 +337,9 @@ core.quotePath::
 
 core.eol::
  Sets the line ending type to use in the working directory for
- files that have the `text` property set.  Alternatives are
- 'lf', 'crlf' and 'native', which uses the platform's native
- line ending.  The default value is `native`.  See
+ files that have the `text` property set when core.autocrlf is false.
+ Alternatives are 'lf', 'crlf' and 'native', which uses the platform's
+ native line ending.  The default value is `native`.  See
  linkgit:gitattributes[5] for more information on end-of-line
  conversion.
 
diff --git a/config.c b/config.c
index 9ba40bc..a6adc8b 100644
--- a/config.c
+++ b/config.c
@@ -803,8 +803,6 @@ static int git_default_core_config(const char *var, const char *value)
 
  if (!strcmp(var, "core.autocrlf")) {
  if (value && !strcasecmp(value, "input")) {
- if (core_eol == EOL_CRLF)
- return error("core.autocrlf=input conflicts with core.eol=crlf");
  auto_crlf = AUTO_CRLF_INPUT;
  return 0;
  }
@@ -830,8 +828,6 @@ static int git_default_core_config(const char *var, const char *value)
  core_eol = EOL_NATIVE;
  else
  core_eol = EOL_UNSET;
- if (core_eol == EOL_CRLF && auto_crlf == AUTO_CRLF_INPUT)
- return error("core.autocrlf=input conflicts with core.eol=crlf");
  return 0;
  }
 
--
2.8.0.rc2.6.g3847ccb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 4/7] t0027: TC for combined attributes

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
From: Torsten Bögershausen <[hidden email]>

Add more test cases for the not normalized files ("NNO").
The  "text" attribute is most important, use it as the first parameter.
"ident", if set, is the second paramater followed by the eol attribute.
The eol attribute overrides core.autocrlf, which overrides core.eol.

Use loops to test more combinations of attributes, like
"* text eol=crlf" or especially "*text=auto eol=crlf".

Currently "* text=auto eol=lf" does the same as "* text eol=lf", as the
eol attribute overrides "text=auto", this will change in future.

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 t/t0027-auto-crlf.sh | 321 +++++++++++++++++++++++++--------------------------
 1 file changed, 158 insertions(+), 163 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index f33962b..55f45d3 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -52,14 +52,17 @@ create_gitattributes () {
 create_NNO_files () {
  for crlf in false true input
  do
- for attr in "" auto text -text lf crlf
+ for attr in "" auto text -text
  do
- pfx=NNO_${crlf}_attr_${attr} &&
- cp CRLF_mix_LF ${pfx}_LF.txt &&
- cp CRLF_mix_LF ${pfx}_CRLF.txt &&
- cp CRLF_mix_LF ${pfx}_CRLF_mix_LF.txt &&
- cp CRLF_mix_LF ${pfx}_LF_mix_CR.txt &&
- cp CRLF_mix_LF ${pfx}_CRLF_nul.txt
+ for aeol in "" lf crlf
+ do
+ pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf}
+ cp CRLF_mix_LF ${pfx}_LF.txt &&
+ cp CRLF_mix_LF ${pfx}_CRLF.txt &&
+ cp CRLF_mix_LF ${pfx}_CRLF_mix_LF.txt &&
+ cp CRLF_mix_LF ${pfx}_LF_mix_CR.txt &&
+ cp CRLF_mix_LF ${pfx}_CRLF_nul.txt
+ done
  done
  done
 }
@@ -100,16 +103,17 @@ commit_check_warn () {
 }
 
 commit_chk_wrnNNO () {
- crlf=$1
- attr=$2
- lfwarn=$3
- crlfwarn=$4
- lfmixcrlf=$5
- lfmixcr=$6
- crlfnul=$7
- pfx=NNO_${crlf}_attr_${attr}
+ attr=$1 ; shift
+ aeol=$1 ; shift
+ crlf=$1 ; shift
+ lfwarn=$1 ; shift
+ crlfwarn=$1 ; shift
+ lfmixcrlf=$1 ; shift
+ lfmixcr=$1 ; shift
+ crlfnul=$1 ; shift
+ pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf}
  #Commit files on top of existing file
- create_gitattributes "$attr" &&
+ create_gitattributes "$attr" $aeol &&
  for f in LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
  do
  fname=${pfx}_$f.txt &&
@@ -121,19 +125,19 @@ commit_chk_wrnNNO () {
  test_expect_success "commit NNO files crlf=$crlf attr=$attr LF" '
  check_warning "$lfwarn" ${pfx}_LF.err
  '
- test_expect_success "commit NNO files crlf=$crlf attr=$attr CRLF" '
+ test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF" '
  check_warning "$crlfwarn" ${pfx}_CRLF.err
  '
 
- test_expect_success "commit NNO files crlf=$crlf attr=$attr CRLF_mix_LF" '
+ test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_mix_LF" '
  check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err
  '
 
- test_expect_success "commit NNO files crlf=$crlf attr=$attr LF_mix_cr" '
+ test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf LF_mix_cr" '
  check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err
  '
 
- test_expect_success "commit NNO files crlf=$crlf attr=$attr CRLF_nul" '
+ test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_nul" '
  check_warning "$crlfnul" ${pfx}_CRLF_nul.err
  '
 }
@@ -162,6 +166,7 @@ stats_ascii () {
 
 # contruct the attr/ returned by git ls-files --eol
 # Take none (=empty), one or two args
+# convert.c: eol=XX overrides text=auto
 attr_ascii () {
  case $1,$2 in
  -text,*)   echo "-text" ;;
@@ -169,8 +174,8 @@ attr_ascii () {
  text,lf)   echo "text eol=lf" ;;
  text,crlf) echo "text eol=crlf" ;;
  auto,)     echo "text=auto" ;;
- auto,lf)   echo "text=auto eol=lf" ;;
- auto,crlf) echo "text=auto eol=crlf" ;;
+ auto,lf)   echo "text eol=lf" ;;
+ auto,crlf) echo "text eol=crlf" ;;
  lf,)       echo "text eol=lf" ;;
  crlf,)     echo "text eol=crlf" ;;
  ,) echo "" ;;
@@ -195,28 +200,29 @@ check_files_in_repo () {
 }
 
 check_in_repo_NNO () {
- crlf=$1
- attr=$2
- lfname=$3
- crlfname=$4
- lfmixcrlf=$5
- lfmixcr=$6
- crlfnul=$7
- pfx=NNO_${crlf}_attr_${attr}_
- test_expect_success "compare_files $lfname ${pfx}LF.txt" '
- compare_files $lfname ${pfx}LF.txt
+ attr=$1 ; shift
+ aeol=$1 ; shift
+ crlf=$1 ; shift
+ lfname=$1 ; shift
+ crlfname=$1 ; shift
+ lfmixcrlf=$1 ; shift
+ lfmixcr=$1 ; shift
+ crlfnul=$1 ; shift
+ pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf}
+ test_expect_success "compare_files $lfname ${pfx}_LF.txt" '
+ compare_files $lfname ${pfx}_LF.txt
  '
- test_expect_success "compare_files $crlfname ${pfx}CRLF.txt" '
- compare_files $crlfname ${pfx}CRLF.txt
+ test_expect_success "compare_files $crlfname ${pfx}_CRLF.txt" '
+ compare_files $crlfname ${pfx}_CRLF.txt
  '
- test_expect_success "compare_files $lfmixcrlf ${pfx}CRLF_mix_LF.txt" '
- compare_files $lfmixcrlf ${pfx}CRLF_mix_LF.txt
+ test_expect_success "compare_files $lfmixcrlf ${pfx}_CRLF_mix_LF.txt" '
+ compare_files $lfmixcrlf ${pfx}_CRLF_mix_LF.txt
  '
- test_expect_success "compare_files $lfmixcr ${pfx}LF_mix_CR.txt" '
- compare_files $lfmixcr ${pfx}LF_mix_CR.txt
+ test_expect_success "compare_files $lfmixcr ${pfx}_LF_mix_CR.txt" '
+ compare_files $lfmixcr ${pfx}_LF_mix_CR.txt
  '
- test_expect_success "compare_files $crlfnul ${pfx}CRLF_nul.txt" '
- compare_files $crlfnul ${pfx}CRLF_nul.txt
+ test_expect_success "compare_files $crlfnul ${pfx}_CRLF_nul.txt" '
+ compare_files $crlfnul ${pfx}_CRLF_nul.txt
  '
 }
 
@@ -231,7 +237,7 @@ checkout_files () {
  lfmixcrlf=$1 ; shift
  lfmixcr=$1 ; shift
  crlfnul=$1 ; shift
- create_gitattributes "$attr" "$ident" &&
+ create_gitattributes "$attr" $ident $aeol &&
  git config core.autocrlf $crlf &&
  pfx=eol_${ceol}_crlf_${crlf}_attr_${attr}_ &&
  for f in LF CRLF LF_mix_CR CRLF_mix_LF LF_nul
@@ -244,7 +250,7 @@ checkout_files () {
  fi
  done
 
- test_expect_success "ls-files --eol attr=$attr $ident $aeol core.autocrlf=$crlf core.eol=$ceol" '
+ test_expect_success "ls-files --eol attr=$attr $ident aeol=$aeol core.autocrlf=$crlf core.eol=$ceol" '
  test_when_finished "rm expect actual" &&
  sort <<-EOF >expect &&
  i/crlf w/$(stats_ascii $crlfname) attr/$(attr_ascii $attr $aeol) crlf_false_attr__CRLF.txt
@@ -385,31 +391,31 @@ test_expect_success 'commit files attr=crlf' '
  commit_check_warn input "crlf" "LF_CRLF" ""        "LF_CRLF" "LF_CRLF" ""
 '
 
-#                       attr   LF        CRLF      CRLFmixLF LF_mix_CR   CRLFNUL
-commit_chk_wrnNNO false ""     ""        ""        ""         ""         ""
-commit_chk_wrnNNO true  ""     "LF_CRLF" ""        ""         ""         ""
-commit_chk_wrnNNO input ""     ""        ""        ""         ""         ""
-
+#                 attr                    LF        CRLF      CRLFmixLF   LF_mix_CR   CRLFNUL
+commit_chk_wrnNNO ""      ""      false   ""        ""        ""          ""          ""
+commit_chk_wrnNNO ""      ""      true    LF_CRLF   ""        ""          ""          ""
+commit_chk_wrnNNO ""      ""      input   ""        ""        ""          ""          ""
 
-commit_chk_wrnNNO false "auto" "$WILC"   "$WICL"   "$WAMIX"   ""         ""
-commit_chk_wrnNNO true  "auto" "LF_CRLF" ""        "LF_CRLF" ""         ""
-commit_chk_wrnNNO input "auto" ""        "CRLF_LF" "CRLF_LF" ""         ""
+commit_chk_wrnNNO "auto"  ""      false   "$WILC"   "$WICL"   "$WAMIX"    ""          ""
+commit_chk_wrnNNO "auto"  ""      true    LF_CRLF   ""        LF_CRLF     ""          ""
+commit_chk_wrnNNO "auto"  ""      input   ""        CRLF_LF   CRLF_LF     ""          ""
 
-commit_chk_wrnNNO false "text" "$WILC"   "$WICL"   "$WAMIX"   "$WILC"   "$WICL"
-commit_chk_wrnNNO true  "text" "LF_CRLF" ""        "LF_CRLF" "LF_CRLF" ""
-commit_chk_wrnNNO input "text" ""        "CRLF_LF" "CRLF_LF" ""         "CRLF_LF"
-
-commit_chk_wrnNNO false "-text" ""       ""        ""         ""         ""
-commit_chk_wrnNNO true  "-text" ""       ""        ""         ""         ""
-commit_chk_wrnNNO input "-text" ""       ""        ""         ""         ""
-
-commit_chk_wrnNNO false "lf"    ""       "CRLF_LF" "CRLF_LF"  ""       "CRLF_LF"
-commit_chk_wrnNNO true  "lf"    ""       "CRLF_LF" "CRLF_LF"  ""       "CRLF_LF"
-commit_chk_wrnNNO input "lf"    ""       "CRLF_LF" "CRLF_LF"  ""       "CRLF_LF"
+for crlf in true false input;
+do
+ commit_chk_wrnNNO -text ""      $crlf   ""        ""        ""          ""          ""
+ commit_chk_wrnNNO -text lf      $crlf   ""        ""        ""          ""          ""
+ commit_chk_wrnNNO -text crlf    $crlf   ""        ""        ""          ""          ""
+ commit_chk_wrnNNO ""    lf      $crlf   ""       CRLF_LF    CRLF_LF      ""         CRLF_LF
+ commit_chk_wrnNNO ""    crlf    $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+ commit_chk_wrnNNO auto  lf     $crlf   ""       CRLF_LF    CRLF_LF     ""          CRLF_LF
+ commit_chk_wrnNNO auto  crlf   $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+ commit_chk_wrnNNO text  lf     $crlf   ""       CRLF_LF    CRLF_LF     ""          CRLF_LF
+ commit_chk_wrnNNO text  crlf   $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+done
 
-commit_chk_wrnNNO false "crlf" "LF_CRLF" ""        "LF_CRLF" "LF_CRLF" ""
-commit_chk_wrnNNO true  "crlf" "LF_CRLF" ""        "LF_CRLF" "LF_CRLF" ""
-commit_chk_wrnNNO input "crlf" "LF_CRLF" ""        "LF_CRLF" "LF_CRLF" ""
+commit_chk_wrnNNO "text"  ""      false   "$WILC"   "$WICL"   "$WAMIX"    "$WILC"     "$WICL"
+commit_chk_wrnNNO "text"  ""      true    LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+commit_chk_wrnNNO "text"  ""      input   ""        CRLF_LF   CRLF_LF     ""          CRLF_LF
 
 test_expect_success 'create files cleanup' '
  rm -f *.txt &&
@@ -440,24 +446,20 @@ test_expect_success 'commit -text' '
  check_files_in_repo input "-text" LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
 '
 
-#                       attr    LF        CRLF      CRLF_mix_LF  LF_mix_CR CRLFNUL
-check_in_repo_NNO false ""      LF        CRLF      CRLF_mix_LF  LF_mix_CR CRLF_nul
-check_in_repo_NNO true  ""      LF        CRLF      CRLF_mix_LF  LF_mix_CR CRLF_nul
-check_in_repo_NNO input ""      LF        CRLF      CRLF_mix_LF  LF_mix_CR CRLF_nul
-
-check_in_repo_NNO false "auto"  LF        LF        LF           LF_mix_CR CRLF_nul
-check_in_repo_NNO true  "auto"  LF        LF        LF           LF_mix_CR CRLF_nul
-check_in_repo_NNO input "auto"  LF        LF        LF           LF_mix_CR CRLF_nul
-
-check_in_repo_NNO false "text"  LF        LF        LF           LF_mix_CR LF_nul
-check_in_repo_NNO true  "text"  LF        LF        LF           LF_mix_CR LF_nul
-check_in_repo_NNO input "text"  LF        LF        LF           LF_mix_CR LF_nul
-
-check_in_repo_NNO false "-text" LF        CRLF      CRLF_mix_LF  LF_mix_CR CRLF_nul
-check_in_repo_NNO true  "-text" LF        CRLF      CRLF_mix_LF  LF_mix_CR CRLF_nul
-check_in_repo_NNO input "-text" LF        CRLF      CRLF_mix_LF  LF_mix_CR CRLF_nul
-
-
+for crlf in true false input;
+do
+ #                 attr  aeol           LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLFNUL
+ check_in_repo_NNO ""    ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO -text ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO -text lf     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO -text crlf   $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO auto  ""     $crlf   LF  LF    LF           LF_mix_CR  CRLF_nul
+ check_in_repo_NNO auto  lf     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+ check_in_repo_NNO auto  crlf   $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+ check_in_repo_NNO text  ""     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+ check_in_repo_NNO text  lf     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+ check_in_repo_NNO text  crlf   $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+done
 ################################################################################
 # Check how files in the repo are changed when they are checked out
 # How to read the table below:
@@ -489,89 +491,82 @@ LFNUL=LF_nul
 fi
 export CRLF_MIX_LF_CR MIX NL
 
-checkout_files ""      "" ""    false  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    false  crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    false  native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    input  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    input  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    true   ""       CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    true   crlf     CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    true   lf       CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      "" ""    true   native   CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    false  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    false  crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    false  native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    input  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    input  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    true   ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    true   crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    true   lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files ""      ident ""    true   native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    false  ""       $NL   CRLF  $MIX_CRLF_LF LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    false  crlf     CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    false  native   $NL   CRLF  $MIX_CRLF_LF LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    input  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    input  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    true   ""       CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    true   crlf     CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    true   lf       CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
-checkout_files "auto"  "" ""    true   native   CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    false  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    false  crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    false  native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    input  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    input  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    true   ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    true   crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    true   lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-checkout_files "auto"  ident ""    true   native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-
+# Same handling with and without ident
 for id in "" ident;
 do
- checkout_files "crlf"  "$id" ""    false  ""       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    false  crlf     CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    false  lf       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    false  native   CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    input  ""       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    input  lf       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    true   ""       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    true   crlf     CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    true   lf       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "crlf"  "$id" ""    true   native   CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "lf"    "$id" ""    false  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    false  crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    false  native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    input  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    input  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    true   ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    true   crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    true   lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "lf"    "$id" ""    true   native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "text"  "$id" ""    false  ""       $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
- checkout_files "text"  "$id" ""    false  crlf     CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "text"  "$id" ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "text"  "$id" ""    false  native   $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
- checkout_files "text"  "$id" ""    input  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "text"  "$id" ""    input  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "text"  "$id" ""    true   ""       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "text"  "$id" ""    true   crlf     CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "text"  "$id" ""    true   lf       CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "text"  "$id" ""    true   native   CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
- checkout_files "-text" "$id" ""    false  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    false  crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    false  native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    input  ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    input  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    true   ""       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    true   crlf     LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    true   lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "-text" "$id" ""    true   native   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ # -text overrides core.autocrlf and core.eol
+ # text and eol=crlf or eol=lf override core.autocrlf and core.eol
+ for crlf in true false input;
+ do
+ for ceol in "" lf crlf native;
+ do
+ checkout_files "-text" "$id" ""     "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "-text" "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "-text" "$id" "crlf" "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "text"  "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "text"  "$id" "crlf" "$crlf"  "$ceol"    CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ done
+ done
+ # text: core.autocrlf=false uses core.eol
+ checkout_files "text"  "$id"  ""    false  crlf     CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ checkout_files "text"  "$id"  ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ # text: core.autocrlf=false and core.eol unset(or native) uses native eol
+ checkout_files "text"  "$id"  ""    false  ""       $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
+ checkout_files "text"  "$id"  ""    false  native   $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
+ checkout_files "auto"  "$id"  ""    false  ""       $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
+ checkout_files "auto"  "$id"  ""    false  native   $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
+
+ for ceol in "" lf crlf native;
+ do
+ # text: core.autocrlf = true overrides core.eol
+ checkout_files "text"  "$id"  ""    true   "$ceol"  CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ # text: core.autocrlf = input overrides core.eol
+ checkout_files "text"  "$id"  ""    input  "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id"  ""    input  "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ done
+ # No text attr
+ for ceol in "" lf crlf native;
+ do
+ # core.autocrlf!=true gives LF (or mixed) at checkout
+ for crlf in false input;
+ do
+ checkout_files ""    "$id" ""    "$crlf" "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ done
+ done
+done
+
+#Handling without ident
+for id in "";
+do
+ for ceol in "" lf crlf native;
+ do
+ # core.autocrlf=true
+ checkout_files ""      "$id" ""    true   "$ceol"   CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id" ""    true   "$ceol"   CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
+ # text=auto + eol=XXX
+ # currently the same as text, eol=XXX
+ checkout_files "auto"  "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id" "crlf" "$crlf"  "$ceol"    CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ done
+done
+
+#Handling with ident
+for id in ident;
+do
+ for crlf in "" true false input;
+ do
+ for ceol in "" lf crlf native;
+ do
+ # ident does not react on core.autocrlf=true
+ checkout_files ""      "$id" ""   "$crlf" "$ceol"   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ # ident does not react on text=auto
+ checkout_files "auto"  "$id" ""    true   "$ceol"   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ # text=auto + eol=XXX
+ # currently the same as text, eol=XXX
+ checkout_files "auto"  "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id" "crlf" "$crlf"  "$ceol"    CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ done
+ done
 done
 
 # Should be the last test case: remove some files from the worktree
--
2.8.0.rc2.6.g3847ccb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 5/7] CRLF: unify the "auto" handling

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
From: Torsten Bögershausen <[hidden email]>

Make .gitattributes "* text=auto eol=crlf" to do the same as
setting core.autocrlf=true and "* text=auto eol=crlf" the same
as core.autocrlf=input

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 Documentation/config.txt        | 11 +++++-----
 Documentation/gitattributes.txt |  7 ++++---
 convert.c                       | 37 ++++++++++++++++-----------------
 t/t0025-crlf-auto.sh            |  4 ++--
 t/t0027-auto-crlf.sh            | 45 +++++++++++++++++++----------------------
 5 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4a27ad4..3191a42 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -389,14 +389,15 @@ file with mixed line endings would be reported by the `core.safecrlf`
 mechanism.
 
 core.autocrlf::
- Setting this variable to "true" is almost the same as setting
- the `text` attribute to "auto" on all files except that text
- files are not guaranteed to be normalized: files that contain
+ Setting this variable to "true" is the same as setting
+ the attributes to "auto eol=crlf" on all files.
+ Files are not guaranteed to be normalized: files that contain
  `CRLF` in the repository will not be touched.  Use this
  setting if you want to have `CRLF` line endings in your
- working directory even though the repository does not have
- normalized line endings.  This variable can be set to 'input',
+ working directory and he repository has normalized line endings.
+ This variable can be set to 'input',
  in which case no output conversion is performed.
+ Note: older versions of Git behave different.
 
 core.symlinks::
  If false, symbolic links are checked out as small plain files that
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..43b70b5 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -115,6 +115,7 @@ text file is normalized, its line endings are converted to LF in the
 repository.  To control what line ending style is used in the working
 directory, use the `eol` attribute for a single file and the
 `core.eol` configuration variable for all text files.
+Note that `core.autocrlf` overrides `core.eol`
 
 Set::
 
@@ -187,8 +188,8 @@ regardless of their content.
 
 ------------------------
 *.txt text
-*.vcproj eol=crlf
-*.sh eol=lf
+*.vcproj text eol=crlf
+*.sh text eol=lf
 *.jpg -text
 ------------------------
 
@@ -198,7 +199,7 @@ normalization in Git.
 
 If you simply want to have CRLF line endings in your working directory
 regardless of the repository you are working with, you can set the
-config variable "core.autocrlf" without changing any attributes.
+config variable "core.autocrlf" without using any attributes.
 
 ------------------------
 [core]
diff --git a/convert.c b/convert.c
index b6da114..4ed5d89 100644
--- a/convert.c
+++ b/convert.c
@@ -229,7 +229,9 @@ static enum eol output_eol(enum crlf_action crlf_action)
  return EOL_LF;
  case CRLF_UNDEFINED:
  case CRLF_AUTO_CRLF:
+ return EOL_CRLF;
  case CRLF_AUTO_INPUT:
+ return EOL_LF;
  case CRLF_TEXT:
  case CRLF_AUTO:
  /* fall through */
@@ -302,15 +304,12 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
  if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
  if (stats.stat_bits & CONVERT_STAT_BITS_BIN)
  return 0;
-
- if (crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- /*
- * If the file in the index has any CR in it, do not convert.
- * This is the new safer autocrlf handling.
- */
- if (has_cr_in_index(path))
- return 0;
- }
+ /*
+ * If the file in the index has any CR in it, do not convert.
+ * This is the new safer autocrlf handling.
+ */
+ if (has_cr_in_index(path))
+ return 0;
  }
 
  check_safe_crlf(path, crlf_action, &stats, checksafe);
@@ -370,12 +369,10 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
  return 0;
 
  if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- if (crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- /* If we have any CR or CRLF line endings, we do not touch it */
- /* This is the new safer autocrlf-handling */
- if (stats.lonecr || stats.crlf )
- return 0;
- }
+ /* If we have any CR or CRLF line endings, we do not touch it */
+ /* This is the new safer autocrlf-handling */
+ if (stats.lonecr || stats.crlf )
+ return 0;
 
  if (stats.stat_bits & CONVERT_STAT_BITS_BIN)
  return 0;
@@ -836,7 +833,11 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
  ca->drv = git_path_check_convert(ccheck + 2);
  if (ca->crlf_action != CRLF_BINARY) {
  enum eol eol_attr = git_path_check_eol(ccheck + 3);
- if (eol_attr == EOL_LF)
+ if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
+ ca->crlf_action = CRLF_AUTO_INPUT;
+ else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF)
+ ca->crlf_action = CRLF_AUTO_CRLF;
+ else if (eol_attr == EOL_LF)
  ca->crlf_action = CRLF_TEXT_INPUT;
  else if (eol_attr == EOL_CRLF)
  ca->crlf_action = CRLF_TEXT_CRLF;
@@ -895,9 +896,9 @@ const char *get_convert_attr_ascii(const char *path)
  case CRLF_AUTO:
  return "text=auto";
  case CRLF_AUTO_CRLF:
- return "text=auto eol=crlf"; /* This is not supported yet */
+ return "text=auto eol=crlf";
  case CRLF_AUTO_INPUT:
- return "text=auto eol=lf"; /* This is not supported yet */
+ return "text=auto eol=lf";
  }
  return "";
 }
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index c164b46..d0bee08 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -114,7 +114,7 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
  test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'text=auto, autocrlf=true does not normalize CRLF files' '
 
  rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
  git config core.autocrlf true &&
@@ -126,7 +126,7 @@ test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
  LFonlydiff=$(git diff LFonly) &&
  CRLFonlydiff=$(git diff CRLFonly) &&
  LFwithNULdiff=$(git diff LFwithNUL) &&
- test -z "$LFonlydiff" -a -n "$CRLFonlydiff" -a -z "$LFwithNULdiff"
+ test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
 test_expect_success 'text=auto, autocrlf=true does not normalize binary files' '
diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 55f45d3..86175cf 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -174,8 +174,8 @@ attr_ascii () {
  text,lf)   echo "text eol=lf" ;;
  text,crlf) echo "text eol=crlf" ;;
  auto,)     echo "text=auto" ;;
- auto,lf)   echo "text eol=lf" ;;
- auto,crlf) echo "text eol=crlf" ;;
+ auto,lf)   echo "text=auto eol=lf" ;;
+ auto,crlf) echo "text=auto eol=crlf" ;;
  lf,)       echo "text eol=lf" ;;
  crlf,)     echo "text eol=crlf" ;;
  ,) echo "" ;;
@@ -396,10 +396,9 @@ commit_chk_wrnNNO ""      ""      false   ""        ""        ""          ""
 commit_chk_wrnNNO ""      ""      true    LF_CRLF   ""        ""          ""          ""
 commit_chk_wrnNNO ""      ""      input   ""        ""        ""          ""          ""
 
-commit_chk_wrnNNO "auto"  ""      false   "$WILC"   "$WICL"   "$WAMIX"    ""          ""
-commit_chk_wrnNNO "auto"  ""      true    LF_CRLF   ""        LF_CRLF     ""          ""
-commit_chk_wrnNNO "auto"  ""      input   ""        CRLF_LF   CRLF_LF     ""          ""
-
+commit_chk_wrnNNO "auto"  ""      false   "$WILC"   ""        ""          ""          ""
+commit_chk_wrnNNO "auto"  ""      true    LF_CRLF   ""        ""          ""          ""
+commit_chk_wrnNNO "auto"  ""      input   ""        ""        ""          ""          ""
 for crlf in true false input;
 do
  commit_chk_wrnNNO -text ""      $crlf   ""        ""        ""          ""          ""
@@ -407,8 +406,8 @@ do
  commit_chk_wrnNNO -text crlf    $crlf   ""        ""        ""          ""          ""
  commit_chk_wrnNNO ""    lf      $crlf   ""       CRLF_LF    CRLF_LF      ""         CRLF_LF
  commit_chk_wrnNNO ""    crlf    $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
- commit_chk_wrnNNO auto  lf     $crlf   ""       CRLF_LF    CRLF_LF     ""          CRLF_LF
- commit_chk_wrnNNO auto  crlf   $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+ commit_chk_wrnNNO auto  lf     $crlf   ""        ""        ""          ""          ""
+ commit_chk_wrnNNO auto  crlf   $crlf   LF_CRLF   ""        ""          ""          ""
  commit_chk_wrnNNO text  lf     $crlf   ""       CRLF_LF    CRLF_LF     ""          CRLF_LF
  commit_chk_wrnNNO text  crlf   $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
 done
@@ -453,9 +452,9 @@ do
  check_in_repo_NNO -text ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
  check_in_repo_NNO -text lf     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
  check_in_repo_NNO -text crlf   $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
- check_in_repo_NNO auto  ""     $crlf   LF  LF    LF           LF_mix_CR  CRLF_nul
- check_in_repo_NNO auto  lf     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
- check_in_repo_NNO auto  crlf   $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+ check_in_repo_NNO auto  ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO auto  lf     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO auto  crlf   $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
  check_in_repo_NNO text  ""     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
  check_in_repo_NNO text  lf     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
  check_in_repo_NNO text  crlf   $crlf   LF  LF    LF           LF_mix_CR  LF_nul
@@ -513,8 +512,6 @@ do
  # text: core.autocrlf=false and core.eol unset(or native) uses native eol
  checkout_files "text"  "$id"  ""    false  ""       $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
  checkout_files "text"  "$id"  ""    false  native   $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
- checkout_files "auto"  "$id"  ""    false  ""       $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
- checkout_files "auto"  "$id"  ""    false  native   $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
 
  for ceol in "" lf crlf native;
  do
@@ -541,30 +538,30 @@ do
  for ceol in "" lf crlf native;
  do
  # core.autocrlf=true
- checkout_files ""      "$id" ""    true   "$ceol"   CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "auto"  "$id" ""    true   "$ceol"   CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
+ checkout_files ""      "$id" ""     true   "$ceol"   CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id" ""     true   "$ceol"   CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id"  ""    false   ""       $NL   CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id"  ""    false   native   $NL   CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  # text=auto + eol=XXX
- # currently the same as text, eol=XXX
- checkout_files "auto"  "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "auto"  "$id" "crlf" "$crlf"  "$ceol"    CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ checkout_files "auto"  "$id" "lf"   "$crlf" "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id" "crlf" "$crlf" "$ceol"  CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  done
 done
 
 #Handling with ident
 for id in ident;
 do
- for crlf in "" true false input;
+ for crlf in true false input;
  do
  for ceol in "" lf crlf native;
  do
  # ident does not react on core.autocrlf=true
- checkout_files ""      "$id" ""   "$crlf" "$ceol"   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files ""      "$id" ""     "$crlf" "$ceol"   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  # ident does not react on text=auto
- checkout_files "auto"  "$id" ""    true   "$ceol"   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id" ""      true   "$ceol"   LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  # text=auto + eol=XXX
- # currently the same as text, eol=XXX
- checkout_files "auto"  "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "auto"  "$id" "crlf" "$crlf"  "$ceol"    CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ checkout_files "auto"  "$id" "lf"   "$crlf"  "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+ checkout_files "auto"  "$id" "crlf" "$crlf"  "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  done
  done
 done
--
2.8.0.rc2.6.g3847ccb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 7/7] convert.c: more safer crlf handling with text attribute

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
From: Torsten Bögershausen <[hidden email]>

A follow-up after a discussion how to fix the flaky execution
of t0025, gmane/$284352.

This patch extends the work done in commit c480539:
"Make it work also for un-normalized repositories". Make sure that CRLF
can be converted round trip, or don't convert them at all.

The old handling would treat a file as unchanged after checkout,
as long as it is not touched in the work tree and mtime matches the value
recorded in the index.
When the mtime is changed in the working tree, or the inode is changed,
the file is reported as modified.

The following sequence is now handled reproducable:
$ git init
$ printf "line1\r\n" >file.bat
$ git add file.bat
$ git commit -m "Add file with CRLF" file.bat
$ echo "*.bat text eol=crlf" >.gitattributes
$ git commit -m "bat files should have CRLF"
$ git status
 # nothing to commit, working directory clean
$ git push <upstream>
$ printf "newline\r\n" >>file.bat
$ mv file.bat file.sav
$ git checkout file.bat
$ git status
 #modified:   file.bat

The new handling makes sure that after running "git reset --hard".
"git status" reports the working tree as clean regarding CRLF conversion.
It makes sure that the Git-internal eol conversion is
doing roundtrip. A user can still write an external smudge/clean filter
outside Git, which doesn't do a roundtrip and the working directory is
not clean.

The functionality of has_cr_in_index() is turned into has_crlf_in_index(),
and the function is integrated into would_convert_crlf_at_commit().

Check for CRLF in the index instead of CR, the bit CONVERT_STAT_BITS_ANY_CR
is no longer used and removed, as well as "lonecr" in struct text_stat.

Rewrite check_safe_crlf() in convert.c to simulate checkin-checkout,
to detect whether any line endings are converted.

Modify the lf_to_crlf_filter:
Files with LF are converted into CRLF, file with CRLF are not changed.
Files with mixed line endings are not converted, the filter fails, and Git
falls back to the non-streaming handling, see write_entry().

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 Documentation/gitattributes.txt |  10 +-
 convert.c                       | 215 ++++++++++++++++++++++++----------------
 t/t0025-crlf-auto.sh            |   8 +-
 t/t0027-auto-crlf.sh            |  38 +++----
 4 files changed, 158 insertions(+), 113 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 43b70b5..7929880 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -110,7 +110,7 @@ repository upon 'git add' and 'git commit'.
 `text`
 ^^^^^^
 
-This attribute enables and controls end-of-line normalization.  When a
+This attribute enables and controls end-of-line conversion.  When a
 text file is normalized, its line endings are converted to LF in the
 repository.  To control what line ending style is used in the working
 directory, use the `eol` attribute for a single file and the
@@ -120,8 +120,11 @@ Note that `core.autocrlf` overrides `core.eol`
 Set::
 
  Setting the `text` attribute on a path enables end-of-line
- normalization and marks the path as a text file.  End-of-line
+ conversion and marks the path as a text file.  End-of-line
  conversion takes place without guessing the content type.
+ Files that have been commited with CRLF before the text attribute
+ is set and commited are not normalized. No end-of-line conversion
+ is done at checkout or checkin.
 
 Unset::
 
@@ -132,7 +135,8 @@ Set to string value "auto"::
 
  When `text` is set to "auto", the path is marked for automatic
  end-of-line normalization.  If Git decides that the content is
- text, its line endings are normalized to LF on checkin.
+ text, and the path has no CRLF in the index,
+ its line endings are converted to LF on checkin.
 
 Unspecified::
 
diff --git a/convert.c b/convert.c
index 02c50da..8266d87 100644
--- a/convert.c
+++ b/convert.c
@@ -17,7 +17,6 @@
 #define CONVERT_STAT_BITS_TXT_LF    0x1
 #define CONVERT_STAT_BITS_TXT_CRLF  0x2
 #define CONVERT_STAT_BITS_BIN       0x4
-#define CONVERT_STAT_BITS_ANY_CR    0x8
 
 enum crlf_action {
  CRLF_UNDEFINED,
@@ -32,7 +31,7 @@ enum crlf_action {
 
 struct text_stat {
  /* NUL, CR, LF and CRLF counts */
- unsigned stat_bits, lonecr, lonelf, crlf;
+ unsigned stat_bits, lonelf;
 
  /* These are just approximations! */
  unsigned printable, nonprintable;
@@ -48,13 +47,10 @@ static void do_gather_stats(const char *buf, unsigned long size,
  for (i = 0; i < size; i++) {
  unsigned char c = buf[i];
  if (c == '\r') {
- stats->stat_bits |= CONVERT_STAT_BITS_ANY_CR;
  if (i+1 < size && buf[i+1] == '\n') {
- stats->crlf++;
  i++;
  stats->stat_bits |= CONVERT_STAT_BITS_TXT_CRLF;
  } else {
- stats->lonecr++;
  stats->stat_bits |= CONVERT_STAT_BITS_BIN;
  }
  continue;
@@ -136,7 +132,7 @@ static unsigned get_convert_stats_sha1(const char *path,
  if (!readlen)
  break;
  do_gather_stats(buf, (unsigned long)readlen, &stats, earlyout);
- if (stats.stat_bits & earlyout)
+ if ((stats.stat_bits & earlyout) == earlyout)
  break; /* We found what we have been searching for */
  }
 close_and_exit_i:
@@ -147,11 +143,9 @@ close_and_exit_i:
 
 static const char *convert_stats_ascii(unsigned convert_stats)
 {
- unsigned mask = CONVERT_STAT_BITS_TXT_LF |
- CONVERT_STAT_BITS_TXT_CRLF;
  if (convert_stats & CONVERT_STAT_BITS_BIN)
  return "-text";
- switch (convert_stats & mask) {
+ switch (convert_stats) {
  case CONVERT_STAT_BITS_TXT_LF:
  return "lf";
  case CONVERT_STAT_BITS_TXT_CRLF:
@@ -163,7 +157,17 @@ static const char *convert_stats_ascii(unsigned convert_stats)
  }
 }
 
-static unsigned get_convert_stats_wt(const char *path)
+const char *get_cached_convert_stats_ascii(const char *path)
+{
+ unsigned convert_stats;
+ unsigned earlyout = CONVERT_STAT_BITS_BIN;
+ convert_stats = get_convert_stats_sha1(path,
+       get_sha1_from_cache(path),
+       earlyout);
+ return convert_stats_ascii(convert_stats);
+}
+
+const char *get_wt_convert_stats_ascii(const char *path)
 {
  struct text_stat stats;
  unsigned earlyout = CONVERT_STAT_BITS_BIN;
@@ -185,24 +189,7 @@ static unsigned get_convert_stats_wt(const char *path)
  }
  close(fd);
  convert_nonprintable(&stats);
- return stats.stat_bits;
-}
-
-const char *get_cached_convert_stats_ascii(const char *path)
-{
- unsigned convert_stats;
- unsigned earlyout = CONVERT_STAT_BITS_BIN;
- convert_stats = get_convert_stats_sha1(path,
-       get_sha1_from_cache(path),
-       earlyout);
- return convert_stats_ascii(convert_stats);
-}
-
-const char *get_wt_convert_stats_ascii(const char *path)
-{
- unsigned convert_stats;
- convert_stats = get_convert_stats_wt(path);
- return convert_stats_ascii(convert_stats);
+ return convert_stats_ascii(stats.stat_bits);
 }
 
 static int text_eol_is_crlf(void)
@@ -241,53 +228,90 @@ static enum eol output_eol(enum crlf_action crlf_action)
  return core_eol;
 }
 
+static int would_convert_lf_at_checkout(unsigned convert_stats,
+ size_t len,
+ enum crlf_action crlf_action)
+{
+ if (output_eol(crlf_action) != EOL_CRLF)
+ return 0;
+
+ /* No "naked" LF? Nothing to convert, regardless. */
+ if (!convert_stats & CONVERT_STAT_BITS_TXT_LF)
+ return 0;
+
+ if (crlf_action == CRLF_AUTO ||
+    crlf_action == CRLF_AUTO_INPUT ||
+    crlf_action == CRLF_AUTO_CRLF) {
+ /* auto: binary files are not converted */
+ if (convert_stats & CONVERT_STAT_BITS_BIN)
+ return 0;
+ }
+ /* If we have any CRLF line endings, we do not touch it */
+ /* This is the new safer autocrlf-handling */
+ if (convert_stats & CONVERT_STAT_BITS_TXT_CRLF)
+ return 0;
+ return 1;
+
+}
+
+static int would_convert_crlf_at_commit(const char * path,
+ const struct text_stat *stats,
+ size_t len,
+ enum crlf_action crlf_action)
+{
+ unsigned stat_bits_index;
+ /* No CRLF? Nothing to convert, regardless. */
+ if (!(stats->stat_bits & CONVERT_STAT_BITS_TXT_CRLF))
+ return 0;
+ /*
+ * If the file in the index has any CRLF in it, do not convert.
+ * This is the new safer autocrlf handling.
+ */
+ stat_bits_index = get_convert_stats_sha1(path,
+ get_sha1_from_cache(path),
+ CONVERT_STAT_BITS_TXT_CRLF);
+ if (stat_bits_index & CONVERT_STAT_BITS_TXT_CRLF)
+ return 0;
+ return 1;
+}
+
 static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
-                            struct text_stat *stats, enum safe_crlf checksafe)
+    enum safe_crlf checksafe,
+    unsigned convert_stats, unsigned new_convert_stats)
 {
  if (!checksafe)
  return;
-
- if (output_eol(crlf_action) == EOL_LF) {
+ if (convert_stats & CONVERT_STAT_BITS_TXT_CRLF &&
+    !(new_convert_stats & CONVERT_STAT_BITS_TXT_CRLF)) {
  /*
  * CRLFs would not be restored by checkout:
  * check if we'd remove CRLFs
  */
- if (stats->crlf) {
- if (checksafe == SAFE_CRLF_WARN)
- warning("CRLF will be replaced by LF in %s.\nThe file will have its original line endings in your working directory.", path);
- else /* i.e. SAFE_CRLF_FAIL */
- die("CRLF would be replaced by LF in %s.", path);
- }
- } else if (output_eol(crlf_action) == EOL_CRLF) {
+ if (checksafe == SAFE_CRLF_WARN)
+ warning("CRLF will be replaced by LF in %s.\nThe file will have its original line endings in your working directory.", path);
+ else /* i.e. SAFE_CRLF_FAIL */
+ die("CRLF would be replaced by LF in %s.", path);
+ }
+ if (convert_stats & CONVERT_STAT_BITS_TXT_LF &&
+    !(new_convert_stats & CONVERT_STAT_BITS_TXT_LF)) {
  /*
  * CRLFs would be added by checkout:
  * check if we have "naked" LFs
  */
- if (stats->lonelf) {
- if (checksafe == SAFE_CRLF_WARN)
- warning("LF will be replaced by CRLF in %s.\nThe file will have its original line endings in your working directory.", path);
- else /* i.e. SAFE_CRLF_FAIL */
- die("LF would be replaced by CRLF in %s", path);
- }
+ if (checksafe == SAFE_CRLF_WARN)
+ warning("LF will be replaced by CRLF in %s.\nThe file will have its original line endings in your working directory.", path);
+ else /* i.e. SAFE_CRLF_FAIL */
+ die("LF would be replaced by CRLF in %s", path);
  }
 }
 
-static int has_cr_in_index(const char *path)
-{
- unsigned convert_stats;
- convert_stats = get_convert_stats_sha1(path,
-       get_sha1_from_cache(path),
-       CONVERT_STAT_BITS_ANY_CR);
- return convert_stats & CONVERT_STAT_BITS_ANY_CR;
-}
-
 static int crlf_to_git(const char *path, const char *src, size_t len,
        struct strbuf *buf,
        enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
  struct text_stat stats;
  char *dst;
-
+ int convert_crlf;
  if (crlf_action == CRLF_BINARY ||
     (src && !len))
  return 0;
@@ -299,23 +323,36 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
  if (!buf && !src)
  return 1;
 
- gather_stats(src, len, &stats, CONVERT_STAT_BITS_BIN);
-
- if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
+ if (crlf_action == CRLF_AUTO ||
+    crlf_action == CRLF_AUTO_INPUT ||
+    crlf_action == CRLF_AUTO_CRLF) {
+ gather_stats(src, len, &stats, CONVERT_STAT_BITS_BIN);
  if (stats.stat_bits & CONVERT_STAT_BITS_BIN)
  return 0;
- /*
- * If the file in the index has any CR in it, do not convert.
- * This is the new safer autocrlf handling.
- */
- if (has_cr_in_index(path))
- return 0;
+ } else {
+ gather_stats(src, len, &stats, 0);
  }
-
- check_safe_crlf(path, crlf_action, &stats, checksafe);
-
- /* Optimization: No CRLF? Nothing to convert, regardless. */
- if (!stats.crlf)
+ convert_crlf = would_convert_crlf_at_commit(path, &stats, len,
+    crlf_action);
+ if (checksafe) {
+ unsigned convert_stats = stats.stat_bits;
+ unsigned new_convert_stats = convert_stats;
+ /* Simulate commit */
+ if (convert_crlf &&
+    (new_convert_stats & CONVERT_STAT_BITS_TXT_CRLF)) {
+ new_convert_stats |= CONVERT_STAT_BITS_TXT_LF;
+ new_convert_stats &= ~CONVERT_STAT_BITS_TXT_CRLF;
+ }
+ /* Simulate checkout */
+ if (would_convert_lf_at_checkout(new_convert_stats,
+ len, crlf_action)) {
+ new_convert_stats |= CONVERT_STAT_BITS_TXT_CRLF;
+ new_convert_stats &= ~CONVERT_STAT_BITS_TXT_LF;
+ }
+ check_safe_crlf(path, crlf_action, checksafe,
+ convert_stats, new_convert_stats);
+ }
+ if (!convert_crlf)
  return 0;
 
  /*
@@ -329,7 +366,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
  if (strbuf_avail(buf) + buf->len < len)
  strbuf_grow(buf, len - buf->len);
  dst = buf->buf;
- if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
+ if (crlf_action == CRLF_AUTO ||
+    crlf_action == CRLF_AUTO_INPUT ||
+    crlf_action == CRLF_AUTO_CRLF) {
  /*
  * If we guessed, we already know we rejected a file with
  * lone CR, and we can strip a CR without looking at what
@@ -356,28 +395,15 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 {
  char *to_free = NULL;
  struct text_stat stats;
- unsigned earlyout = CONVERT_STAT_BITS_TXT_CRLF | CONVERT_STAT_BITS_BIN;
-
-
- if (!len || output_eol(crlf_action) != EOL_CRLF)
+ unsigned earlyout = 0; /* Need to count lonelf */
+ if (!len)
  return 0;
 
  gather_stats(src, len, &stats, earlyout);
-
- /* No "naked" LF? Nothing to convert, regardless. */
- if (!stats.lonelf)
+ if (!would_convert_lf_at_checkout(stats.stat_bits,
+  len, crlf_action))
  return 0;
 
- if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- /* If we have any CR or CRLF line endings, we do not touch it */
- /* This is the new safer autocrlf-handling */
- if (stats.lonecr || stats.crlf )
- return 0;
-
- if (stats.stat_bits & CONVERT_STAT_BITS_BIN)
- return 0;
- }
-
  /* are we "faking" in place editing ? */
  if (src == buf->buf)
  to_free = strbuf_detach(buf, NULL);
@@ -1079,6 +1105,8 @@ int is_null_stream_filter(struct stream_filter *filter)
 struct lf_to_crlf_filter {
  struct stream_filter filter;
  unsigned has_held:1;
+ unsigned expanded_loneLF:1;
+ unsigned had_CRLF:1;
  char held;
 };
 
@@ -1119,7 +1147,12 @@ static int lf_to_crlf_filter_fn(struct stream_filter *filter,
  char ch = input[i];
 
  if (ch == '\n') {
- output[o++] = '\r';
+ if (!lf_to_crlf->had_CRLF) {
+ output[o++] = '\r';
+ lf_to_crlf->expanded_loneLF = 1;
+ }
+ if (was_cr)
+ lf_to_crlf->had_CRLF = 1;
  } else if (was_cr) {
  /*
  * Previous round saw CR and it is not followed
@@ -1148,6 +1181,14 @@ static int lf_to_crlf_filter_fn(struct stream_filter *filter,
 
  was_cr = 0;
  output[o++] = ch;
+ if (lf_to_crlf->expanded_loneLF &&
+    lf_to_crlf->had_CRLF) {
+ /*
+ * Mixed EOL, round trip not possible.
+ */
+ return 1;
+ }
+
  }
 
  *osize_p -= o;
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index d0bee08..b5f93e2 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -39,7 +39,7 @@ test_expect_success 'default settings cause no changes' '
  test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'crlf=true causes a CRLF file to be normalized' '
+test_expect_success 'crlf=true causes a CRLF file not to be normalized' '
 
  # Backwards compatibility check
  rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
@@ -49,10 +49,10 @@ test_expect_success 'crlf=true causes a CRLF file to be normalized' '
  # Note, "normalized" means that git will normalize it if added
  has_cr CRLFonly &&
  CRLFonlydiff=$(git diff CRLFonly) &&
- test -n "$CRLFonlydiff"
+ test -z "$CRLFonlydiff"
 '
 
-test_expect_success 'text=true causes a CRLF file to be normalized' '
+test_expect_success 'text=true causes a CRLF file not to be normalized' '
 
  rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
  echo "CRLFonly text" > .gitattributes &&
@@ -61,7 +61,7 @@ test_expect_success 'text=true causes a CRLF file to be normalized' '
  # Note, "normalized" means that git will normalize it if added
  has_cr CRLFonly &&
  CRLFonlydiff=$(git diff CRLFonly) &&
- test -n "$CRLFonlydiff"
+ test -z "$CRLFonlydiff"
 '
 
 test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=false' '
diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 86175cf..3fa9be4 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -354,7 +354,7 @@ else
  WAMIX=CRLF_LF
 fi
 
-#                         attr   LF        CRLF      CRLFmixLF LFmixCR   CRLFNUL
+#                               attr   LF        CRLF      CRLFmixLF LFmixCR   CRLFNUL
 test_expect_success 'commit files empty attr' '
  commit_check_warn false ""     ""        ""        ""        ""        "" &&
  commit_check_warn true  ""     "LF_CRLF" ""        "LF_CRLF" ""        "" &&
@@ -391,7 +391,7 @@ test_expect_success 'commit files attr=crlf' '
  commit_check_warn input "crlf" "LF_CRLF" ""        "LF_CRLF" "LF_CRLF" ""
 '
 
-#                 attr                    LF        CRLF      CRLFmixLF   LF_mix_CR   CRLFNUL
+#                 attr    aeol    ceol    LF        CRLF      CRLFmixLF   LF_mix_CR   CRLFNUL
 commit_chk_wrnNNO ""      ""      false   ""        ""        ""          ""          ""
 commit_chk_wrnNNO ""      ""      true    LF_CRLF   ""        ""          ""          ""
 commit_chk_wrnNNO ""      ""      input   ""        ""        ""          ""          ""
@@ -401,20 +401,22 @@ commit_chk_wrnNNO "auto"  ""      true    LF_CRLF   ""        ""          ""
 commit_chk_wrnNNO "auto"  ""      input   ""        ""        ""          ""          ""
 for crlf in true false input;
 do
+#                          attr aeol    ceol    LF        CRLF      CRLFmixLF   LF_mix_CR   CRLFNUL
  commit_chk_wrnNNO -text ""      $crlf   ""        ""        ""          ""          ""
  commit_chk_wrnNNO -text lf      $crlf   ""        ""        ""          ""          ""
  commit_chk_wrnNNO -text crlf    $crlf   ""        ""        ""          ""          ""
- commit_chk_wrnNNO ""    lf      $crlf   ""       CRLF_LF    CRLF_LF      ""         CRLF_LF
- commit_chk_wrnNNO ""    crlf    $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+ commit_chk_wrnNNO ""    lf      $crlf   ""        ""        ""          ""          ""
+ commit_chk_wrnNNO ""    crlf    $crlf   LF_CRLF   ""        ""          LF_CRLF     ""
  commit_chk_wrnNNO auto  lf     $crlf   ""        ""        ""          ""          ""
  commit_chk_wrnNNO auto  crlf   $crlf   LF_CRLF   ""        ""          ""          ""
- commit_chk_wrnNNO text  lf     $crlf   ""       CRLF_LF    CRLF_LF     ""          CRLF_LF
- commit_chk_wrnNNO text  crlf   $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+ commit_chk_wrnNNO text  lf     $crlf   ""        ""        ""          ""          ""
+ commit_chk_wrnNNO text  crlf   $crlf   LF_CRLF   ""        ""          LF_CRLF     ""
 done
 
-commit_chk_wrnNNO "text"  ""      false   "$WILC"   "$WICL"   "$WAMIX"    "$WILC"     "$WICL"
-commit_chk_wrnNNO "text"  ""      true    LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
-commit_chk_wrnNNO "text"  ""      input   ""        CRLF_LF   CRLF_LF     ""          CRLF_LF
+#                  attr   aeol    ceol    LF        CRLF      CRLFmixLF   LF_mix_CR   CRLFNUL
+commit_chk_wrnNNO "text"  ""      false   "$WILC"   ""        ""          "$WILC"     ""
+commit_chk_wrnNNO "text"  ""      true    LF_CRLF   ""        ""          LF_CRLF     ""
+commit_chk_wrnNNO "text"  ""      input   ""        ""        ""          ""          ""
 
 test_expect_success 'create files cleanup' '
  rm -f *.txt &&
@@ -455,9 +457,9 @@ do
  check_in_repo_NNO auto  ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
  check_in_repo_NNO auto  lf     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
  check_in_repo_NNO auto  crlf   $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
- check_in_repo_NNO text  ""     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
- check_in_repo_NNO text  lf     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
- check_in_repo_NNO text  crlf   $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+ check_in_repo_NNO text  ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO text  lf     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+ check_in_repo_NNO text  crlf   $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
 done
 ################################################################################
 # Check how files in the repo are changed when they are checked out
@@ -478,12 +480,10 @@ done
 
 if test_have_prereq NATIVE_CRLF
 then
-MIX_CRLF_LF=CRLF
 MIX_LF_CR=CRLF_mix_CR
 NL=CRLF
 LFNUL=CRLF_nul
 else
-MIX_CRLF_LF=CRLF_mix_LF
 MIX_LF_CR=LF_mix_CR
 NL=LF
 LFNUL=LF_nul
@@ -503,20 +503,20 @@ do
  checkout_files "-text" "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  checkout_files "-text" "$id" "crlf" "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  checkout_files "text"  "$id" "lf"   "$crlf"  "$ceol"    LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
- checkout_files "text"  "$id" "crlf" "$crlf"  "$ceol"    CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ checkout_files "text"  "$id" "crlf" "$crlf"  "$ceol"    CRLF  CRLF  CRLF_mix_LF  CRLF_mix_CR  CRLF_nul
  done
  done
  # text: core.autocrlf=false uses core.eol
- checkout_files "text"  "$id"  ""    false  crlf     CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ checkout_files "text"  "$id"  ""    false  crlf     CRLF  CRLF  CRLF_mix_LF  CRLF_mix_CR  CRLF_nul
  checkout_files "text"  "$id"  ""    false  lf       LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  # text: core.autocrlf=false and core.eol unset(or native) uses native eol
- checkout_files "text"  "$id"  ""    false  ""       $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
- checkout_files "text"  "$id"  ""    false  native   $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
+ checkout_files "text"  "$id"  ""    false  ""       $NL   CRLF  CRLF_mix_LF  $MIX_LF_CR   $LFNUL
+ checkout_files "text"  "$id"  ""    false  native   $NL   CRLF  CRLF_mix_LF  $MIX_LF_CR   $LFNUL
 
  for ceol in "" lf crlf native;
  do
  # text: core.autocrlf = true overrides core.eol
- checkout_files "text"  "$id"  ""    true   "$ceol"  CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+ checkout_files "text"  "$id"  ""    true   "$ceol"  CRLF  CRLF  CRLF_mix_LF  CRLF_mix_CR  CRLF_nul
  # text: core.autocrlf = input overrides core.eol
  checkout_files "text"  "$id"  ""    input  "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
  checkout_files "auto"  "$id"  ""    input  "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
--
2.8.0.rc2.6.g3847ccb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH v1 6/7] correct blame for files commited with CRLF

Torsten Bögershausen-2
In reply to this post by Junio C Hamano
From: Torsten Bögershausen <[hidden email]>

git blame reports lines as not "Not Committed Yet" when they have
CRLF in the index, CRLF in the worktree and e.g. core.autocrlf is true.

Since commit c48053 "new safer autocrlf handling", files that have CRLF
in the index are not normalized at commit when e.g. core.autocrl is set.

Whenever a CLRF conversion is needed (or any filter us set), load the
index temporally, before calling convert_to_git()

Signed-off-by: Torsten Bögershausen <[hidden email]>
---
 builtin/blame.c               |  6 +++++-
 convert.c                     | 10 ++++++++++
 convert.h                     |  1 +
 t/t8003-blame-corner-cases.sh | 14 ++++++++++++++
 4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index e982fb8..a219068 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2376,7 +2376,11 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
  if (strbuf_read(&buf, 0, 0) < 0)
  die_errno("failed to read from stdin");
  }
- convert_to_git(path, buf.buf, buf.len, &buf, 0);
+ if (convert_needs_conversion(path)) {
+ read_cache();
+ convert_to_git(path, buf.buf, buf.len, &buf, 0);
+ discard_cache();
+ }
  origin->file.ptr = buf.buf;
  origin->file.size = buf.len;
  pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);
diff --git a/convert.c b/convert.c
index 4ed5d89..02c50da 100644
--- a/convert.c
+++ b/convert.c
@@ -903,6 +903,16 @@ const char *get_convert_attr_ascii(const char *path)
  return "";
 }
 
+int convert_needs_conversion(const char *path)
+{
+ struct conv_attrs ca;
+
+ convert_attrs(&ca, path);
+ if (ca.drv || ca.ident || ca.crlf_action != CRLF_BINARY)
+ return 1;
+ return 0;
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
diff --git a/convert.h b/convert.h
index ccf436b..ffd9c32 100644
--- a/convert.h
+++ b/convert.h
@@ -35,6 +35,7 @@ extern enum eol core_eol;
 extern const char *get_cached_convert_stats_ascii(const char *path);
 extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
+int convert_needs_conversion(const char *path);
 
 /* returns 1 if *dst was used */
 extern int convert_to_git(const char *path, const char *src, size_t len,
diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index 6568429..a9b266f 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -212,4 +212,18 @@ test_expect_success 'blame file with CRLF attributes text' '
  grep "A U Thor" actual
 '
 
+test_expect_success 'blame file with CRLF core.autocrlf=true' '
+ git config core.autocrlf false &&
+ printf "testcase\r\n" >crlfinrepo &&
+ >.gitattributes &&
+ git add crlfinrepo &&
+ git commit -m "add crlfinrepo" &&
+ git config core.autocrlf true &&
+ mv crlfinrepo tmp &&
+ git checkout crlfinrepo &&
+ rm tmp &&
+ git blame crlfinrepo >actual &&
+ grep "A U Thor" actual
+'
+
 test_done
--
2.8.0.rc2.6.g3847ccb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v1 1/7] Make it possible to get sha1 for a path from the index

Duy Nguyen
In reply to this post by Torsten Bögershausen-2
On Tue, Mar 29, 2016 at 8:25 PM,  <[hidden email]> wrote:
> From: Torsten Bögershausen <[hidden email]>
>
> Factor out the retrival of the sha1 for a given path in
> read_blob_data_from_index() into the function get_sha1_from_index().

Getting _sha1_ from index is one function call and a memory
dereference or two. I think you mean get sha1 _file_ (or data) from
index. Maybe put either word in the function name.

> This will be used in the next commit, when convert.c can do the analyze
> for "text=auto" without slurping the whole blob into memory at once.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v1 1/7] Make it possible to get sha1 for a path from the index

Duy Nguyen
On Tue, Mar 29, 2016 at 8:28 PM, Duy Nguyen <[hidden email]> wrote:
> On Tue, Mar 29, 2016 at 8:25 PM,  <[hidden email]> wrote:
>> From: Torsten Bögershausen <[hidden email]>
>>
>> Factor out the retrival of the sha1 for a given path in
>> read_blob_data_from_index() into the function get_sha1_from_index().
>
> Getting _sha1_ from index is one function call and a memory
> dereference or two. I think you mean get sha1 _file_ (or data) from
> index. Maybe put either word in the function name.

Oops, shouldn't have trusted that function name in the @@ line. No
what you write in the commit message matches the patch. Sorry for the
noise.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v1 1/7] Make it possible to get sha1 for a path from the index

Torsten Bögershausen-2
On 2016-03-29 15.31, Duy Nguyen wrote:

> On Tue, Mar 29, 2016 at 8:28 PM, Duy Nguyen <[hidden email]> wrote:
>> On Tue, Mar 29, 2016 at 8:25 PM,  <[hidden email]> wrote:
>>> From: Torsten Bögershausen <[hidden email]>
>>>
>>> Factor out the retrival of the sha1 for a given path in
>>> read_blob_data_from_index() into the function get_sha1_from_index().
>>
>> Getting _sha1_ from index is one function call and a memory
>> dereference or two. I think you mean get sha1 _file_ (or data) from
>> index. Maybe put either word in the function name.
>
> Oops, shouldn't have trusted that function name in the @@ line. No
> what you write in the commit message matches the patch. Sorry for the
> noise.
>
Thanks for the review.
It feels that the naming isn't ideal, let's see if we can find a better
function name.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
1234 ... 6