[Bug?] log -p -W showing the whole file for a patch that adds to the end?

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug?] log -p -W showing the whole file for a patch that adds to the end?

Junio C Hamano
Just noticed a curiosity:

    $ git show -W 3e3ceaa58 quote.c

shows the entire file.  The commit in question adds a whole new
function at the end of the file.  If I move that addition to just
before the last function the file already had before the change and
amend the commit, "show -W" would work as expected, i.e. addition of
the function, with three lines before and three lines after context.

The -W feature was added by 14937c2c (diff: add option to show whole
functions as context, 2011-10-09).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Bug?] log -p -W showing the whole file for a patch that adds to the end?

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> Just noticed a curiosity:
>
>     $ git show -W 3e3ceaa58 quote.c
>
> shows the entire file.  The commit in question adds a whole new
> function at the end of the file.  If I move that addition to just
> before the last function the file already had before the change and
> amend the commit, "show -W" would work as expected, i.e. addition of
> the function, with three lines before and three lines after context.
>
> The -W feature was added by 14937c2c (diff: add option to show whole
> functions as context, 2011-10-09).

A workaround (atEnd) seems to give me a slightly better result, but
I am not sure if this breaks other corner cases.

The real problem is that get_func_line() is called with (start,
limit) but a hunk that adds to the end does not even enter the loop
that goes back (i.e. step = -1) to find the funcline:

        for (l = start; l != limit && 0 <= l && l < xe->xdf1.nrec; l += step) {
                const char *rec;
                long reclen = xdl_get_rec(&xe->xdf1, l, &rec);
                long len = ff(rec, reclen, buf, size, xecfg->find_func_priv);

because start == xe->xdf1.nrec in that case.  We could instead force
it to enter the loop by decrementing start when it is xe->xdf1.nrec
and we are going down, but that would show two functions in the
resulting hunk (one that is the existing function at the end of the
file, the other that is added by the patch), which is not better than
showing the patch unmodified.

-- >8 --
Subject: diff -W: do not show the whole file when punting

The -W option to "diff" family of commands allows the pre- and post-
context of hunks extended to cover the previous and the next
"function header line", so that the whole function that is patched
can be seen in the hunk.

The helper function get_func_line() however gets confused when a
hunk adds a new function at the very end, and returns -1 to signal
that it did not find a suitable "function header line", i.e. the
beginning of previous function.  The caller then takes this signal
and shows from the very beginning of the file.  We end up showing
the entire file, starting from the very beginning to the end of the
newly added lines.

We can avoid this by using the original hunk boundary when the
helper gives up.

Signed-off-by: Junio C Hamano <[hidden email]>
---
 xdiff/xemit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 993724b..4e58482 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -166,7 +166,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
  if (xecfg->flags & XDL_EMIT_FUNCCONTEXT) {
  long fs1 = get_func_line(xe, xecfg, NULL, xch->i1, -1);
  if (fs1 < 0)
- fs1 = 0;
+ fs1 = xch->i1;
  if (fs1 < s1) {
  s2 -= s1 - fs1;
  s1 = fs1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Bug?] log -p -W showing the whole file for a patch that adds to the end?

René Scharfe-2
Am 11.05.2016 um 00:51 schrieb Junio C Hamano:

> The helper function get_func_line() however gets confused when a
> hunk adds a new function at the very end, and returns -1 to signal
> that it did not find a suitable "function header line", i.e. the
> beginning of previous function.  The caller then takes this signal
> and shows from the very beginning of the file.  We end up showing
> the entire file, starting from the very beginning to the end of the
> newly added lines.
>
> We can avoid this by using the original hunk boundary when the
> helper gives up.

It's a more conservative fallback in this case, but it regresses e.g. if
you have a long list of includes at the top of a C file -- it won't show
the full list anymore with -W.

The problem is that we are trying to access lines in the preimage that
are only actually in the postimage.  Simply switching to useing the
postimage unconditionally is not enough -- we'd have the same problem
with the complementary case of lines at the end being removed.

I wonder if using the postimage as a fallback suffices.  Need to look at
this more closely.

René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Bug?] log -p -W showing the whole file for a patch that adds to the end?

René Scharfe-2
In reply to this post by Junio C Hamano
Am 11.05.2016 um 00:51 schrieb Junio C Hamano:
> The helper function get_func_line() however gets confused when a
> hunk adds a new function at the very end, and returns -1 to signal
> that it did not find a suitable "function header line", i.e. the
> beginning of previous function.  The caller then takes this signal
> and shows from the very beginning of the file.  We end up showing
> the entire file, starting from the very beginning to the end of the
> newly added lines.

In this case we need to look at the added lines in the post-image to
see if the original context suffices.  We currently only look at the
pre-image.  And if we have to extend the context simply search from the
bottom of the pre-image.  That's what the second patch does; the first
one is just a small preparation.

The last three patches introduce special handling of empty lines.
Before them the code for -W only distinguished between function lines
and non-function lines.  Not allowing empty lines between functions to
extend the context with -W is most useful in the case of appended full
functions -- otherwise we'd get the preceding function shown as well as
it "belongs" to the empty line separating them.

And if we do that then it's easy and more consistent to stop showing
empty lines trailing functions with -W (unless they're already included
in the one requested with -u/-U, of course).  Doing the same for grep
then is only fair.

Considering empty lines as uninteresting collides with languages like
Whitespace.  I'm not sure -W was useful for it to begin with, though.
Any other possible downsides?

NB: Comments are still not handled specially.  That means they are
considered to be part of the preceding function.  Fixing that is not in
scope for this series.  (And I'm not sure it would be worth the hassle.)

  diff: factor out match_func_rec()
  diff: handle appended chunks better with -W
  diff: ignore empty lines before added functions with -W
  diff: don't include common trailing empty lines with -W
  grep: don't extend context to trailing empty lines with -W

 grep.c        | 28 ++++++++++++++++++++++++--
 xdiff/xemit.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 82 insertions(+), 9 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 1/5] diff: factor out match_func_rec()

René Scharfe-2
Add match_func_rec(), a helper that wraps accessing a record and calling
the appropriate function for checking if it contains a function line.

Signed-off-by: Rene Scharfe <[hidden email]>
---
We'll use it in the next patch.  A follow-up patch could inline def_ff()
into it.

 xdiff/xemit.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 993724b..0c87637 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -120,6 +120,16 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv)
  return -1;
 }
 
+static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri,
+   char *buf, long sz)
+{
+ const char *rec;
+ long len = xdl_get_rec(xdf, ri, &rec);
+ if (!xecfg->find_func)
+ return def_ff(rec, len, buf, sz, xecfg->find_func_priv);
+ return xecfg->find_func(rec, len, buf, sz, xecfg->find_func_priv);
+}
+
 struct func_line {
  long len;
  char buf[80];
@@ -128,7 +138,6 @@ struct func_line {
 static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
   struct func_line *func_line, long start, long limit)
 {
- find_func_t ff = xecfg->find_func ? xecfg->find_func : def_ff;
  long l, size, step = (start > limit) ? -1 : 1;
  char *buf, dummy[1];
 
@@ -136,9 +145,7 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
  size = func_line ? sizeof(func_line->buf) : sizeof(dummy);
 
  for (l = start; l != limit && 0 <= l && l < xe->xdf1.nrec; l += step) {
- const char *rec;
- long reclen = xdl_get_rec(&xe->xdf1, l, &rec);
- long len = ff(rec, reclen, buf, size, xecfg->find_func_priv);
+ long len = match_func_rec(&xe->xdf1, xecfg, l, buf, size);
  if (len >= 0) {
  if (func_line)
  func_line->len = len;
--
2.8.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 2/5] diff: handle appended chunks better with -W

René Scharfe-2
In reply to this post by René Scharfe-2
If lines are added at the end of a file, diff -W shows the whole file.
That's because get_func_line() only considers the pre-image and gives up
if it sees a record index beyond its end.

Consider the post-image as well to see if the added lines already make
up a full function.  If it doesn't then search for the previous function
line by starting from the bottom of the pre-image, thereby avoiding to
confuse get_func_line().

Reuse the existing label called "again", as it's exactly where we need
to jump to when we're done handling the pre-context, but rename it to
"post_context_calculation" in order to document its new purpose better.

Reported-by: Junio C Hamano <[hidden email]>
Initial-patch-by: Junio C Hamano <[hidden email]>
Signed-off-by: Rene Scharfe <[hidden email]>
---
 xdiff/xemit.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 0c87637..969100d 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -171,7 +171,28 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
  s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0);
 
  if (xecfg->flags & XDL_EMIT_FUNCCONTEXT) {
- long fs1 = get_func_line(xe, xecfg, NULL, xch->i1, -1);
+ long fs1, i1 = xch->i1;
+
+ /* Appended chunk? */
+ if (i1 >= xe->xdf1.nrec) {
+ char dummy[1];
+
+ /*
+ * We don't need additional context if
+ * a whole function was added.
+ */
+ if (match_func_rec(&xe->xdf2, xecfg, xch->i2,
+   dummy, sizeof(dummy)) >= 0)
+ goto post_context_calculation;
+
+ /*
+ * Otherwise get more context from the
+ * pre-image.
+ */
+ i1 = xe->xdf1.nrec - 1;
+ }
+
+ fs1 = get_func_line(xe, xecfg, NULL, i1, -1);
  if (fs1 < 0)
  fs1 = 0;
  if (fs1 < s1) {
@@ -180,7 +201,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
  }
  }
 
- again:
+ post_context_calculation:
  lctx = xecfg->ctxlen;
  lctx = XDL_MIN(lctx, xe->xdf1.nrec - (xche->i1 + xche->chg1));
  lctx = XDL_MIN(lctx, xe->xdf2.nrec - (xche->i2 + xche->chg2));
@@ -209,7 +230,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
  if (l <= e1 ||
     get_func_line(xe, xecfg, NULL, l, e1) < 0) {
  xche = xche->next;
- goto again;
+ goto post_context_calculation;
  }
  }
  }
--
2.8.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 3/5] diff: ignore empty lines before added functions with -W

René Scharfe-2
In reply to this post by René Scharfe-2
If a new function and a preceding empty line is appended, diff -W shows
the previous function in full in order to provide context for that empty
line.  In most languages empty lines between sections are not
interesting in and off themselves and showing a whole extra function for
them is not what we want.

Skip empty lines when checking of the appended chunk starts with a
function line, thereby avoiding to extend the context just for them.

Signed-off-by: Rene Scharfe <[hidden email]>
---
 xdiff/xemit.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 969100d..087956a 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -155,6 +155,18 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
  return -1;
 }
 
+static int is_empty_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri)
+{
+ const char *rec;
+ long len = xdl_get_rec(xdf, ri, &rec);
+
+ while (len > 0 && isspace(*rec)) {
+ rec++;
+ len--;
+ }
+ return !len;
+}
+
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
   xdemitconf_t const *xecfg) {
  long s1, s2, e1, e2, lctx;
@@ -176,12 +188,18 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
  /* Appended chunk? */
  if (i1 >= xe->xdf1.nrec) {
  char dummy[1];
+ long i2 = xch->i2;
 
  /*
  * We don't need additional context if
- * a whole function was added.
+ * a whole function was added, possibly
+ * starting with empty lines.
  */
- if (match_func_rec(&xe->xdf2, xecfg, xch->i2,
+ while (i2 < xe->xdf2.nrec &&
+       is_empty_rec(&xe->xdf2, xecfg, i2))
+ i2++;
+ if (i2 < xe->xdf2.nrec &&
+    match_func_rec(&xe->xdf2, xecfg, i2,
    dummy, sizeof(dummy)) >= 0)
  goto post_context_calculation;
 
--
2.8.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 4/5] diff: don't include common trailing empty lines with -W

René Scharfe-2
In reply to this post by René Scharfe-2
Empty lines between functions are shown by diff -W, as it considers them
to be part of the function preceding them.  They are not interesting in
most languages.  The previous patch stopped showing them in the special
case of a function added at the end of a file.

Stop extending context to those empty lines by skipping back over them
from the start of the next function.

Signed-off-by: Rene Scharfe <[hidden email]>
---
 xdiff/xemit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 087956a..d0c0738 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -231,6 +231,9 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
  long fe1 = get_func_line(xe, xecfg, NULL,
  xche->i1 + xche->chg1,
  xe->xdf1.nrec);
+ while (fe1 > 0 &&
+       is_empty_rec(&xe->xdf1, xecfg, fe1 - 1))
+ fe1--;
  if (fe1 < 0)
  fe1 = xe->xdf1.nrec;
  if (fe1 > e1) {
--
2.8.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 5/5] grep: don't extend context to trailing empty lines with -W

René Scharfe-2
In reply to this post by René Scharfe-2
Empty lines between functions are shown by grep -W, as it considers them
to be part of the function preceding them.  They are not interesting in
most languages.  The previous patches stopped showing them for diff -W.

Stop showing empty lines trailing a function with grep -W.  Grep scans
the lines of a buffer from top to bottom and prints matching lines
immediately.  Thus we need to peek ahead in order to determine if an
empty line is part of a function body and worth showing or not.

Remember how far ahead we peeked in order to avoid having to do so
repeatedly when handling multiple consecutive empty lines.

Signed-off-by: Rene Scharfe <[hidden email]>
---
 grep.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/grep.c b/grep.c
index ec6f7ff..1e15b62 100644
--- a/grep.c
+++ b/grep.c
@@ -1396,9 +1396,17 @@ static int fill_textconv_grep(struct userdiff_driver *driver,
  return 0;
 }
 
+static int is_empty_line(const char *bol, const char *eol)
+{
+ while (bol < eol && isspace(*bol))
+ bol++;
+ return bol == eol;
+}
+
 static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int collect_hits)
 {
  char *bol;
+ char *peek_bol = NULL;
  unsigned long left;
  unsigned lno = 1;
  unsigned last_hit = 0;
@@ -1543,8 +1551,24 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
  show_function = 1;
  goto next_line;
  }
- if (show_function && match_funcname(opt, gs, bol, eol))
- show_function = 0;
+ if (show_function && (!peek_bol || peek_bol < bol)) {
+ unsigned long peek_left = left;
+ char *peek_eol = eol;
+
+ /*
+ * Trailing empty lines are not interesting.
+ * Peek past them to see if they belong to the
+ * body of the current function.
+ */
+ peek_bol = bol;
+ while (is_empty_line(peek_bol, peek_eol)) {
+ peek_bol = peek_eol + 1;
+ peek_eol = end_of_line(peek_bol, &peek_left);
+ }
+
+ if (match_funcname(opt, gs, peek_bol, peek_eol))
+ show_function = 0;
+ }
  if (show_function ||
     (last_hit && lno <= last_hit + opt->post_context)) {
  /* If the last hit is within the post context,
--
2.8.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Bug?] log -p -W showing the whole file for a patch that adds to the end?

Junio C Hamano
In reply to this post by René Scharfe-2
René Scharfe <[hidden email]> writes:

>   diff: factor out match_func_rec()
>   diff: handle appended chunks better with -W
>   diff: ignore empty lines before added functions with -W
>   diff: don't include common trailing empty lines with -W
>   grep: don't extend context to trailing empty lines with -W
>
>  grep.c        | 28 ++++++++++++++++++++++++--
>  xdiff/xemit.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 82 insertions(+), 9 deletions(-)

Thanks.  The flow of reasoning in this cover letter was written very
well and I think I can agree with all of them.

It is curious that this much behaviour change does not need any
changes in the test scripts.  We do not have sufficient coverage,
perhaps?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Bug?] log -p -W showing the whole file for a patch that adds to the end?

René Scharfe-2
Am 24.05.2016 um 20:16 schrieb Junio C Hamano:

> René Scharfe <[hidden email]> writes:
>
>>    diff: factor out match_func_rec()
>>    diff: handle appended chunks better with -W
>>    diff: ignore empty lines before added functions with -W
>>    diff: don't include common trailing empty lines with -W
>>    grep: don't extend context to trailing empty lines with -W
>>
>>   grep.c        | 28 ++++++++++++++++++++++++--
>>   xdiff/xemit.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>>   2 files changed, 82 insertions(+), 9 deletions(-)

> It is curious that this much behaviour change does not need any
> changes in the test scripts.  We do not have sufficient coverage,
> perhaps?

Well, -W is not tested at all.  I'll include some in the next round.  No
need to hurry, it's too late to land in 2.9.0 anyway.

René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [Bug?] log -p -W showing the whole file for a patch that adds to the end?

Junio C Hamano
René Scharfe <[hidden email]> writes:

> Am 24.05.2016 um 20:16 schrieb Junio C Hamano:
>> René Scharfe <[hidden email]> writes:
>>
>>>    diff: factor out match_func_rec()
>>>    diff: handle appended chunks better with -W
>>>    diff: ignore empty lines before added functions with -W
>>>    diff: don't include common trailing empty lines with -W
>>>    grep: don't extend context to trailing empty lines with -W
>>>
>>>   grep.c        | 28 ++++++++++++++++++++++++--
>>>   xdiff/xemit.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>>>   2 files changed, 82 insertions(+), 9 deletions(-)
>
>> It is curious that this much behaviour change does not need any
>> changes in the test scripts.  We do not have sufficient coverage,
>> perhaps?
>
> Well, -W is not tested at all.  I'll include some in the next round.
> No need to hurry, it's too late to land in 2.9.0 anyway.

True.

I'd say that these patches are fine as they are, and follow-up patch
for adding -W tests (instead of rerolling them) is sufficient,
though.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loading...