Tabs in commit messages - de-tabify option in strbuf_stripspace()?

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Linus Torvalds-3
So I end up doing this manually when I notice, but I was wondering ig
maybe git could just have an option to "git am" and friends to
de-tabify the commit message.

It's particularly noticeable when people line things up using tabs
(for the kernel, it's often things like "cpu1 does X, cpu2 does Y"),
and then when you do "git log" it looks like a unholy mess, because
the 4-char indentation of the log message ends up causing those things
to not line up at all after all.

The natural thing to do would be to pass in a "tab size" parameter to
strbuf_stripspace(), and default it to 0 (for no change), but have
some way to let people say "expand tabs to spaces at 8-character
tab-stops" or similar (but let people use different tab-stops if they
want).

Do people hate that idea? I may not get around to it for a while (it's
the kernel merge window right now), but I can write the patch
eventually - I just wanted to do an RFC first.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Stefan Beller-4
On Tue, Mar 15, 2016 at 5:16 PM, Linus Torvalds
<[hidden email]> wrote:
> Do people hate that idea? I may not get around to it for a while (it's
> the kernel merge window right now), but I can write the patch
> eventually - I just wanted to do an RFC first.

Could you point at some example to better understand the problem?

Thanks,
Stefan

>
>                 Linus
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

RE: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Randall S. Becker
In reply to this post by Linus Torvalds-3
On March 15, 2016 8:17 PM Linus Torvalds wrote:

> So I end up doing this manually when I notice, but I was wondering ig maybe
> git could just have an option to "git am" and friends to de-tabify the commit
> message.
>
> It's particularly noticeable when people line things up using tabs (for the
> kernel, it's often things like "cpu1 does X, cpu2 does Y"), and then when you
> do "git log" it looks like a unholy mess, because the 4-char indentation of the
> log message ends up causing those things to not line up at all after all.
>
> The natural thing to do would be to pass in a "tab size" parameter to
> strbuf_stripspace(), and default it to 0 (for no change), but have some way to
> let people say "expand tabs to spaces at 8-character tab-stops" or similar
> (but let people use different tab-stops if they want).
>
> Do people hate that idea? I may not get around to it for a while (it's the
> kernel merge window right now), but I can write the patch eventually - I just
> wanted to do an RFC first.

Speaking partly as a consumer of the comments and partly as someone who generates the commits through APIs, I would ask that the commit tab handling semantic be more formalized than just tab size to strbuf_stripspace(). While it might seem a bit unfair to have to worry about non-git git clients, the detabbing can impact the other commit implementers (e.g., SourceTree, EGit, JGit, and the raft of process automation bits out there using JGit for cool stuff). Personally, I would prefer to have a normalized behaviour so that any bit of automation building a commit message would have a specific definition to go to (and hopefully comply with) in order to properly format the message for posterity and across all consumers. It might also be useful to have some ability to be presentation-compatible with legacy commits (done after this type of enhancement) so that a reasonable presentation can be done for those 8 year old commits that still have embedded tabs. Personally, I don't encourage tabs in commits myself and do see the value of this, but is this really restricted just to git am?

Just my $0.02,

Randall

-- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000)
-- In my real life, I talk too much.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Junio C Hamano
In reply to this post by Linus Torvalds-3
Linus Torvalds <[hidden email]> writes:

> Do people hate that idea? I may not get around to it for a while (it's
> the kernel merge window right now), but I can write the patch
> eventually - I just wanted to do an RFC first.

Wouldn't it be nicer to do this kind of thing at the output side?  A
stupid way would be to have an option to indent the log text with a
tab instead of 4-spaces, but a more sensible way would be to keep
the visual 4-space-indent and do the expand-tab for each line of
text.

That way, your viewing of existing commits that use 8-space HT to
align (and worse yet, mixture of 8-space HT and 8 spaces and assume
the end result would align in the output) would become more pleasant
without you having to run filter-branch ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Linus Torvalds-3
In reply to this post by Stefan Beller-4
On Tue, Mar 15, 2016 at 5:23 PM, Stefan Beller <[hidden email]> wrote:
>
> Could you point at some example to better understand the problem?

So in the kernel repo, I just randomly looked for tabs that show this
problem, and take for example commit
ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd.

You can see how the original lined up, by doing

    git show --pretty=email ff9a9b4c4334

because the email format doesn't indent the commit message. But if you do just

    git show ff9a9b4c4334

and get the usual indentation, you'll see things not line up at all.

In case you don't want to bother with the kernel repo, here's what it
looks like:

email format:

--- snip snip 8< ---
A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

 4.4                            3.8 seconds
 4.5-rc1                        3.7 seconds
 4.5-rc1 + first patch          3.3 seconds
 4.5-rc1 + first 3 patches      3.1 seconds
 4.5-rc1 + all patches          2.3 seconds

A non-NOHZ_FULL cpu (not the housekeeping CPU):

 all kernels                    1.86 seconds
--- snip snip 8< ---

Normal "git show" format:

--- snip snip 8< ---
    A microbenchmark calling an invalid syscall number 10 million
    times in a row speeds up an additional 30% over the numbers
    with just the previous patches, for a total speedup of about
    40% over 4.4 and 4.5-rc1.

    Run times for the microbenchmark:

     4.4                                3.8 seconds
     4.5-rc1                    3.7 seconds
     4.5-rc1 + first patch              3.3 seconds
     4.5-rc1 + first 3 patches  3.1 seconds
     4.5-rc1 + all patches              2.3 seconds

    A non-NOHZ_FULL cpu (not the housekeeping CPU):

     all kernels                        1.86 seconds
--- snip snip 8< ---

which hopefully clarifies.

In the above case, it really isn't very annoying. It's just slightly
ugly. In some other cases, it can get quite hard to see what's up, but
the ones that come through me I actually tend to try to edit, so many
of them have been corrected.

For other examples (again, in the kernel), look at 19b2c30d3cce, or
0dc8c730c98a.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Linus Torvalds-3
In reply to this post by Junio C Hamano
On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano <[hidden email]> wrote:
>
> Wouldn't it be nicer to do this kind of thing at the output side?  A
> stupid way would be to have an option to indent the log text with a
> tab instead of 4-spaces, but a more sensible way would be to keep
> the visual 4-space-indent and do the expand-tab for each line of
> text.

Yes, that would certainly work for me too. It's just the "ascii boxes
don't line up" that is problematic..

(Also people including example source code etc, where the first tab
looks wildly different from the second one)

            Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Stefan Beller-4
On Tue, Mar 15, 2016 at 5:48 PM, Linus Torvalds
<[hidden email]> wrote:

> On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano <[hidden email]> wrote:
>>
>> Wouldn't it be nicer to do this kind of thing at the output side?  A
>> stupid way would be to have an option to indent the log text with a
>> tab instead of 4-spaces, but a more sensible way would be to keep
>> the visual 4-space-indent and do the expand-tab for each line of
>> text.
>
> Yes, that would certainly work for me too. It's just the "ascii boxes
> don't line up" that is problematic..

I would also rather side to correct the display side rather than the
applying side. [I still want to send and apply patches with git written in
the white space language ;]

Instead of converting to whitespaces in Git, we could make use of the
set_tabs capability for ttys and setup the terminal for having tabs align
to 12,+8,+8,+8... such that it looks like an indented terminal.
That way we would also preserve the tabs in case you'd want to
copy the message as is.

Thanks,
Stefan

>
> (Also people including example source code etc, where the first tab
> looks wildly different from the second one)
>
>             Linus
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Stefan Beller-4
On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller <[hidden email]> wrote:
>
> Instead of converting to whitespaces in Git, we could make use of the
> set_tabs capability for ttys and setup the terminal for having tabs align
> to 12,+8,+8,+8...

Or rather read in the existing tabs configuration and shift it by a constant.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Linus Torvalds-3
In reply to this post by Linus Torvalds-3
On Tue, Mar 15, 2016 at 5:48 PM, Linus Torvalds
<[hidden email]> wrote:

> On Tue, Mar 15, 2016 at 5:45 PM, Junio C Hamano <[hidden email]> wrote:
>>
>> Wouldn't it be nicer to do this kind of thing at the output side?  A
>> stupid way would be to have an option to indent the log text with a
>> tab instead of 4-spaces, but a more sensible way would be to keep
>> the visual 4-space-indent and do the expand-tab for each line of
>> text.
>
> Yes, that would certainly work for me too. It's just the "ascii boxes
> don't line up" that is problematic..
Ok, that actually turns out to be pretty easy.

Here's a first try at it. It does tab expansion only for the cases
that indent the commit message, so for things like "pretty=email" it
makes no difference at all.

However, it hardcodes the hardtab size to 8, and makes this all
unconditional. That's how a vt100 terminal works, but fancer terminals
may allow you set actual tab-stops, and if you're in emacs you can
probably do just about anything)

Comments? This does make the kernel commit 0dc8c730c98a look fine, so
it would make me happy.

I can write a commit log etc if people think this is ok, but right now
this is just a silly raw patch in the attachement..

               Linus

patch.diff (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Junio C Hamano
Linus Torvalds <[hidden email]> writes:

> Here's a first try at it. It does tab expansion only for the cases
> that indent the commit message, so for things like "pretty=email" it
> makes no difference at all.

It also ignores that byte counts of non-HT bytes may not necessarily
match display columns.  There is utf8_width() function exported from
utf8.c for that purpose.

I think it is fine to make it default for the pretty=medium, but it
would be nicer to introduce a new command line option --no-untabify
to optionally disable the expansion.

>  pretty.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/pretty.c b/pretty.c
> index 92b2870a7eab..dcd6105d1eb2 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1652,8 +1652,26 @@ void pp_remainder(struct pretty_print_context *pp,
>   first = 0;
>  
>   strbuf_grow(sb, linelen + indent + 20);
> - if (indent)
> + if (indent) {
> + int i, last = 0, pos = 0;
> +
>   strbuf_addchars(sb, ' ', indent);
> + for (i = 0; i < linelen; i++) {
> + int expand;
> + if (line[i] != '\t')
> + continue;
> + strbuf_add(sb, line+last, i-last);
> + pos += i-last;
> + expand = (pos + 8) & ~7;
> + strbuf_addchars(sb, ' ', expand - pos);
> + pos = expand;
> + last = i+1;
> + }
> +
> + // Handle the tail non-tab content
> + line += last;
> + linelen -= last;
> + }
>   strbuf_add(sb, line, linelen);
>   strbuf_addch(sb, '\n');
>   }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Linus Torvalds-3
On Tue, Mar 15, 2016 at 9:46 PM, Junio C Hamano <[hidden email]> wrote:
>
> It also ignores that byte counts of non-HT bytes may not necessarily
> match display columns.  There is utf8_width() function exported from
> utf8.c for that purpose.

Hmm. I did that to now make it horribly slower. Doing the
per-character widths really does horrible things for performance.

That said, I think I can make it do the fast thing for lines that have
no tabs at all, which is the big bulk of it. So it would do the width
calculations only in the rare "yes, I found a tab" case.

I already wrote it in a way that non-tab lines end up just looping
over the line looking for a tab and then fall back to the old code.

I might just have to make that behavior a bit more explicit. Let me
stare at it a bit, but it won't happen until tomorrow, I think.

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Jeff King
On Tue, Mar 15, 2016 at 09:57:16PM -0700, Linus Torvalds wrote:

> On Tue, Mar 15, 2016 at 9:46 PM, Junio C Hamano <[hidden email]> wrote:
> >
> > It also ignores that byte counts of non-HT bytes may not necessarily
> > match display columns.  There is utf8_width() function exported from
> > utf8.c for that purpose.
>
> Hmm. I did that to now make it horribly slower. Doing the
> per-character widths really does horrible things for performance.
>
> That said, I think I can make it do the fast thing for lines that have
> no tabs at all, which is the big bulk of it. So it would do the width
> calculations only in the rare "yes, I found a tab" case.
>
> I already wrote it in a way that non-tab lines end up just looping
> over the line looking for a tab and then fall back to the old code.
>
> I might just have to make that behavior a bit more explicit. Let me
> stare at it a bit, but it won't happen until tomorrow, I think.

I wondered about performance when reading yours, and measured a straight
"git log >/dev/null" on the kernel at about 3% slower (best-of-five and
average). We can get most of that back by sticking a

    memchr(line, '\t', linelen);

in front and skipping the loop if it returns NULL. You can also
implement your loop in terms of memchr() calls to skip to the next tab,
but I don't know if it's worth it. It's really only worth spending time
optimizing the non-tab case (and even then, only worth it for trivial
things like "use the already optimized-as-hell memchr").

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Marc Branchaud
In reply to this post by Stefan Beller-4
On 16-03-15 09:02 PM, Stefan Beller wrote:
> On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller <[hidden email]> wrote:
>>
>> Instead of converting to whitespaces in Git, we could make use of the
>> set_tabs capability for ttys and setup the terminal for having tabs align
>> to 12,+8,+8,+8...
>
> Or rather read in the existing tabs configuration and shift it by a constant.

Could this also help with diff output, where the leading + or - mars the
indentation in a similar way?

That opens a bit of a deeper problem, because not all the files in a single
repo necessarily use the same tab size.

                M.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Duy Nguyen
On Wed, Mar 16, 2016 at 9:27 PM, Marc Branchaud <[hidden email]> wrote:

> On 16-03-15 09:02 PM, Stefan Beller wrote:
>> On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller <[hidden email]> wrote:
>>>
>>> Instead of converting to whitespaces in Git, we could make use of the
>>> set_tabs capability for ttys and setup the terminal for having tabs align
>>> to 12,+8,+8,+8...
>>
>> Or rather read in the existing tabs configuration and shift it by a constant.
>
> Could this also help with diff output, where the leading + or - mars the
> indentation in a similar way?

Yes, please! Even if we only activate this when stdout is tty. Much
better output.

> That opens a bit of a deeper problem, because not all the files in a single
> repo necessarily use the same tab size.

Maybe we can pass --tab-width=xx to git?
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Linus Torvalds-3
In reply to this post by Marc Branchaud
On Wed, Mar 16, 2016 at 7:27 AM, Marc Branchaud <[hidden email]> wrote:
>
> Could this also help with diff output, where the leading + or - mars the
> indentation in a similar way?

I don't think that's a good idea at least by default, since then it
will break things like running diff in emacs capture mode.

So even if you're in a terminal, you can't assume that you can munge
the output too much.

Of course, if colorization is on, you might as well pretty-print the
diff by indenting things properly too, since the end result isn't
going to be used as a _diff_.

              Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Junio C Hamano
Linus Torvalds <[hidden email]> writes:

> On Wed, Mar 16, 2016 at 7:27 AM, Marc Branchaud <[hidden email]> wrote:
>>
>> Could this also help with diff output, where the leading + or - mars the
>> indentation in a similar way?
>
> I don't think that's a good idea at least by default, since then it
> will break things like running diff in emacs capture mode.
>
> So even if you're in a terminal, you can't assume that you can munge
> the output too much.
>
> Of course, if colorization is on, you might as well pretty-print the
> diff by indenting things properly too, since the end result isn't
> going to be used as a _diff_.

I agree that I will never use such an end result as a diff, but I
may still be tempted to cut-and-paste individual lines after '^+'
when resurrecting a WIP topic that does not rebase very well, so I
agree with you that the output shouldn't be munged by default even
though I think it is OK to have an option..


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Matthieu Moy-2
In reply to this post by Marc Branchaud
Marc Branchaud <[hidden email]> writes:

> On 16-03-15 09:02 PM, Stefan Beller wrote:
>> On Tue, Mar 15, 2016 at 6:00 PM, Stefan Beller <[hidden email]> wrote:
>>>
>>> Instead of converting to whitespaces in Git, we could make use of the
>>> set_tabs capability for ttys and setup the terminal for having tabs align
>>> to 12,+8,+8,+8...
>>
>> Or rather read in the existing tabs configuration and shift it by a constant.
>
> Could this also help with diff output, where the leading + or - mars the
> indentation in a similar way?

Hardly: the same pager would be used to display the commit message (with
a 4-space prefix) and the diff (with a 1-char +/- prefix). So the
offsets in these two places would need to be different, and I don't
think we can change that on the fly.

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html