[PATCH] userdiff: add built-in pattern for CSS

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH] userdiff: add built-in pattern for CSS

William Duclot
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Signed-off-by: William Duclot <[hidden email]>
Signed-off-by: Matthieu Moy <[hidden email]>
---
 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh        |  1 +
 t/t4018/css-rule                |  4 ++++
 t/t4034-diff-words.sh           |  1 +
 t/t4034/css/expect              | 16 ++++++++++++++++
 t/t4034/css/post                |  9 +++++++++
 t/t4034/css/pre                 |  9 +++++++++
 userdiff.c                      |  8 ++++++++
 8 files changed, 50 insertions(+)
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
  bibtex
  cpp
  csharp
+ css
  fortran
  fountain
  html
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 0000000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 0000000..b025d88
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 735f301..bdf6a90 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,9 +1,9 @@<RESET>
+.<RED>class-form<RESET><GREEN>other-form<RESET> label.control-label {
+    margin-top: <RED>10<RESET><GREEN>15<RESET>px!important;
+    border : 10px <RED>dashed<RESET><GREEN>dotted<RESET> #C6C6C6;
+}<RESET>
+<RED>#CCCCCC<RESET>
+<RED>padding-bottom<RESET><GREEN>#CCCCCB<RESET>
+<GREEN>margin-left<RESET>
+150<RED>px<RESET><GREEN>em<RESET>
+10px
+<RED>!important<RESET>
+<RED>div<RESET><GREEN>li<RESET>.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 0000000..bdf6a90
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,9 @@
+.other-form label.control-label {
+    margin-top: 15px!important;
+    border : 10px dotted #C6C6C6;
+}
+#CCCCCB
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 0000000..735f301
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,9 @@
+.class-form label.control-label {
+    margin-top: 10px!important;
+    border : 10px dashed #C6C6C6;
+}
+#CCCCCC
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..0f9cfbe 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,14 @@ PATTERNS("csharp",
  "[a-zA-Z_][a-zA-Z0-9_]*"
  "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
  "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+PATTERNS("css",
+ "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
+ /* -- */
+ /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
+  * this not handled in this regex. */
+ "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
+ "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
--
2.8.2.403.gdf0b511.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] userdiff: add built-in pattern for CSS

Junio C Hamano
William Duclot <[hidden email]> writes:

> CSS is widely used, motivating it being included as a built-in pattern.
>
> It must be noted that the word_regex for CSS (i.e. the regex defining
> what is a word in the language) does not consider '.' and '#' characters
> (in CSS selectors) to be part of the word. This behavior is documented
> by the test t/t4018/css-rule.
> The logic behind this behavior is the following: identifiers in CSS
> selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
> character are not part of the identifier, but an indicator of the nature
> of the identifier in HTML/XML (class or id). Diffing ".class1" and
> ".class2" must show that the class name is changed, but we still are
> selecting a class.

In other words, if "div#foo" changed to "span#bar", word-diff would
say that "div changed to span, # didn't change and foo changed to
bar".

Which makes sense to me.

The above is not a request to change anything; just me thinking
aloud to see if I agree with the reasoning.

> diff --git a/userdiff.c b/userdiff.c
> index 6bf2505..0f9cfbe 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -148,6 +148,14 @@ PATTERNS("csharp",
>   "[a-zA-Z_][a-zA-Z0-9_]*"
>   "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>   "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
> +PATTERNS("css",
> + "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
> + /* -- */
> + /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
> +  * this not handled in this regex. */
> + "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
> + "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> +),

Style:

        /*
         * This regex comes from ...
         * ...
         * but they are not handled with this regex.
         */

I wonder if IPATTERN() may make it easier to express the above.

Also, a-zA-F (twice seen in "identifiers" section) looks somewhat
suspicious.  a-fA-F or a-zA-Z I would understand, and I suspect this
is a misspelled form of the latter.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH] userdiff: add built-in pattern for CSS

William Duclot
In reply to this post by William Duclot
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Signed-off-by: William Duclot <[hidden email]>
Signed-off-by: Matthieu Moy <[hidden email]>
---
changes since v1:
Fix a typo in the word_regex ("A-F" => "A-Z").
Clearer comment about ISO 10646 characters.

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh        |  1 +
 t/t4018/css-rule                |  4 ++++
 t/t4034-diff-words.sh           |  1 +
 t/t4034/css/expect              | 16 ++++++++++++++++
 t/t4034/css/post                | 10 ++++++++++
 t/t4034/css/pre                 | 10 ++++++++++
 userdiff.c                      |  8 ++++++++
 8 files changed, 52 insertions(+)
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
  bibtex
  cpp
  csharp
+ css
  fortran
  fountain
  html
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 0000000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 0000000..ed10393
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index b8ae0bb..fe500b7 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,10 +1,10 @@<RESET>
+.<RED>class-form<RESET><GREEN>other-form<RESET> label.control-label {
+    margin-top: <RED>10<RESET><GREEN>15<RESET>px!important;
+    border : 10px <RED>dashed<RESET><GREEN>dotted<RESET> #C6C6C6;
+}<RESET>
+<RED>#CCCCCC<RESET><GREEN>#CCCCCB<RESET>
+10em<RESET>
+<RED>padding-bottom<RESET><GREEN>margin-left<RESET>
+150<RED>px<RESET><GREEN>em<RESET>
+10px
+<RED>!important<RESET>
+<RED>div<RESET><GREEN>li<RESET>.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 0000000..fe500b7
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,10 @@
+.other-form label.control-label {
+    margin-top: 15px!important;
+    border : 10px dotted #C6C6C6;
+}
+#CCCCCB
+10em
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 0000000..b8ae0bb
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,10 @@
+.class-form label.control-label {
+    margin-top: 10px!important;
+    border : 10px dashed #C6C6C6;
+}
+#CCCCCC
+10em
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..9273969 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,14 @@ PATTERNS("csharp",
  "[a-zA-Z_][a-zA-Z0-9_]*"
  "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
  "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+PATTERNS("css",
+ "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
+ /* -- */
+ /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
+  * but they are not handled with this regex. */
+ "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
+ "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
--
2.9.0.rc0.1.g521d471.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] userdiff: add built-in pattern for CSS

Junio C Hamano
William Duclot <[hidden email]> writes:

> CSS is widely used, motivating it being included as a built-in pattern.
>
> It must be noted that the word_regex for CSS (i.e. the regex defining
> what is a word in the language) does not consider '.' and '#' characters
> (in CSS selectors) to be part of the word. This behavior is documented
> by the test t/t4018/css-rule.
> The logic behind this behavior is the following: identifiers in CSS
> selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
> character are not part of the identifier, but an indicator of the nature
> of the identifier in HTML/XML (class or id). Diffing ".class1" and
> ".class2" must show that the class name is changed, but we still are
> selecting a class.
>
> Signed-off-by: William Duclot <[hidden email]>
> Signed-off-by: Matthieu Moy <[hidden email]>
> ---
> changes since v1:
> Fix a typo in the word_regex ("A-F" => "A-Z").
> Clearer comment about ISO 10646 characters.

It is not a big deal for a small single-patch topic like this, but
it often is hard to reviewers if you do not respond to comments you
received and instead just send a new version of the patch with
"changes since..." comment.  Please make it a habit to do both.

I can see in the above "changes since v1" comment, you took the
A-F/A-Z thing, but I cannot tell if you thought about PATTERNS vs
IPATTERN and rejected IPATTERN with a good reason or if you simply
missed it when reading the review comments you received, for
example.

Three remaining issues are:

 - Have you considered using IPATTERN()?  PATTERNS() that defaults
   case sensitive match is suitable for real languages with fixed
   case keywords, but the pattern you are defining for CSS does not
   do anything special for any set of fixed-case built-in keywords,
   and appears to be better served by IPATTERN().

 - In our codebase, we format multi-line comments in a particular
   way, namely

        /*
         * A multi-line comment begins with slash asterisk
         * on its own line, and its closing asterisk slash
         * also is on its own line.
         */

 - Try not to write overlong lines.  If your expression needs to
   become long and there is no good place to fold lines, that is one
   thing, but an overlong comment is unexcuable, as you can fold
   lines anywhere between words.

Thanks.

> diff --git a/userdiff.c b/userdiff.c
> index 6bf2505..9273969 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -148,6 +148,14 @@ PATTERNS("csharp",
>   "[a-zA-Z_][a-zA-Z0-9_]*"
>   "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>   "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
> +PATTERNS("css",
> + "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
> + /* -- */
> + /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
> +  * but they are not handled with this regex. */
> + "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
> + "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> +),
>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] userdiff: add built-in pattern for CSS

William Duclot
> It is not a big deal for a small single-patch topic like this, but
> it often is hard to reviewers if you do not respond to comments you
> received and instead just send a new version of the patch with
> "changes since..." comment.  Please make it a habit to do both.

Apologies, I am not quite used to work with a mailing list yet, I
will make sure to respect this in the future!
 
>  - Have you considered using IPATTERN()?  PATTERNS() that defaults
>    case sensitive match is suitable for real languages with fixed
>    case keywords, but the pattern you are defining for CSS does not
>    do anything special for any set of fixed-case built-in keywords,
>    and appears to be better served by IPATTERN().

I did have considered IPATTERN(), but assumed that case-sensitive was
default and case-insensitive was the exception. As the CSS pattern
does not deal with letters at all it seemed sensible to me to follow
the example of the HTML pattern, which use PATTERNS().

>  - In our codebase, we format multi-line comments in a particular
>    way, namely
>
> /*
>          * A multi-line comment begins with slash asterisk
>          * on its own line, and its closing asterisk slash
>          * also is on its own line.
>          */

I take good note of that. I took example on the fortran pattern
comment, should I fix it too while I'm at it?
 
>  - Try not to write overlong lines.  If your expression needs to
>    become long and there is no good place to fold lines, that is one
>    thing, but an overlong comment is unexcuable, as you can fold
>    lines anywhere between words.

Again, I take good note of that.

Thank you for your time!
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] userdiff: add built-in pattern for CSS

Junio C Hamano
William Duclot <[hidden email]> writes:

> I did have considered IPATTERN(), but assumed that case-sensitive was
> default and case-insensitive was the exception.

Don't assume, but ask ;-)

> As the CSS pattern
> does not deal with letters at all it seemed sensible to me to follow
> the example of the HTML pattern, which use PATTERNS().

Did you notice that HTML pattern has to do an [Hh] that would be
unnecessary if it chose to use IPATTTERN()?

You do not have to ask a person, but instead ask the history.
IPATTERN() was added at 909a5494 (userdiff.c: add builtin fortran
regex patterns, 2010-09-10) when adding fortran support.  Anything
that existed before, including HTML, did [A-Za-z] when they could
have done [a-z] if IPATTERN() existed back then.

>>  - In our codebase, we format multi-line comments in a particular
>>    way, namely
>>
>>   /*
>>          * A multi-line comment begins with slash asterisk
>>          * on its own line, and its closing asterisk slash
>>          * also is on its own line.
>>          */
>
> I take good note of that. I took example on the fortran pattern
> comment, should I fix it too while I'm at it?

Not "while you are at it".

Making existing things better is welcome but such a change shouldn't
be mixed with addition of new things.  You can do it as a separate
patch, probably as a preliminary clean-up before your change, if you
want to.

>>  - Try not to write overlong lines.  If your expression needs to
>>    become long and there is no good place to fold lines, that is one
>>    thing, but an overlong comment is unexcuable, as you can fold
>>    lines anywhere between words.
>
> Again, I take good note of that.
>
> Thank you for your time!

Thank you for working on this.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loading...