Quantcast

[PATCHv7 0/5] pathspec magic extension to search for attributes

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCHv7 0/5] pathspec magic extension to search for attributes

Stefan Beller-4
This goes on top of origin/jc/attr, (8195c5cde01, SQUASH???, 2016-05-17 14:08:50)

Patches 1 and 2 are small fixes, which could go independently as well.
(1 is new compared to v6)

Patches 3 and 4 are refactoring pathspec.c a little.

The last patch is the most exciting patch as it adds the new feature.
Quoting from the documentation update:

        attr;;
                Additionally to matching the pathspec, the path must have the
                attribute as specified. The syntax for specifying the required
                attributes is "`attr: [mode] <attribute name> [=value]`"
        +
        Attributes can have 4 states (Set, Unset, Set to a value, unspecified) and
        you can query each attribute for certain states. The "`[mode]`" is a special
        character to indicate which attribute states are looked for. The following
        modes are available:
       
        - "`+`" the attribute must be set
        - "`-`" the attribute must be unset
        - "`~`" the attribute must be unspecified
        - "`?`" the attribute must not be unspecified, i.e. set, unset or value matches
        - an empty "`[mode]`" matches if the attribute is set or has a value
        - an empty "`[mode]`" combined with "`[=value]`" matches if the attribute has
          the given value.
        +

While I followed Junios suggestion of the data layout in pathspec.h,
I am not quite happy with it yet as I think it will not be easy in the future
to enhance search for specific attribute values. Quoting from the tests:

        test_expect_success 'check label with 2 labels' '
                cat <<EOF >expect &&
        fileAB     # has both labels
        sub/fileAB # has both labels
        EOF
                git ls-files ":(attr:labelA labelB)" >actual &&
                test_cmp expect actual &&
                git ls-files ":(attr:labelA,attr:labelB)" >actual &&
                test_cmp expect actual
        '

You can see that two notations are possible to narrow the search down when
asking for 2 attributes to be set. Either you can do 2 "attr:" magic strings
or you can have one "attr:" and then a space separated list of attributes.

The problem arises once you'd want to have a better search for values, as
currently only ":(attr:label=value) is supported with "value" being the
exact match, e.g. a path having more than just value would not match as it is
not exact. E.g. in git.git we have

        $ cat .gitattributes
        * whitespace=!indent,trail,space
        *.[ch] whitespace=indent,trail,space
        *.sh whitespace=indent,trail,space
        $ git ls-files ":(attr:whitespace=indent)"
        $ # no match!

So instead of matching exactly we'd want to just look for the value being
one part of the comma separated list as setup in .gitattributes.
However searching for more than one value will be complicated, as we'd then
need to escape commas. My vision was to use white spaces instead, but they
are now taken to separate different attributes.

So in the future I would imagine to have escaping like

        git ls-files ":(attr:label=value\,value2 labelA=value3,exclude)
       
or yet another separator like

        git ls-files ":(attr:label=value1;value2 labelA=value3,exclude)

Thanks,
Stefan

       
Stefan Beller (5):
  string list: improve comment
  Documentation: fix a typo
  pathspec: move long magic parsing out of prefix_pathspec
  pathspec: move prefix check out of the inner loop
  pathspec: allow querying for attributes

 Documentation/gitattributes.txt    |   2 +-
 Documentation/glossary-content.txt |  19 ++++
 attr.c                             |   2 +-
 attr.h                             |   2 +
 dir.c                              |  46 ++++++++++
 pathspec.c                         | 182 +++++++++++++++++++++++++++++--------
 pathspec.h                         |  16 ++++
 string-list.h                      |   2 +-
 t/t6134-pathspec-with-labels.sh    | 177 ++++++++++++++++++++++++++++++++++++
 9 files changed, 407 insertions(+), 41 deletions(-)
 create mode 100755 t/t6134-pathspec-with-labels.sh

--
2.8.2.121.ga97fb08

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCHv7 1/5] string list: improve comment

Stefan Beller-4
Signed-off-by: Stefan Beller <[hidden email]>
---
 string-list.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/string-list.h b/string-list.h
index d3809a1..465a1f0 100644
--- a/string-list.h
+++ b/string-list.h
@@ -106,7 +106,7 @@ void unsorted_string_list_delete_item(struct string_list *list, int i, int free_
  * list->strdup_strings must be set, as new memory needs to be
  * allocated to hold the substrings.  If maxsplit is non-negative,
  * then split at most maxsplit times.  Return the number of substrings
- * appended to list.
+ * appended to list. The list may be non-empty already.
  *
  * Examples:
  *   string_list_split(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"]
--
2.8.2.121.ga97fb08

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCHv7 2/5] Documentation: fix a typo

Stefan Beller-4
In reply to this post by Stefan Beller-4
Signed-off-by: Stefan Beller <[hidden email]>
---
 Documentation/gitattributes.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..af2c682 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -86,7 +86,7 @@ is either not set or empty, $HOME/.config/git/attributes is used instead.
 Attributes for all users on a system should be placed in the
 `$(prefix)/etc/gitattributes` file.
 
-Sometimes you would need to override an setting of an attribute
+Sometimes you would need to override a setting of an attribute
 for a path to `Unspecified` state.  This can be done by listing
 the name of the attribute prefixed with an exclamation point `!`.
 
--
2.8.2.121.ga97fb08

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCHv7 3/5] pathspec: move long magic parsing out of prefix_pathspec

Stefan Beller-4
In reply to this post by Stefan Beller-4
`prefix_pathspec` is quite a lengthy function and we plan on adding more.
Split it up for better readability. As we want to add code into the
inner loop of the long magic parsing, we also benefit from lower
indentation.

Signed-off-by: Stefan Beller <[hidden email]>
---
 pathspec.c | 84 +++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 47 insertions(+), 37 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index c9e9b6c..eba37c2 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -88,6 +88,52 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
  strbuf_addf(sb, ",prefix:%d)", prefixlen);
 }
 
+static void eat_long_magic(struct pathspec_item *item, const char *elt,
+ unsigned *magic, int *pathspec_prefix,
+ const char **copyfrom_, const char **long_magic_end)
+{
+ int i;
+ const char *copyfrom = *copyfrom_;
+ /* longhand */
+ const char *nextat;
+ for (copyfrom = elt + 2;
+     *copyfrom && *copyfrom != ')';
+     copyfrom = nextat) {
+ size_t len = strcspn(copyfrom, ",)");
+ if (copyfrom[len] == ',')
+ nextat = copyfrom + len + 1;
+ else
+ /* handle ')' and '\0' */
+ nextat = copyfrom + len;
+ if (!len)
+ continue;
+ for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
+ if (strlen(pathspec_magic[i].name) == len &&
+    !strncmp(pathspec_magic[i].name, copyfrom, len)) {
+ *magic |= pathspec_magic[i].bit;
+ break;
+ }
+ if (starts_with(copyfrom, "prefix:")) {
+ char *endptr;
+ *pathspec_prefix = strtol(copyfrom + 7,
+  &endptr, 10);
+ if (endptr - copyfrom != len)
+ die(_("invalid parameter for pathspec magic 'prefix'"));
+ /* "i" would be wrong, but it does not matter */
+ break;
+ }
+ }
+ if (ARRAY_SIZE(pathspec_magic) <= i)
+ die(_("Invalid pathspec magic '%.*s' in '%s'"),
+    (int) len, copyfrom, elt);
+ }
+ if (*copyfrom != ')')
+ die(_("Missing ')' at the end of pathspec magic in '%s'"), elt);
+ *long_magic_end = copyfrom;
+ copyfrom++;
+ *copyfrom_ = copyfrom;
+}
+
 /*
  * Take an element of a pathspec and check for magic signatures.
  * Append the result to the prefix. Return the magic bitmap.
@@ -150,43 +196,7 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
     (flags & PATHSPEC_LITERAL_PATH)) {
  ; /* nothing to do */
  } else if (elt[1] == '(') {
- /* longhand */
- const char *nextat;
- for (copyfrom = elt + 2;
-     *copyfrom && *copyfrom != ')';
-     copyfrom = nextat) {
- size_t len = strcspn(copyfrom, ",)");
- if (copyfrom[len] == ',')
- nextat = copyfrom + len + 1;
- else
- /* handle ')' and '\0' */
- nextat = copyfrom + len;
- if (!len)
- continue;
- for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
- if (strlen(pathspec_magic[i].name) == len &&
-    !strncmp(pathspec_magic[i].name, copyfrom, len)) {
- magic |= pathspec_magic[i].bit;
- break;
- }
- if (starts_with(copyfrom, "prefix:")) {
- char *endptr;
- pathspec_prefix = strtol(copyfrom + 7,
- &endptr, 10);
- if (endptr - copyfrom != len)
- die(_("invalid parameter for pathspec magic 'prefix'"));
- /* "i" would be wrong, but it does not matter */
- break;
- }
- }
- if (ARRAY_SIZE(pathspec_magic) <= i)
- die(_("Invalid pathspec magic '%.*s' in '%s'"),
-    (int) len, copyfrom, elt);
- }
- if (*copyfrom != ')')
- die(_("Missing ')' at the end of pathspec magic in '%s'"), elt);
- long_magic_end = copyfrom;
- copyfrom++;
+ eat_long_magic(item, elt, &magic, &pathspec_prefix, &copyfrom, &long_magic_end);
  } else {
  /* shorthand */
  for (copyfrom = elt + 1;
--
2.8.2.121.ga97fb08

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCHv7 4/5] pathspec: move prefix check out of the inner loop

Stefan Beller-4
In reply to this post by Stefan Beller-4
The prefix check is not related the check of pathspec magic; also there
is no code that is relevant after we'd break the loop on a match for
"prefix:". So move the check before the loop and shortcircuit the outer
loop.

Signed-off-by: Stefan Beller <[hidden email]>
---
 pathspec.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index eba37c2..4dff252 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -107,21 +107,22 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
  nextat = copyfrom + len;
  if (!len)
  continue;
+
+ if (starts_with(copyfrom, "prefix:")) {
+ char *endptr;
+ *pathspec_prefix = strtol(copyfrom + 7,
+  &endptr, 10);
+ if (endptr - copyfrom != len)
+ die(_("invalid parameter for pathspec magic 'prefix'"));
+ continue;
+ }
+
  for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
  if (strlen(pathspec_magic[i].name) == len &&
     !strncmp(pathspec_magic[i].name, copyfrom, len)) {
  *magic |= pathspec_magic[i].bit;
  break;
  }
- if (starts_with(copyfrom, "prefix:")) {
- char *endptr;
- *pathspec_prefix = strtol(copyfrom + 7,
-  &endptr, 10);
- if (endptr - copyfrom != len)
- die(_("invalid parameter for pathspec magic 'prefix'"));
- /* "i" would be wrong, but it does not matter */
- break;
- }
  }
  if (ARRAY_SIZE(pathspec_magic) <= i)
  die(_("Invalid pathspec magic '%.*s' in '%s'"),
--
2.8.2.121.ga97fb08

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCHv7 5/5] pathspec: allow querying for attributes

Stefan Beller-4
In reply to this post by Stefan Beller-4
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.

Signed-off-by: Stefan Beller <[hidden email]>
---
 Documentation/glossary-content.txt |  19 ++++
 attr.c                             |   2 +-
 attr.h                             |   2 +
 dir.c                              |  46 ++++++++++
 pathspec.c                         | 103 ++++++++++++++++++++-
 pathspec.h                         |  16 ++++
 t/t6134-pathspec-with-labels.sh    | 177 +++++++++++++++++++++++++++++++++++++
 7 files changed, 360 insertions(+), 5 deletions(-)
 create mode 100755 t/t6134-pathspec-with-labels.sh

diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
index cafc284..866e8d8 100644
--- a/Documentation/glossary-content.txt
+++ b/Documentation/glossary-content.txt
@@ -384,6 +384,25 @@ full pathname may have special meaning:
 +
 Glob magic is incompatible with literal magic.
 
+attr;;
+ Additionally to matching the pathspec, the path must have the
+ attribute as specified. The syntax for specifying the required
+ attributes is "`attr: [mode] <attribute name> [=value]`"
++
+Attributes can have 4 states (Set, Unset, Set to a value, unspecified) and
+you can query each attribute for certain states. The "`[mode]`" is a special
+character to indicate which attribute states are looked for. The following
+modes are available:
+
+ - "`+`" the attribute must be set
+ - "`-`" the attribute must be unset
+ - "`~`" the attribute must be unspecified
+ - "`?`" the attribute must not be unspecified, i.e. set, unset or value matches
+ - an empty "`[mode]`" matches if the attribute is set or has a value
+ - an empty "`[mode]`" combined with "`[=value]`" matches if the attribute has
+   the given value.
++
+
 exclude;;
  After a path matches any non-exclude pathspec, it will be run
  through all exclude pathspec (magic signature: `!`). If it
diff --git a/attr.c b/attr.c
index e0f7965..65cffd8 100644
--- a/attr.c
+++ b/attr.c
@@ -59,7 +59,7 @@ static unsigned hash_name(const char *name, int namelen)
  return val;
 }
 
-static int invalid_attr_name(const char *name, int namelen)
+int invalid_attr_name(const char *name, int namelen)
 {
  /*
  * Attribute name cannot begin with '-' and must consist of
diff --git a/attr.h b/attr.h
index 51ca36d..4bb4848 100644
--- a/attr.h
+++ b/attr.h
@@ -45,6 +45,8 @@ extern void git_attr_check_append(struct git_attr_check *, const struct git_attr
 extern void git_attr_check_clear(struct git_attr_check *);
 extern void git_attr_check_free(struct git_attr_check *);
 
+extern int invalid_attr_name(const char *name, int namelen);
+
 /*
  * Return the name of the attribute represented by the argument.  The
  * return value is a pointer to a null-delimited string that is part
diff --git a/dir.c b/dir.c
index 996653b..3141a5a 100644
--- a/dir.c
+++ b/dir.c
@@ -9,6 +9,7 @@
  */
 #include "cache.h"
 #include "dir.h"
+#include "attr.h"
 #include "refs.h"
 #include "wildmatch.h"
 #include "pathspec.h"
@@ -215,6 +216,48 @@ int within_depth(const char *name, int namelen,
  return 1;
 }
 
+static int match_attrs(const char *name, int namelen,
+       const struct pathspec_item *item)
+{
+ char *path;
+ int i;
+
+ path = xmemdupz(name, namelen);
+ git_check_attr(path, item->attr_check);
+
+ for (i = 0; i < item->attr_match_nr; i++) {
+ const char *value;
+ int matched;
+ enum attr_match_mode match_mode;
+
+ value = item->attr_check->check[i].value;
+
+ match_mode = item->attr_match[i].match_mode;
+
+ if (ATTR_TRUE(value)) {
+ matched = match_mode == MATCH_SET ||
+  match_mode == MATCH_SET_OR_VALUE ||
+  match_mode == MATCH_NOT_UNSPECIFIED;
+ } else if (ATTR_FALSE(value)) {
+ matched = match_mode == MATCH_UNSET ||
+  match_mode == MATCH_NOT_UNSPECIFIED;
+ } else if (ATTR_UNSET(value)) {
+ matched = match_mode == MATCH_UNSPECIFIED;
+ } else {
+ matched = match_mode == MATCH_NOT_UNSPECIFIED ||
+  match_mode == MATCH_SET_OR_VALUE ||
+  (match_mode == MATCH_VALUE &&
+   !strcmp(item->attr_match[i].value, value));
+ }
+ if (!matched)
+ return 0;
+ }
+
+ free(path);
+
+ return 1;
+}
+
 #define DO_MATCH_EXCLUDE   1
 #define DO_MATCH_DIRECTORY 2
 
@@ -270,6 +313,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
     strncmp(item->match, name - prefix, item->prefix))
  return 0;
 
+ if (item->attr_match_nr && !match_attrs(name, namelen, item))
+ return 0;
+
  /* If the match was just the prefix, we matched */
  if (!*match)
  return MATCHED_RECURSIVELY;
diff --git a/pathspec.c b/pathspec.c
index 4dff252..32fb6a8 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "attr.h"
 
 /*
  * Finds which of the given pathspecs match items in the index.
@@ -88,12 +89,82 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
  strbuf_addf(sb, ",prefix:%d)", prefixlen);
 }
 
+static void parse_pathspec_attr_match(struct pathspec_item *item, const char *value)
+{
+ struct string_list_item *si;
+ struct string_list list = STRING_LIST_INIT_DUP;
+
+
+ if (!value || !strlen(value))
+ goto err;
+
+ string_list_split(&list, value, ' ', -1);
+ string_list_remove_empty_items(&list, 0);
+
+ if (!item->attr_check)
+ item->attr_check = git_attr_check_alloc();
+
+ ALLOC_GROW(item->attr_match, item->attr_match_nr + list.nr, item->attr_match_alloc);
+
+ for_each_string_list_item(si, &list) {
+ size_t val_len;
+
+ int j = item->attr_match_nr++;
+ const char *val = si->string;
+ struct attr_match *am = &item->attr_match[j];
+
+ if (val[0] == '?')
+ am->match_mode = MATCH_NOT_UNSPECIFIED;
+ else if (val[0] == '~')
+ am->match_mode = MATCH_UNSPECIFIED;
+ else if (val[0] == '+')
+ am->match_mode = MATCH_SET;
+ else if (val[0] == '-')
+ am->match_mode = MATCH_UNSET;
+ else
+ am->match_mode = MATCH_SET_OR_VALUE;
+
+ if (am->match_mode != MATCH_SET_OR_VALUE)
+ /* skip first character */
+ val++;
+
+ val_len = strcspn(val, "=,)");
+ if (val[val_len] == '=') {
+ am->match_mode = MATCH_VALUE;
+ am->value = xstrdup(&val[val_len + 1]);
+ /*
+ * NEEDSWORK:
+ * Do we want to allow escaped commas to search
+ * for comma separated values?
+ */
+ if (strchr(am->value, '\\'))
+ die(_("attr spec values must not contain backslashes"));
+ } else
+ am->value = NULL;
+
+ if (invalid_attr_name(val, val_len)) {
+ am->match_mode = INVALID_ATTR;
+ goto err;
+ }
+
+ am->attr = git_attr(xmemdupz(val, val_len));
+ git_attr_check_append(item->attr_check, am->attr);
+ }
+
+ string_list_clear(&list, 0);
+ return;
+err:
+ die(_("attr spec '%s': attrs must not start with '-' and "
+      "be composed of [-A-Za-z0-9_.]."), value);
+}
+
 static void eat_long_magic(struct pathspec_item *item, const char *elt,
  unsigned *magic, int *pathspec_prefix,
  const char **copyfrom_, const char **long_magic_end)
 {
  int i;
  const char *copyfrom = *copyfrom_;
+ const char *body;
  /* longhand */
  const char *nextat;
  for (copyfrom = elt + 2;
@@ -108,15 +179,21 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
  if (!len)
  continue;
 
- if (starts_with(copyfrom, "prefix:")) {
+ if (skip_prefix(copyfrom, "prefix:", &body)) {
  char *endptr;
- *pathspec_prefix = strtol(copyfrom + 7,
-  &endptr, 10);
+ *pathspec_prefix = strtol(body, &endptr, 10);
  if (endptr - copyfrom != len)
  die(_("invalid parameter for pathspec magic 'prefix'"));
  continue;
  }
 
+ if (skip_prefix(copyfrom, "attr:", &body)) {
+ char *pass = xmemdupz(body, len - strlen("attr:"));
+ parse_pathspec_attr_match(item, pass);
+ free(pass);
+ continue;
+ }
+
  for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
  if (strlen(pathspec_magic[i].name) == len &&
     !strncmp(pathspec_magic[i].name, copyfrom, len)) {
@@ -425,7 +502,10 @@ void parse_pathspec(struct pathspec *pathspec,
  for (i = 0; i < n; i++) {
  unsigned short_magic;
  entry = argv[i];
-
+ item[i].attr_check = NULL;
+ item[i].attr_match = NULL;
+ item[i].attr_match_nr = 0;
+ item[i].attr_match_alloc = 0;
  item[i].magic = prefix_pathspec(item + i, &short_magic,
  argv + i, flags,
  prefix, prefixlen, entry);
@@ -447,6 +527,13 @@ void parse_pathspec(struct pathspec *pathspec,
  if (item[i].nowildcard_len < item[i].len)
  pathspec->has_wildcard = 1;
  pathspec->magic |= item[i].magic;
+
+ if (item[i].attr_match_nr) {
+ int j;
+ for (j = 0; j < item[i].attr_match_nr; j++)
+ if (item[i].attr_match[j].match_mode == INVALID_ATTR)
+ die(_("attribute spec in the wrong syntax are prohibited."));
+ }
  }
 
  if (nr_exclude == n)
@@ -502,6 +589,14 @@ void copy_pathspec(struct pathspec *dst, const struct pathspec *src)
 
 void free_pathspec(struct pathspec *pathspec)
 {
+ int i, j;
+ for (i = 0; i < pathspec->nr; i++) {
+ for (j = 0; j < pathspec->items[j].attr_match_nr; j++)
+ free(pathspec->items[i].attr_match[j].value);
+ free(pathspec->items[i].attr_match);
+ git_attr_check_free(pathspec->items[i].attr_check);
+ }
+
  free(pathspec->items);
  pathspec->items = NULL;
 }
diff --git a/pathspec.h b/pathspec.h
index 0c11262..5308137 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -32,6 +32,22 @@ struct pathspec {
  int len, prefix;
  int nowildcard_len;
  int flags;
+ int attr_match_nr;
+ int attr_match_alloc;
+ struct attr_match {
+ struct git_attr *attr;
+ char *value;
+ enum attr_match_mode {
+ MATCH_SET,
+ MATCH_UNSET,
+ MATCH_VALUE,
+ MATCH_UNSPECIFIED,
+ MATCH_NOT_UNSPECIFIED,
+ MATCH_SET_OR_VALUE,
+ INVALID_ATTR
+ } match_mode;
+ } *attr_match;
+ struct git_attr_check *attr_check;
  } *items;
 };
 
diff --git a/t/t6134-pathspec-with-labels.sh b/t/t6134-pathspec-with-labels.sh
new file mode 100755
index 0000000..35b3ab2
--- /dev/null
+++ b/t/t6134-pathspec-with-labels.sh
@@ -0,0 +1,177 @@
+#!/bin/sh
+
+test_description='test labels in pathspecs'
+. ./test-lib.sh
+
+test_expect_success 'setup a tree' '
+ mkdir sub &&
+ for p in fileA fileB fileC fileAB fileAC fileBC fileNoLabel fileUnsetLabel fileSetLabel fileValue fileWrongLabel; do
+ : >$p &&
+ git add $p &&
+ : >sub/$p
+ git add sub/$p
+ done &&
+ git commit -m $p &&
+ git ls-files >actual &&
+ cat <<EOF >expect &&
+fileA
+fileAB
+fileAC
+fileB
+fileBC
+fileC
+fileNoLabel
+fileSetLabel
+fileUnsetLabel
+fileValue
+fileWrongLabel
+sub/fileA
+sub/fileAB
+sub/fileAC
+sub/fileB
+sub/fileBC
+sub/fileC
+sub/fileNoLabel
+sub/fileSetLabel
+sub/fileUnsetLabel
+sub/fileValue
+sub/fileWrongLabel
+EOF
+ test_cmp expect actual
+'
+
+test_expect_success 'pathspec with no attr' '
+ test_must_fail git ls-files ":(attr:)" 2>actual &&
+ test_i18ngrep fatal actual
+'
+
+test_expect_success 'pathspec with labels and non existent .gitattributes' '
+ git ls-files ":(attr:label)" >actual &&
+ test_must_be_empty actual
+'
+
+test_expect_success 'setup .gitattributes' '
+ cat <<EOF >.gitattributes &&
+fileA labelA
+fileB labelB
+fileC labelC
+fileAB labelA labelB
+fileAC labelA labelC
+fileBC labelB labelC
+fileUnsetLabel -label
+fileSetLabel label
+fileValue label=foo
+fileWrongLabel label☺
+EOF
+ git add .gitattributes &&
+ git commit -m "add attributes"
+'
+
+sq="'"
+
+test_expect_success 'check specific set attr' '
+ cat <<EOF >expect &&
+fileSetLabel
+sub/fileSetLabel
+EOF
+ git ls-files ":(attr:+label)" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check specific unset attr' '
+ cat <<EOF >expect &&
+fileUnsetLabel
+sub/fileUnsetLabel
+EOF
+ git ls-files ":(attr:-label)" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check specific value attr' '
+ cat <<EOF >expect &&
+fileValue
+sub/fileValue
+EOF
+ git ls-files ":(attr:label=foo)" >actual &&
+ test_cmp expect actual &&
+ git ls-files ":(attr:label=bar)" >actual &&
+ test_must_be_empty actual
+'
+
+test_expect_success 'check set or value attr' '
+ cat <<EOF >expect &&
+fileSetLabel
+fileValue
+sub/fileSetLabel
+sub/fileValue
+EOF
+ git ls-files ":(attr:label)" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check unspecified attr' '
+ cat <<EOF >expect &&
+.gitattributes
+fileC
+fileNoLabel
+fileWrongLabel
+sub/fileC
+sub/fileNoLabel
+sub/fileWrongLabel
+EOF
+ git ls-files ":(attr:~label,attr:~labelA,attr:~labelB)" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check not unspecified attr' '
+ cat <<EOF >expect &&
+fileSetLabel
+fileUnsetLabel
+fileValue
+sub/fileSetLabel
+sub/fileUnsetLabel
+sub/fileValue
+EOF
+ git ls-files ":(attr:?label)" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check label with 2 labels' '
+ cat <<EOF >expect &&
+fileAB
+sub/fileAB
+EOF
+ git ls-files ":(attr:labelA labelB)" >actual &&
+ test_cmp expect actual &&
+ git ls-files ":(attr:labelA,attr:labelB)" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check label with more labels but excluded path' '
+ cat <<EOF >expect &&
+fileAB
+fileB
+fileBC
+EOF
+ git ls-files ":(attr:labelB)" ":(exclude)sub/" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check label excluding other labels' '
+ cat <<EOF >expect &&
+fileAB
+fileB
+fileBC
+sub/fileAB
+sub/fileB
+EOF
+ git ls-files ":(attr:labelB)" ":(exclude,attr:labelC)sub/" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'abort on giving invalid label on the command line' '
+ test_must_fail git ls-files . ":(attr:☺)" 2>actual &&
+ test_i18ngrep "fatal" actual
+'
+
+test_done
--
2.8.2.121.ga97fb08

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCHv7 5/5] pathspec: allow querying for attributes

Stefan Beller-4
On Wed, May 18, 2016 at 12:02 PM, Stefan Beller <[hidden email]> wrote:
>  void free_pathspec(struct pathspec *pathspec)
>  {
> +       int i, j;
> +       for (i = 0; i < pathspec->nr; i++) {
> +               for (j = 0; j < pathspec->items[j].attr_match_nr; j++)
> +                       free(pathspec->items[i].attr_match[j].value);
> +               free(pathspec->items[i].attr_match);
> +               git_attr_check_free(pathspec->items[i].attr_check);

This is faulty as may be NULL and git_attr_check_free assumes its
argument is not
NULL. So I'll add a guard here in a reoll.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCHv7 5/5] pathspec: allow querying for attributes

Junio C Hamano
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:

> + - "`+`" the attribute must be set
> + - "`-`" the attribute must be unset
> + - "`~`" the attribute must be unspecified

I think using these three, not the way how .gitattributes specify
them, is highly confusing to the users.

> + - "`?`" the attribute must not be unspecified, i.e. set, unset or value matches

The choice of '?' is OK, as there is no reason .gitattributes wants
to say something fuzzy like "this is set or unset or has value".
The last part "set, unset or value matches" does not make sense,
though.  Did you mean "set, unset or set to value"?

> + - an empty "`[mode]`" matches if the attribute is set or has a value

The same comment as +/-/~ above applies.

> + - an empty "`[mode]`" combined with "`[=value]`" matches if the attribute has
> +   the given value.

I.e. "eol=crlf" matches only for paths whose eol attribute is set to
"crlf".  Makes perfect sense.


> diff --git a/dir.c b/dir.c
> index 996653b..3141a5a 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -9,6 +9,7 @@
>   */
>  #include "cache.h"
>  #include "dir.h"
> +#include "attr.h"
>  #include "refs.h"
>  #include "wildmatch.h"
>  #include "pathspec.h"
> @@ -215,6 +216,48 @@ int within_depth(const char *name, int namelen,
>   return 1;
>  }
>  
> +static int match_attrs(const char *name, int namelen,
> +       const struct pathspec_item *item)
> +{
> + char *path;
> + int i;
> +
> + path = xmemdupz(name, namelen);
> + git_check_attr(path, item->attr_check);
> +
> + for (i = 0; i < item->attr_match_nr; i++) {
> + const char *value;
> + int matched;
> + enum attr_match_mode match_mode;
> +
> + value = item->attr_check->check[i].value;
> +
> + match_mode = item->attr_match[i].match_mode;

Mental note: there is a one-to-one correspondence between
attr_check->check[] and attr_match[].

> + if (!matched)
> + return 0;

So this ANDs together.  OK.

> + }
> +
> + free(path);

Let me see how involved a change would be to allow passing a counted
string to git_check_attr().

> diff --git a/pathspec.c b/pathspec.c
> index 4dff252..32fb6a8 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -1,6 +1,7 @@
>  #include "cache.h"
>  #include "dir.h"
>  #include "pathspec.h"
> +#include "attr.h"
>  
>  /*
>   * Finds which of the given pathspecs match items in the index.
> @@ -88,12 +89,82 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
>   strbuf_addf(sb, ",prefix:%d)", prefixlen);
>  }
>  

Mental note after skiming the caller:

The "value" here is like "VAR1=VAL1 VAR2 !VAR3" in a pathspec
":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit", i.e. the source string
to be compiled into a list of conditionals that will be evaluated by
match_attr() in dir.c

> +static void parse_pathspec_attr_match(struct pathspec_item *item, const char *value)
> +{
> + struct string_list_item *si;
> + struct string_list list = STRING_LIST_INIT_DUP;
> +
> +
> + if (!value || !strlen(value))
> + goto err;
> +
> + string_list_split(&list, value, ' ', -1);
> + string_list_remove_empty_items(&list, 0);

At this point, we got "VAR1=VAL1", "VAR2", "!VAR3" as elements in
this list.

> + if (!item->attr_check)
> + item->attr_check = git_attr_check_alloc();

Given more than one "attr" magic, e.g. ":(attr:A=a,attr:B)/path",
the check may not be empty when we process the second one; we just
extend it without losing the existing contents.

> + ALLOC_GROW(item->attr_match, item->attr_match_nr + list.nr, item->attr_match_alloc);

Likewise, we extend attr_match[] array.

> + for_each_string_list_item(si, &list) {
> + size_t val_len;
> +
> + int j = item->attr_match_nr++;
> + const char *val = si->string;
> + struct attr_match *am = &item->attr_match[j];
> +
> + if (val[0] == '?')
> + am->match_mode = MATCH_NOT_UNSPECIFIED;
> + else if (val[0] == '~')
> + am->match_mode = MATCH_UNSPECIFIED;
> + else if (val[0] == '+')
> + am->match_mode = MATCH_SET;
> + else if (val[0] == '-')
> + am->match_mode = MATCH_UNSET;
> + else
> + am->match_mode = MATCH_SET_OR_VALUE;
> +
> + if (am->match_mode != MATCH_SET_OR_VALUE)
> + /* skip first character */
> + val++;
> +
> + val_len = strcspn(val, "=,)");

I understand "=", but can "," and ")" appear here?

> + if (val[val_len] == '=') {
> + am->match_mode = MATCH_VALUE;
> + am->value = xstrdup(&val[val_len + 1]);
> + /*
> + * NEEDSWORK:
> + * Do we want to allow escaped commas to search
> + * for comma separated values?
> + */

No, we don't ;-).

> + if (strchr(am->value, '\\'))
> + die(_("attr spec values must not contain backslashes"));

But this is a good escape hatch to reserve for our own use, until we
decide what the quoting mechanism will be (or if it is necessary).

> + } else
> + am->value = NULL;
> +
> + if (invalid_attr_name(val, val_len)) {
> + am->match_mode = INVALID_ATTR;
> + goto err;
> + }
> +
> + am->attr = git_attr(xmemdupz(val, val_len));
> + git_attr_check_append(item->attr_check, am->attr);

GOOD!

I think val and val_len should be renamed to attr and attr_len (in
the VARIABLE=VALUE context, these two identifiers are about parsing
the variable side, not the value side).

> + }
> +
> + string_list_clear(&list, 0);
> + return;
> +err:
> + die(_("attr spec '%s': attrs must not start with '-' and "
> +      "be composed of [-A-Za-z0-9_.]."), value);

What is "value" at this point?  If you failed to parse the second
element in "VAR1=VAL1 FR*TZ=VAL2 !VAR3", I think you would want to
say "'FR*TZ' is malformed".

Existing users of the function seems to say this:

                if (invalid_attr_name(cp, len)) {
                        fprintf(stderr,
                                "%.*s is not a valid attribute name: %s:%d\n",
                                len, cp, src, lineno);
                        return NULL;
                }

when parsing .gitattribute file.  The source file:line reference
does not apply to this context, but it would be better to unify the
message.  I do not mind spelling the rules out explicitly like you
did, but I do not want to see it spread across many places (which
forces us to update them when we have to change the rule later).

Perhaps we want a helper function in attr.c side that takes the
attribute name string (val, val_len in your code above, which I
suggest to be renamed to attr, attr_len) and formats the error
message into a strbuf, or something?

        void
        attr_name_error(struct strbuf *err, const char *name, size_t len)
        {
                strbuf_addf(_("invalid attribute name: '%.*s'"), len, name);
        }

then this one can say

        err: {
        struct strbuf err = STRBUF_INIT;
                attr_name_error(&err, attr, attr_len);
                die(err.buf);
        }

while the error routines can do a similar thing using cp and len,
and then append "%s:%d" % (src, lineno) at the end themselves.

>  static void eat_long_magic(struct pathspec_item *item, const char *elt,
>   unsigned *magic, int *pathspec_prefix,
>   const char **copyfrom_, const char **long_magic_end)
>  {
>   int i;
>   const char *copyfrom = *copyfrom_;
> + const char *body;
>   /* longhand */
>   const char *nextat;
>   for (copyfrom = elt + 2;
> @@ -108,15 +179,21 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
>   if (!len)
>   continue;
>  
> - if (starts_with(copyfrom, "prefix:")) {
> + if (skip_prefix(copyfrom, "prefix:", &body)) {
>   char *endptr;
> - *pathspec_prefix = strtol(copyfrom + 7,
> -  &endptr, 10);
> + *pathspec_prefix = strtol(body, &endptr, 10);
>   if (endptr - copyfrom != len)
>   die(_("invalid parameter for pathspec magic 'prefix'"));
>   continue;
>   }
>  
> + if (skip_prefix(copyfrom, "attr:", &body)) {
> + char *pass = xmemdupz(body, len - strlen("attr:"));
> + parse_pathspec_attr_match(item, pass);
> + free(pass);
> + continue;

Makes me wonder what "pass" stands for.  From the look of xmemdupz(),
given ":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit" to parse, len
points at ")" and xmemdupz() gives "VAR1=VAL1 VAR2 !VAR3" to it.

> @@ -502,6 +589,14 @@ void copy_pathspec(struct pathspec *dst, const struct pathspec *src)
>  
>  void free_pathspec(struct pathspec *pathspec)
>  {
> + int i, j;
> + for (i = 0; i < pathspec->nr; i++) {
> + for (j = 0; j < pathspec->items[j].attr_match_nr; j++)
> + free(pathspec->items[i].attr_match[j].value);
> + free(pathspec->items[i].attr_match);
> + git_attr_check_free(pathspec->items[i].attr_check);
> + }

OK.

Overall very nicely done (modulo nits mentioned above, none of which
is unfixable).

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCHv7 5/5] pathspec: allow querying for attributes

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

Just this part:

> The last part "set, unset or value matches" does not make sense,
> though.

I re-read the line and I think you meant

    ?VAR makes a path match if VAR attribute is set, set to false,
    or set to value for the path

and I shouldn't have read it as a 3-tuple ("set", "unset", "value matches"),
but as a 3-tuple ("set", "unset", "value").  That is,

    A path (whose attribute state is one of these three) matches.

So I retract "does not make sense"; it still is "hard to grok", though.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCHv7 5/5] pathspec: allow querying for attributes

Junio C Hamano
In reply to this post by Junio C Hamano
Junio C Hamano <[hidden email]> writes:

>> +static int match_attrs(const char *name, int namelen,
>> +       const struct pathspec_item *item)
>> +{
>> + char *path;
>> + int i;
>> +
>> + path = xmemdupz(name, namelen);
>> + git_check_attr(path, item->attr_check);
> ...
>> + }
>> +
>> + free(path);
>
> Let me see how involved a change would be to allow passing a counted
> string to git_check_attr().

Perhaps the attached is sufficient, and you can avoid copying the
paths in this codepath.

 attr.c | 23 ++++++++++++++---------
 attr.h |  1 +
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/attr.c b/attr.c
index e0f7965..eeb29e6 100644
--- a/attr.c
+++ b/attr.c
@@ -725,20 +725,19 @@ static int macroexpand_one(int nr, int rem)
  * check_all_attr. If num is non-zero, only attributes in check[] are
  * collected. Otherwise all attributes are collected.
  */
-static void collect_some_attrs(const char *path, int num,
+static void collect_some_attrs(const char *path, int pathlen, int num,
        struct git_attr_check_elem *check)
 
 {
  struct attr_stack *stk;
- int i, pathlen, rem, dirlen;
+ int i, rem, dirlen;
  const char *cp, *last_slash = NULL;
  int basename_offset;
 
- for (cp = path; *cp; cp++) {
+ for (cp = path; cp < path + pathlen; cp++) {
  if (*cp == '/' && cp[1])
  last_slash = cp;
  }
- pathlen = cp - path;
  if (last_slash) {
  basename_offset = last_slash + 1 - path;
  dirlen = last_slash - path;
@@ -769,12 +768,12 @@ static void collect_some_attrs(const char *path, int num,
  rem = fill(path, pathlen, basename_offset, stk, rem);
 }
 
-static int git_check_attrs(const char *path, int num,
+static int git_check_attrs(const char *path, int pathlen, int num,
    struct git_attr_check_elem *check)
 {
  int i;
 
- collect_some_attrs(path, num, check);
+ collect_some_attrs(path, pathlen, num, check);
 
  for (i = 0; i < num; i++) {
  const char *value = check_all_attr[check[i].attr->attr_nr].value;
@@ -791,7 +790,7 @@ void git_all_attrs(const char *path, struct git_attr_check *check)
  int i;
 
  git_attr_check_clear(check);
- collect_some_attrs(path, 0, NULL);
+ collect_some_attrs(path, strlen(path), 0, NULL);
 
  for (i = 0; i < attr_nr; i++) {
  const char *name = check_all_attr[i].attr->name;
@@ -816,10 +815,16 @@ void git_attr_set_direction(enum git_attr_direction new, struct index_state *ist
  use_index = istate;
 }
 
-int git_check_attr(const char *path, struct git_attr_check *check)
+int git_check_attr_counted(const char *path, int pathlen,
+   struct git_attr_check *check)
 {
  check->finalized = 1;
- return git_check_attrs(path, check->check_nr, check->check);
+ return git_check_attrs(path, pathlen, check->check_nr, check->check);
+}
+
+int git_check_attr(const char *path, struct git_attr_check *check)
+{
+ return git_check_attr_counted(path, strlen(path), check);
 }
 
 struct git_attr_check *git_attr_check_initl(const char *one, ...)
diff --git a/attr.h b/attr.h
index 51ca36d..4a4ac76 100644
--- a/attr.h
+++ b/attr.h
@@ -38,6 +38,7 @@ struct git_attr_check {
 
 extern struct git_attr_check *git_attr_check_initl(const char *, ...);
 extern int git_check_attr(const char *path, struct git_attr_check *);
+extern int git_check_attr_counted(const char *, int, struct git_attr_check *);
 
 extern struct git_attr_check *git_attr_check_alloc(void);
 extern void git_attr_check_append(struct git_attr_check *, const struct git_attr *);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH] attr: add counted string version of git_check_attr()

Junio C Hamano
Often a potential caller has <path, pathlen> pair that
represents the path it wants to ask attributes for; when
path[pathlen] is not NUL, the caller has to xmemdupz()
only to call git_check_attr().

Add git_check_attr_counted() that takes such a counted
string instead of "const char *path".

Signed-off-by: Junio C Hamano <[hidden email]>
---
 * This time with a log message

 attr.c | 23 ++++++++++++++---------
 attr.h |  1 +
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/attr.c b/attr.c
index e0f7965..eeb29e6 100644
--- a/attr.c
+++ b/attr.c
@@ -725,20 +725,19 @@ static int macroexpand_one(int nr, int rem)
  * check_all_attr. If num is non-zero, only attributes in check[] are
  * collected. Otherwise all attributes are collected.
  */
-static void collect_some_attrs(const char *path, int num,
+static void collect_some_attrs(const char *path, int pathlen, int num,
        struct git_attr_check_elem *check)
 
 {
  struct attr_stack *stk;
- int i, pathlen, rem, dirlen;
+ int i, rem, dirlen;
  const char *cp, *last_slash = NULL;
  int basename_offset;
 
- for (cp = path; *cp; cp++) {
+ for (cp = path; cp < path + pathlen; cp++) {
  if (*cp == '/' && cp[1])
  last_slash = cp;
  }
- pathlen = cp - path;
  if (last_slash) {
  basename_offset = last_slash + 1 - path;
  dirlen = last_slash - path;
@@ -769,12 +768,12 @@ static void collect_some_attrs(const char *path, int num,
  rem = fill(path, pathlen, basename_offset, stk, rem);
 }
 
-static int git_check_attrs(const char *path, int num,
+static int git_check_attrs(const char *path, int pathlen, int num,
    struct git_attr_check_elem *check)
 {
  int i;
 
- collect_some_attrs(path, num, check);
+ collect_some_attrs(path, pathlen, num, check);
 
  for (i = 0; i < num; i++) {
  const char *value = check_all_attr[check[i].attr->attr_nr].value;
@@ -791,7 +790,7 @@ void git_all_attrs(const char *path, struct git_attr_check *check)
  int i;
 
  git_attr_check_clear(check);
- collect_some_attrs(path, 0, NULL);
+ collect_some_attrs(path, strlen(path), 0, NULL);
 
  for (i = 0; i < attr_nr; i++) {
  const char *name = check_all_attr[i].attr->name;
@@ -816,10 +815,16 @@ void git_attr_set_direction(enum git_attr_direction new, struct index_state *ist
  use_index = istate;
 }
 
-int git_check_attr(const char *path, struct git_attr_check *check)
+int git_check_attr_counted(const char *path, int pathlen,
+   struct git_attr_check *check)
 {
  check->finalized = 1;
- return git_check_attrs(path, check->check_nr, check->check);
+ return git_check_attrs(path, pathlen, check->check_nr, check->check);
+}
+
+int git_check_attr(const char *path, struct git_attr_check *check)
+{
+ return git_check_attr_counted(path, strlen(path), check);
 }
 
 struct git_attr_check *git_attr_check_initl(const char *one, ...)
diff --git a/attr.h b/attr.h
index 51ca36d..4a4ac76 100644
--- a/attr.h
+++ b/attr.h
@@ -38,6 +38,7 @@ struct git_attr_check {
 
 extern struct git_attr_check *git_attr_check_initl(const char *, ...);
 extern int git_check_attr(const char *path, struct git_attr_check *);
+extern int git_check_attr_counted(const char *, int, struct git_attr_check *);
 
 extern struct git_attr_check *git_attr_check_alloc(void);
 extern void git_attr_check_append(struct git_attr_check *, const struct git_attr *);
--
2.8.2-759-geb611ab

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH] attr: add counted string version of git_attr()

Junio C Hamano
In reply to this post by Junio C Hamano
Often a potential caller has <name, namelen> pair that
represents the name it wants to create an attribute out of.

When name[namelen] is not NUL, the caller has to xmemdupz()
only to call git_attr().

Add git_attr_counted() that takes such a counted string instead of
"const char *name".

Signed-off-by: Junio C Hamano <[hidden email]>
---
 * The same for git_attr()

 attr.c | 8 ++++----
 attr.h | 5 ++++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/attr.c b/attr.c
index eeb29e6..7f20e21 100644
--- a/attr.c
+++ b/attr.c
@@ -78,7 +78,7 @@ static int invalid_attr_name(const char *name, int namelen)
  return 0;
 }
 
-static struct git_attr *git_attr_internal(const char *name, int len)
+struct git_attr *git_attr_counted(const char *name, size_t len)
 {
  unsigned hval = hash_name(name, len);
  unsigned pos = hval % HASHSIZE;
@@ -109,7 +109,7 @@ static struct git_attr *git_attr_internal(const char *name, int len)
 
 struct git_attr *git_attr(const char *name)
 {
- return git_attr_internal(name, strlen(name));
+ return git_attr_counted(name, strlen(name));
 }
 
 /* What does a matched pattern decide? */
@@ -199,7 +199,7 @@ static const char *parse_attr(const char *src, int lineno, const char *cp,
  else {
  e->setto = xmemdupz(equals + 1, ep - equals - 1);
  }
- e->attr = git_attr_internal(cp, len);
+ e->attr = git_attr_counted(cp, len);
  }
  return ep + strspn(ep, blank);
 }
@@ -254,7 +254,7 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
       sizeof(struct attr_state) * num_attr +
       (is_macro ? 0 : namelen + 1));
  if (is_macro) {
- res->u.attr = git_attr_internal(name, namelen);
+ res->u.attr = git_attr_counted(name, namelen);
  res->u.attr->maybe_macro = 1;
  } else {
  char *p = (char *)&(res->state[num_attr]);
diff --git a/attr.h b/attr.h
index 4a4ac76..78d6d5a 100644
--- a/attr.h
+++ b/attr.h
@@ -8,7 +8,10 @@ struct git_attr;
  * Given a string, return the gitattribute object that
  * corresponds to it.
  */
-struct git_attr *git_attr(const char *);
+extern struct git_attr *git_attr(const char *);
+
+/* The same, but with counted string */
+extern struct git_attr *git_attr_counted(const char *, size_t);
 
 /* Internal use */
 extern const char git_attr__true[];
--
2.8.2-759-geb611ab

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH] attr: expose validity check for attribute names

Junio C Hamano
In reply to this post by Junio C Hamano
Export attr_name_valid() function, and a helper function that
returns the message to be given when a given <name, len> pair
is not a good name for an attribute.

We could later update the message to exactly spell out what the
rules for a good attribute name are, etc.

Signed-off-by: Junio C Hamano <[hidden email]>
---
 * And then finally the error message one.  I didn't change
   the message itself to spell out the rules here, though.

 attr.c | 39 +++++++++++++++++++++++++--------------
 attr.h |  3 +++
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/attr.c b/attr.c
index 7f20e21..8f54871 100644
--- a/attr.c
+++ b/attr.c
@@ -59,23 +59,38 @@ static unsigned hash_name(const char *name, int namelen)
  return val;
 }
 
-static int invalid_attr_name(const char *name, int namelen)
+int attr_name_valid(const char *name, size_t namelen)
 {
  /*
  * Attribute name cannot begin with '-' and must consist of
  * characters from [-A-Za-z0-9_.].
  */
  if (namelen <= 0 || *name == '-')
- return -1;
+ return 0;
  while (namelen--) {
  char ch = *name++;
  if (! (ch == '-' || ch == '.' || ch == '_' ||
        ('0' <= ch && ch <= '9') ||
        ('a' <= ch && ch <= 'z') ||
        ('A' <= ch && ch <= 'Z')) )
- return -1;
+ return 0;
  }
- return 0;
+ return -1;
+}
+
+void invalid_attr_name_message(struct strbuf *err, const char *name, int len)
+{
+ strbuf_addf(err, _("%.*s is not a valid attribute name"),
+    len, name);
+}
+
+static void report_invalid_attr(const char *name, size_t len,
+ const char *src, int lineno)
+{
+ struct strbuf err = STRBUF_INIT;
+ invalid_attr_name_message(&err, name, len);
+ fprintf(stderr, "%s: %s:%d\n", err.buf, src, lineno);
+ strbuf_release(&err);
 }
 
 struct git_attr *git_attr_counted(const char *name, size_t len)
@@ -90,7 +105,7 @@ struct git_attr *git_attr_counted(const char *name, size_t len)
  return a;
  }
 
- if (invalid_attr_name(name, len))
+ if (!attr_name_valid(name, len))
  return NULL;
 
  FLEX_ALLOC_MEM(a, name, name, len);
@@ -176,17 +191,15 @@ static const char *parse_attr(const char *src, int lineno, const char *cp,
  cp++;
  len--;
  }
- if (invalid_attr_name(cp, len)) {
- fprintf(stderr,
- "%.*s is not a valid attribute name: %s:%d\n",
- len, cp, src, lineno);
+ if (!attr_name_valid(cp, len)) {
+ report_invalid_attr(cp, len, src, lineno);
  return NULL;
  }
  } else {
  /*
  * As this function is always called twice, once with
  * e == NULL in the first pass and then e != NULL in
- * the second pass, no need for invalid_attr_name()
+ * the second pass, no need for attr_name_valid()
  * check here.
  */
  if (*cp == '-' || *cp == '!') {
@@ -229,10 +242,8 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
  name += strlen(ATTRIBUTE_MACRO_PREFIX);
  name += strspn(name, blank);
  namelen = strcspn(name, blank);
- if (invalid_attr_name(name, namelen)) {
- fprintf(stderr,
- "%.*s is not a valid attribute name: %s:%d\n",
- namelen, name, src, lineno);
+ if (!attr_name_valid(name, namelen)) {
+ report_invalid_attr(name, namelen, src, lineno);
  return NULL;
  }
  }
diff --git a/attr.h b/attr.h
index 78d6d5a..fc72030 100644
--- a/attr.h
+++ b/attr.h
@@ -13,6 +13,9 @@ extern struct git_attr *git_attr(const char *);
 /* The same, but with counted string */
 extern struct git_attr *git_attr_counted(const char *, size_t);
 
+extern int attr_name_valid(const char *name, size_t namelen);
+extern void invalid_attr_name_message(struct strbuf *, const char *, int);
+
 /* Internal use */
 extern const char git_attr__true[];
 extern const char git_attr__false[];
--
2.8.2-759-geb611ab

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] attr: add counted string version of git_attr()

Stefan Beller-4
In reply to this post by Junio C Hamano
On Wed, May 18, 2016 at 2:03 PM, Junio C Hamano <[hidden email]> wrote:
> Often a potential caller has <name, namelen> pair that
> represents the name it wants to create an attribute out of.
>
> When name[namelen] is not NUL, the caller has to xmemdupz()
> only to call git_attr().
>
> Add git_attr_counted() that takes such a counted string instead of
> "const char *name".

s/ "const char *name"/a standard C string that is null terminated/ maybe?

"const char *name" is not a strong hint that it is not counted?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH] attr: expose validity check for attribute names

Stefan Beller-4
In reply to this post by Junio C Hamano
On Wed, May 18, 2016 at 2:05 PM, Junio C Hamano <[hidden email]> wrote:

> Export attr_name_valid() function, and a helper function that
> returns the message to be given when a given <name, len> pair
> is not a good name for an attribute.
>
> We could later update the message to exactly spell out what the
> rules for a good attribute name are, etc.
>
> Signed-off-by: Junio C Hamano <[hidden email]>
> ---
>  * And then finally the error message one.  I didn't change
>    the message itself to spell out the rules here, though.
>
>  attr.c | 39 +++++++++++++++++++++++++--------------
>  attr.h |  3 +++
>  2 files changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/attr.c b/attr.c
> index 7f20e21..8f54871 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -59,23 +59,38 @@ static unsigned hash_name(const char *name, int namelen)
>         return val;
>  }
>
> -static int invalid_attr_name(const char *name, int namelen)
> +int attr_name_valid(const char *name, size_t namelen)

Mental note:
The patch series with the pathspec attrs on top of this series makes this
non-static and exposes the invalid_attr_name for wider use, so I'll resolve
the merge conflict accordingly.

>  {
>         /*
>          * Attribute name cannot begin with '-' and must consist of
>          * characters from [-A-Za-z0-9_.].
>          */
>         if (namelen <= 0 || *name == '-')
> -               return -1;
> +               return 0;
>         while (namelen--) {
>                 char ch = *name++;
>                 if (! (ch == '-' || ch == '.' || ch == '_' ||
>                        ('0' <= ch && ch <= '9') ||
>                        ('a' <= ch && ch <= 'z') ||
>                        ('A' <= ch && ch <= 'Z')) )
> -                       return -1;
> +                       return 0;
>         }
> -       return 0;
> +       return -1;
> +}
> +
> +void invalid_attr_name_message(struct strbuf *err, const char *name, int len)
> +{
> +       strbuf_addf(err, _("%.*s is not a valid attribute name"),
> +                   len, name);
> +}
> +
> +static void report_invalid_attr(const char *name, size_t len,
> +                               const char *src, int lineno)
> +{
> +       struct strbuf err = STRBUF_INIT;
> +       invalid_attr_name_message(&err, name, len);
> +       fprintf(stderr, "%s: %s:%d\n", err.buf, src, lineno);
> +       strbuf_release(&err);
>  }
>
>  struct git_attr *git_attr_counted(const char *name, size_t len)
> @@ -90,7 +105,7 @@ struct git_attr *git_attr_counted(const char *name, size_t len)
>                         return a;
>         }
>
> -       if (invalid_attr_name(name, len))
> +       if (!attr_name_valid(name, len))
>                 return NULL;
>
>         FLEX_ALLOC_MEM(a, name, name, len);
> @@ -176,17 +191,15 @@ static const char *parse_attr(const char *src, int lineno, const char *cp,
>                         cp++;
>                         len--;
>                 }
> -               if (invalid_attr_name(cp, len)) {
> -                       fprintf(stderr,
> -                               "%.*s is not a valid attribute name: %s:%d\n",
> -                               len, cp, src, lineno);
> +               if (!attr_name_valid(cp, len)) {
> +                       report_invalid_attr(cp, len, src, lineno);
>                         return NULL;
>                 }
>         } else {
>                 /*
>                  * As this function is always called twice, once with
>                  * e == NULL in the first pass and then e != NULL in
> -                * the second pass, no need for invalid_attr_name()
> +                * the second pass, no need for attr_name_valid()
>                  * check here.
>                  */
>                 if (*cp == '-' || *cp == '!') {
> @@ -229,10 +242,8 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
>                 name += strlen(ATTRIBUTE_MACRO_PREFIX);
>                 name += strspn(name, blank);
>                 namelen = strcspn(name, blank);
> -               if (invalid_attr_name(name, namelen)) {
> -                       fprintf(stderr,
> -                               "%.*s is not a valid attribute name: %s:%d\n",
> -                               namelen, name, src, lineno);
> +               if (!attr_name_valid(name, namelen)) {
> +                       report_invalid_attr(name, namelen, src, lineno);
>                         return NULL;
>                 }
>         }
> diff --git a/attr.h b/attr.h
> index 78d6d5a..fc72030 100644
> --- a/attr.h
> +++ b/attr.h
> @@ -13,6 +13,9 @@ extern struct git_attr *git_attr(const char *);
>  /* The same, but with counted string */
>  extern struct git_attr *git_attr_counted(const char *, size_t);
>
> +extern int attr_name_valid(const char *name, size_t namelen);

ok, I agree with this one.

> +extern void invalid_attr_name_message(struct strbuf *, const char *, int);

This makes sense, too, so the caller can just collect the error message and
attach its own flair to it, i.e. another message to explain the context about
wrong attributes.

Thanks,
Stefan

> +
>  /* Internal use */
>  extern const char git_attr__true[];
>  extern const char git_attr__false[];
> --
> 2.8.2-759-geb611ab
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCHv7 5/5] pathspec: allow querying for attributes

Stefan Beller-4
In reply to this post by Junio C Hamano
On Wed, May 18, 2016 at 12:47 PM, Junio C Hamano <[hidden email]> wrote:
> Stefan Beller <[hidden email]> writes:
>
>> + - "`+`" the attribute must be set
>> + - "`-`" the attribute must be unset
>> + - "`~`" the attribute must be unspecified

Well the '-' is in line, at least.
So you're proposing empty string for set?

gitattributes documentation:
> Sometimes you would need to override an setting of an attribute for a
> path to Unspecified state. This can be done by listing the name of the
> attribute prefixed with an exclamation point !.

So the unspecified should be '!' to match Git consistency.

[I dislike the exclamation mark because of bash.
    git ls-files ":(attr:!text)"
    bash: !text: event not found
    git ls-files ":(attr:\!text)"
    fatal: attr spec '\!text': attrs must not start with '-' and be
composed of [-A-Za-z0-9_.].
    # we error out because of the \ in there
    git ls-files :\(attr:\!text\)
    ...
    # that actually works.
]

>
> I think using these three, not the way how .gitattributes specify
> them, is highly confusing to the users.

Ok I'll align those.

>
>> + - "`?`" the attribute must not be unspecified, i.e. set, unset or value matches
>
> The choice of '?' is OK, as there is no reason .gitattributes wants
> to say something fuzzy like "this is set or unset or has value".
> The last part "set, unset or value matches" does not make sense,
> though.  Did you mean "set, unset or set to value"?

Actually I meant "this is set or unset or has [any] value".

>
>> + - an empty "`[mode]`" matches if the attribute is set or has a value
>
> The same comment as +/-/~ above applies.

So consistency would ask for

 - an empty "`[mode]`" matches if the attribute is set
 - "`-`" the attribute must be unset
 - "`!`" the attribute must be unspecified
 - an empty "`[mode]`" combined with "`[=value]`" matches if the attribute has
   the given value.

and those are not yet codified, but for discussion then:

 - "`?`" the attribute must not be unspecified, i.e. set, unset or has any value
 - "`+`" the attribute must be set or has any value

>> +
>> +             match_mode = item->attr_match[i].match_mode;
>
> Mental note: there is a one-to-one correspondence between
> attr_check->check[] and attr_match[].

Right. I guess I should add a comment for that. Or better yet a test
with some weird pathspec making use of that

    git ls-files :(attr:labelA !labelB +labelA)

That way we would see if there will be confusion between the modes
and attr names in the future.

>
>> +             if (!matched)
>> +                     return 0;
>
> So this ANDs together.  OK.

Sure, (it's clear to me, maybe I should document that as well),
because multiple pathspecs are OR.

    git ls-files :(attr:labelA) :(attr:labelB)

>
> Mental note after skiming the caller:
>
> The "value" here is like "VAR1=VAL1 VAR2 !VAR3" in a pathspec
> ":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit", i.e. the source string
> to be compiled into a list of conditionals that will be evaluated by
> match_attr() in dir.c

Right.

>> +     string_list_split(&list, value, ' ', -1);
>> +     string_list_remove_empty_items(&list, 0);

Right.

>
> At this point, we got "VAR1=VAL1", "VAR2", "!VAR3" as elements in
> this list.

Right.

>
>> +     if (!item->attr_check)
>> +             item->attr_check = git_attr_check_alloc();
>
> Given more than one "attr" magic, e.g. ":(attr:A=a,attr:B)/path",
> the check may not be empty when we process the second one; we just
> extend it without losing the existing contents.

That is why I am not super happy with it though.

    ":(attr:A=a,attr:B)/path",
    ":(attr:A=a B)/path",

are the same for the user as well as in the internal data structures.
This "wastes" the white space as a possible convenient separator
character, e.g. for multiple values. On the other hand it will be easier
to type, specially for many attrs (read submodule groups).

>
>> +     ALLOC_GROW(item->attr_match, item->attr_match_nr + list.nr, item->attr_match_alloc);
>
> Likewise, we extend attr_match[] array.

>> +
>> +             val_len = strcspn(val, "=,)");
>
> I understand "=", but can "," and ")" appear here?

This was overly caution from some intermediate state, where the caller
handed in more than required.

    if (skip_prefix(copyfrom, "attr:", &body)) {
        char *pass = xmemdupz(body, len - strlen("attr:"));
        parse_pathspec_attr_match(item, pass);
        free(pass);
        continue;
    }

When given ":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit"
this only passes in "VAR1=VAL1 VAR2 !VAR3".

I contemplated write the caller as

    if (skip_prefix(copyfrom, "attr:", &body)) {
        parse_pathspec_attr_match(item, body);
        continue;
    }

but that would include further characters, i.e. ',' or ')'.
(but not a lot more)

I wanted to pass in an uncounted string though, as
string_list_separate doesn't offer a variant for counted
strings.

>
>> +             if (val[val_len] == '=') {
>> +                     am->match_mode = MATCH_VALUE;
>> +                     am->value = xstrdup(&val[val_len + 1]);
>> +                     /*
>> +                      * NEEDSWORK:
>> +                      * Do we want to allow escaped commas to search
>> +                      * for comma separated values?
>> +                      */
>
> No, we don't ;-).

ok will drop comment, ...

>
>> +                     if (strchr(am->value, '\\'))
>> +                             die(_("attr spec values must not contain backslashes"));
>
> But this is a good escape hatch to reserve for our own use, until we
> decide what the quoting mechanism will be (or if it is necessary).

but keep the code here

>
>> +             } else
>> +                     am->value = NULL;
>> +
>> +             if (invalid_attr_name(val, val_len)) {
>> +                     am->match_mode = INVALID_ATTR;
>> +                     goto err;
>> +             }
>> +
>> +             am->attr = git_attr(xmemdupz(val, val_len));
>> +             git_attr_check_append(item->attr_check, am->attr);
>
> GOOD!

except for the memory leak of xmemdupz. git_attr makes another
internal copy with FLEX_ALLOC_MEM. That ill be fixed once we use
git_attr_counted though.



>
> I think val and val_len should be renamed to attr and attr_len (in
> the VARIABLE=VALUE context, these two identifiers are about parsing
> the variable side, not the value side).

ok

>
>> +     }
>> +
>> +     string_list_clear(&list, 0);
>> +     return;
>> +err:
>> +     die(_("attr spec '%s': attrs must not start with '-' and "
>> +           "be composed of [-A-Za-z0-9_.]."), value);
>
> What is "value" at this point?  If you failed to parse the second
> element in "VAR1=VAL1 FR*TZ=VAL2 !VAR3", I think you would want to
> say "'FR*TZ' is malformed".
>
> Existing users of the function seems to say this:
>
>                 if (invalid_attr_name(cp, len)) {
>                         fprintf(stderr,
>                                 "%.*s is not a valid attribute name: %s:%d\n",
>                                 len, cp, src, lineno);
>                         return NULL;
>                 }
>
> when parsing .gitattribute file.  The source file:line reference
> does not apply to this context, but it would be better to unify the
> message.  I do not mind spelling the rules out explicitly like you
> did, but I do not want to see it spread across many places (which
> forces us to update them when we have to change the rule later).
>
> Perhaps we want a helper function in attr.c side that takes the
> attribute name string (val, val_len in your code above, which I
> suggest to be renamed to attr, attr_len) and formats the error
> message into a strbuf, or something?

That's why you added the reporting function to attrs, I see.

>
> Makes me wonder what "pass" stands for.  From the look of xmemdupz(),
> given ":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit" to parse, len
> points at ")" and xmemdupz() gives "VAR1=VAL1 VAR2 !VAR3" to it.

"what we _pass_ to the parse_pathspec_attr_match"... Maybe attr_body?

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCHv7 5/5] pathspec: allow querying for attributes

Junio C Hamano
Stefan Beller <[hidden email]> writes:

> and those are not yet codified, but for discussion then:
>
>  - "`?`" the attribute must not be unspecified, i.e. set, unset or has any value
>  - "`+`" the attribute must be set or has any value

I'd suggest not to support these until we see any concrete and
convincing example use case to tell us why this is useful.

> That is why I am not super happy with it though.
>
>     ":(attr:A=a,attr:B)/path",
>     ":(attr:A=a B)/path",
>
> are the same for the user as well as in the internal data structures.

You can interpret the former as ORed :(attr:A=a) and :(attr:B) if
you really wanted to.  You left the door open for such a future
extension by explicitly rejecting multiple attributes added to a
single pathspec element in earlier round, which was good.  You could
do the same here if you do not want to code that ORed ANDs, so that
it (or some other semantics) can be introduced later without breaking
the mental model the users would form with the initial implementation.

>>> +             val_len = strcspn(val, "=,)");
>>
>> I understand "=", but can "," and ")" appear here?
>
> This was overly caution from some intermediate state, where the caller
> handed in more than required.

Is that being overly cautious?

It looks to me more like being sloppy and sweeping bugs in callers
that you could have diagnosed here.

If you didn't have ",)" in the cspn set above, at least you would
see the effect caused by a broken caller.  E.g. ":(attr:FOO,icase)"
would be given to you as "FOO,icase)" by a broken caller, you would
split "FOO,icase" as one of the SP separated items, and try to parse
it as an attribute name.  You can notice the breakage of the caller
at that time.  With ",)" in cspn set, you silently pass the parameter
given by a broken caller.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loading...