[RFC-PATCHv6 0/4] pathspec attrs [WAS pathspec labels [WAS submodule groups]]

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC-PATCHv6 0/4] pathspec attrs [WAS pathspec labels [WAS submodule groups]]

Stefan Beller-4
  This goes on top of origin/jc/attr and is RFC as
  I did not write tests nor documentation, yet.
  I wanted to understand Junios series, so I built on top.
 
The meat is in the last patch, which allows for

    git ls-files :(attr:-text)path/pattern       # (ATTR_FALSE)
    git ls-files :(attr:+text)path/pattern       # (ATTR_TRUE)
    git ls-files :(attr:eol=input)path/pattern   # values must match
   
    git ls-files :(attr:!text)path/pattern       # find patterns with no "text" attribute, i.e. neither FALSE, TRUE or value
    git ls-files :(attr:text)path/pattern        # opposite of !;  find files which are TRUE, FALSE or value
 
Of course you can chain them:

    git ls-files :(attr:text,attr:eol=lf)path/pattern # must match both attr specs.
   
Feedback on the parsing and design welcome,

Thanks,
Stefan
   

Stefan Beller (4):
  Documentation: fix a typo
  pathspec: move long magic parsing out of prefix_pathspec
  pathspec: move prefix check out of the inner loop
  pathspec: allow querying for attributes

 Documentation/gitattributes.txt |   2 +-
 attr.c                          |   2 +-
 attr.h                          |   2 +
 dir.c                           |  49 ++++++++++++
 pathspec.c                      | 172 +++++++++++++++++++++++++++++++---------
 pathspec.h                      |  16 ++++
 6 files changed, 203 insertions(+), 40 deletions(-)

--
2.8.2.401.g9c0faef

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[RFC-PATCHv6 1/4] Documentation: fix a typo

Stefan Beller-4
Signed-off-by: Stefan Beller <[hidden email]>
---
 Documentation/gitattributes.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..af2c682 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -86,7 +86,7 @@ is either not set or empty, $HOME/.config/git/attributes is used instead.
 Attributes for all users on a system should be placed in the
 `$(prefix)/etc/gitattributes` file.
 
-Sometimes you would need to override an setting of an attribute
+Sometimes you would need to override a setting of an attribute
 for a path to `Unspecified` state.  This can be done by listing
 the name of the attribute prefixed with an exclamation point `!`.
 
--
2.8.2.401.g9c0faef

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[RFC-PATCHv6 2/4] pathspec: move long magic parsing out of prefix_pathspec

Stefan Beller-4
In reply to this post by Stefan Beller-4
`prefix_pathspec` is quite a lengthy function and we plan on adding more.
Split it up for better readability. As we want to add code into the
inner loop of the long magic parsing, we also benefit from lower
indentation.

Signed-off-by: Stefan Beller <[hidden email]>
---
 pathspec.c | 84 +++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 47 insertions(+), 37 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index c9e9b6c..eba37c2 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -88,6 +88,52 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
  strbuf_addf(sb, ",prefix:%d)", prefixlen);
 }
 
+static void eat_long_magic(struct pathspec_item *item, const char *elt,
+ unsigned *magic, int *pathspec_prefix,
+ const char **copyfrom_, const char **long_magic_end)
+{
+ int i;
+ const char *copyfrom = *copyfrom_;
+ /* longhand */
+ const char *nextat;
+ for (copyfrom = elt + 2;
+     *copyfrom && *copyfrom != ')';
+     copyfrom = nextat) {
+ size_t len = strcspn(copyfrom, ",)");
+ if (copyfrom[len] == ',')
+ nextat = copyfrom + len + 1;
+ else
+ /* handle ')' and '\0' */
+ nextat = copyfrom + len;
+ if (!len)
+ continue;
+ for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
+ if (strlen(pathspec_magic[i].name) == len &&
+    !strncmp(pathspec_magic[i].name, copyfrom, len)) {
+ *magic |= pathspec_magic[i].bit;
+ break;
+ }
+ if (starts_with(copyfrom, "prefix:")) {
+ char *endptr;
+ *pathspec_prefix = strtol(copyfrom + 7,
+  &endptr, 10);
+ if (endptr - copyfrom != len)
+ die(_("invalid parameter for pathspec magic 'prefix'"));
+ /* "i" would be wrong, but it does not matter */
+ break;
+ }
+ }
+ if (ARRAY_SIZE(pathspec_magic) <= i)
+ die(_("Invalid pathspec magic '%.*s' in '%s'"),
+    (int) len, copyfrom, elt);
+ }
+ if (*copyfrom != ')')
+ die(_("Missing ')' at the end of pathspec magic in '%s'"), elt);
+ *long_magic_end = copyfrom;
+ copyfrom++;
+ *copyfrom_ = copyfrom;
+}
+
 /*
  * Take an element of a pathspec and check for magic signatures.
  * Append the result to the prefix. Return the magic bitmap.
@@ -150,43 +196,7 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
     (flags & PATHSPEC_LITERAL_PATH)) {
  ; /* nothing to do */
  } else if (elt[1] == '(') {
- /* longhand */
- const char *nextat;
- for (copyfrom = elt + 2;
-     *copyfrom && *copyfrom != ')';
-     copyfrom = nextat) {
- size_t len = strcspn(copyfrom, ",)");
- if (copyfrom[len] == ',')
- nextat = copyfrom + len + 1;
- else
- /* handle ')' and '\0' */
- nextat = copyfrom + len;
- if (!len)
- continue;
- for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
- if (strlen(pathspec_magic[i].name) == len &&
-    !strncmp(pathspec_magic[i].name, copyfrom, len)) {
- magic |= pathspec_magic[i].bit;
- break;
- }
- if (starts_with(copyfrom, "prefix:")) {
- char *endptr;
- pathspec_prefix = strtol(copyfrom + 7,
- &endptr, 10);
- if (endptr - copyfrom != len)
- die(_("invalid parameter for pathspec magic 'prefix'"));
- /* "i" would be wrong, but it does not matter */
- break;
- }
- }
- if (ARRAY_SIZE(pathspec_magic) <= i)
- die(_("Invalid pathspec magic '%.*s' in '%s'"),
-    (int) len, copyfrom, elt);
- }
- if (*copyfrom != ')')
- die(_("Missing ')' at the end of pathspec magic in '%s'"), elt);
- long_magic_end = copyfrom;
- copyfrom++;
+ eat_long_magic(item, elt, &magic, &pathspec_prefix, &copyfrom, &long_magic_end);
  } else {
  /* shorthand */
  for (copyfrom = elt + 1;
--
2.8.2.401.g9c0faef

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[RFC-PATCHv6 3/4] pathspec: move prefix check out of the inner loop

Stefan Beller-4
In reply to this post by Stefan Beller-4
The prefix check is not related the check of pathspec magic; also there
is no code that is relevant after we'd break the loop on a match for
"prefix:". So move the check before the loop and shortcircuit the outer
loop.

Signed-off-by: Stefan Beller <[hidden email]>
---
 pathspec.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index eba37c2..4dff252 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -107,21 +107,22 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
  nextat = copyfrom + len;
  if (!len)
  continue;
+
+ if (starts_with(copyfrom, "prefix:")) {
+ char *endptr;
+ *pathspec_prefix = strtol(copyfrom + 7,
+  &endptr, 10);
+ if (endptr - copyfrom != len)
+ die(_("invalid parameter for pathspec magic 'prefix'"));
+ continue;
+ }
+
  for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
  if (strlen(pathspec_magic[i].name) == len &&
     !strncmp(pathspec_magic[i].name, copyfrom, len)) {
  *magic |= pathspec_magic[i].bit;
  break;
  }
- if (starts_with(copyfrom, "prefix:")) {
- char *endptr;
- *pathspec_prefix = strtol(copyfrom + 7,
-  &endptr, 10);
- if (endptr - copyfrom != len)
- die(_("invalid parameter for pathspec magic 'prefix'"));
- /* "i" would be wrong, but it does not matter */
- break;
- }
  }
  if (ARRAY_SIZE(pathspec_magic) <= i)
  die(_("Invalid pathspec magic '%.*s' in '%s'"),
--
2.8.2.401.g9c0faef

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Stefan Beller-4
In reply to this post by Stefan Beller-4
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.

Signed-off-by: Stefan Beller <[hidden email]>
---
 attr.c     |  2 +-
 attr.h     |  2 ++
 dir.c      | 49 +++++++++++++++++++++++++++++++++
 pathspec.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 pathspec.h | 16 +++++++++++
 5 files changed, 157 insertions(+), 5 deletions(-)

diff --git a/attr.c b/attr.c
index 5178eb8..2d68a3c 100644
--- a/attr.c
+++ b/attr.c
@@ -59,7 +59,7 @@ static unsigned hash_name(const char *name, int namelen)
  return val;
 }
 
-static int invalid_attr_name(const char *name, int namelen)
+int invalid_attr_name(const char *name, int namelen)
 {
  /*
  * Attribute name cannot begin with '-' and must consist of
diff --git a/attr.h b/attr.h
index 7dc49f8..7fd8b90 100644
--- a/attr.h
+++ b/attr.h
@@ -45,6 +45,8 @@ extern void git_attr_check_append(struct git_attr_check *, const char *);
 extern void git_attr_check_clear(struct git_attr_check *);
 extern void git_attr_check_free(struct git_attr_check *);
 
+extern int invalid_attr_name(const char *name, int namelen);
+
 /*
  * Return the name of the attribute represented by the argument.  The
  * return value is a pointer to a null-delimited string that is part
diff --git a/dir.c b/dir.c
index 996653b..540ea1f 100644
--- a/dir.c
+++ b/dir.c
@@ -9,6 +9,7 @@
  */
 #include "cache.h"
 #include "dir.h"
+#include "attr.h"
 #include "refs.h"
 #include "wildmatch.h"
 #include "pathspec.h"
@@ -215,6 +216,51 @@ int within_depth(const char *name, int namelen,
  return 1;
 }
 
+static struct git_attr_check *check;
+static int match_attrs(const char *name, int namelen,
+       const struct pathspec_item *item)
+{
+ char *path;
+ int i;
+
+ if (!check) {
+ check = git_attr_check_alloc();
+ for (i = 0; i < item->attr_nr; i++)
+ git_attr_check_append(check, item->attrs[i].attr);
+ }
+
+ path = xmemdupz(name, namelen);
+ git_all_attrs(path, check);
+
+ for (i = 0; i < item->attr_nr; i++) {
+ int matched;
+ const char *value = check->check[i].value;
+
+ if (ATTR_TRUE(value)) {
+ matched = (item->attrs[i].mode == MATCH_SET ||
+   item->attrs[i].mode == MATCH_NOT_UNSPECIFIED);
+ } else if (ATTR_FALSE(value)) {
+ matched = (item->attrs[i].mode == MATCH_UNSET ||
+   item->attrs[i].mode == MATCH_NOT_UNSPECIFIED);
+ } else if (ATTR_UNSET(value)) {
+ matched = (item->attrs[i].mode == MATCH_UNSPECIFIED);
+ } else {
+ if (item->attrs[i].mode == MATCH_NOT_UNSPECIFIED) {
+ matched = 1;
+ } else {
+ /* NEEDSWORK: better value matching */
+ matched = !strcmp(item->attrs[i].value, value);
+ }
+ }
+ if (!matched)
+ return 0;
+ }
+
+ free(path);
+
+ return 1;
+}
+
 #define DO_MATCH_EXCLUDE   1
 #define DO_MATCH_DIRECTORY 2
 
@@ -270,6 +316,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
     strncmp(item->match, name - prefix, item->prefix))
  return 0;
 
+ if (item->attr_nr && !match_attrs(name, namelen, item))
+ return 0;
+
  /* If the match was just the prefix, we matched */
  if (!*match)
  return MATCHED_RECURSIVELY;
diff --git a/pathspec.c b/pathspec.c
index 4dff252..523ac8c 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "attr.h"
 
 /*
  * Finds which of the given pathspecs match items in the index.
@@ -88,12 +89,73 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
  strbuf_addf(sb, ",prefix:%d)", prefixlen);
 }
 
+/*
+ * Check if the given string is a good specification to search for attributes.
+ * Accepted strings:
+ * [+-!] ATTRIBUTE_NAME [= <space separated list of values> ]
+ *
+ * Examples:
+ * attr:+val to find value set to true
+ * attr:-val to find a value set to false
+ * attr:!val to find a value that is not set
+ *     (i.e. it is neither set as "val", "val=<empty>", nor unset as "-val")
+ * attr:val=value: to find value that have at least a and b set.
+ *
+ * This only checks one attr, multiple attrs need to be given as multiple
+ * strings.
+ */
+
+static void parse_attr_item(struct attr_item *attr, const char *value)
+{
+ size_t val_len;
+ const char *val = value;
+
+ if (!val || !strlen(val))
+ goto out;
+
+ if (val[0] == '+')
+ attr->mode = MATCH_SET;
+ else if (val[0] == '-')
+ attr->mode = MATCH_UNSET;
+ else if (val[0] == '!')
+ attr->mode = MATCH_UNSPECIFIED;
+ else
+ attr->mode = NOT_INIT;
+
+ if (attr->mode != NOT_INIT)
+ val++;
+
+ val_len = strcspn(val, "=,)");
+ if (val[val_len] == '=')
+ attr->mode = MATCH_VALUE;
+ else
+ attr->mode = MATCH_NOT_UNSPECIFIED;
+
+ if (invalid_attr_name(val, val_len))
+ goto out;
+
+ attr->attr = xmemdupz(val, val_len);
+
+ if (attr->mode == MATCH_VALUE) {
+ const char *after_equal = val + val_len + 1;
+ size_t after_equal_end = strcspn(after_equal, ",)");
+ attr->value = xmemdupz(after_equal, after_equal_end);
+ } else
+ attr->value = NULL;
+ return;
+out:
+ attr->mode = INVALID_ATTR;
+ warning(_("attr spec '%s': attrs must not start with '-' and "
+  "be composed of [-A-Za-z0-9_.]."), value);
+}
+
 static void eat_long_magic(struct pathspec_item *item, const char *elt,
  unsigned *magic, int *pathspec_prefix,
  const char **copyfrom_, const char **long_magic_end)
 {
  int i;
  const char *copyfrom = *copyfrom_;
+ const char *body;
  /* longhand */
  const char *nextat;
  for (copyfrom = elt + 2;
@@ -108,15 +170,20 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
  if (!len)
  continue;
 
- if (starts_with(copyfrom, "prefix:")) {
+ if (skip_prefix(copyfrom, "prefix:", &body)) {
  char *endptr;
- *pathspec_prefix = strtol(copyfrom + 7,
-  &endptr, 10);
+ *pathspec_prefix = strtol(body, &endptr, 10);
  if (endptr - copyfrom != len)
  die(_("invalid parameter for pathspec magic 'prefix'"));
  continue;
  }
 
+ if (skip_prefix(copyfrom, "attr:", &body)) {
+ ALLOC_GROW(item->attrs, item->attr_nr + 1, item->attr_alloc);
+ parse_attr_item(&item->attrs[item->attr_nr++], body);
+ continue;
+ }
+
  for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
  if (strlen(pathspec_magic[i].name) == len &&
     !strncmp(pathspec_magic[i].name, copyfrom, len)) {
@@ -425,7 +492,9 @@ void parse_pathspec(struct pathspec *pathspec,
  for (i = 0; i < n; i++) {
  unsigned short_magic;
  entry = argv[i];
-
+ item[i].attrs = NULL;
+ item[i].attr_nr = 0;
+ item[i].attr_alloc = 0;
  item[i].magic = prefix_pathspec(item + i, &short_magic,
  argv + i, flags,
  prefix, prefixlen, entry);
@@ -447,6 +516,13 @@ void parse_pathspec(struct pathspec *pathspec,
  if (item[i].nowildcard_len < item[i].len)
  pathspec->has_wildcard = 1;
  pathspec->magic |= item[i].magic;
+
+ if (item[i].attr_nr) {
+ int j;
+ for (j = 0; j < item[i].attr_nr; j++)
+ if (item[i].attrs[j].mode == INVALID_ATTR)
+ die(_("attribute spec in the wrong syntax are prohibited."));
+ }
  }
 
  if (nr_exclude == n)
@@ -502,6 +578,15 @@ void copy_pathspec(struct pathspec *dst, const struct pathspec *src)
 
 void free_pathspec(struct pathspec *pathspec)
 {
+ int i, j;
+ for (i = 0; i < pathspec->nr; i++) {
+ for (j = 0; j < pathspec->items[j].attr_nr; j++) {
+ free(pathspec->items[i].attrs[j].attr);
+ free(pathspec->items[i].attrs[j].value);
+ }
+ free(pathspec->items[i].attrs);
+ }
+
  free(pathspec->items);
  pathspec->items = NULL;
 }
diff --git a/pathspec.h b/pathspec.h
index 0c11262..89d73db 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -32,6 +32,21 @@ struct pathspec {
  int len, prefix;
  int nowildcard_len;
  int flags;
+ int attr_nr;
+ int attr_alloc;
+ struct attr_item {
+ char *attr;
+ char *value;
+ enum attr_match_mode {
+ NOT_INIT,
+ MATCH_SET,
+ MATCH_UNSET,
+ MATCH_VALUE,
+ MATCH_UNSPECIFIED,
+ MATCH_NOT_UNSPECIFIED,
+ INVALID_ATTR
+ } mode;
+ } *attrs;
  } *items;
 };
 
@@ -98,5 +113,6 @@ extern char *find_pathspecs_matching_against_index(const struct pathspec *pathsp
 extern void add_pathspec_matches_against_index(const struct pathspec *pathspec, char *seen);
 extern const char *check_path_for_gitlink(const char *path);
 extern void die_if_path_beyond_symlink(const char *path, const char *prefix);
+extern int validate_label_name(const char *label);
 
 #endif /* PATHSPEC_H */
--
2.8.2.401.g9c0faef

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 0/4] pathspec attrs [WAS pathspec labels [WAS submodule groups]]

Junio C Hamano
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:

>   I wanted to understand Junios series, so I built on top.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Junio C Hamano
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:

> + * attr:+val to find value set to true
> + * attr:-val to find a value set to false
> + * attr:!val to find a value that is not set
> + *     (i.e. it is neither set as "val", "val=<empty>", nor unset as "-val")
> + * attr:val=value: to find value that have at least a and b set.

I would have expected that there won't be "attr:+val", but it is
spelled as "attr:val" instead.

> +static void parse_attr_item(struct attr_item *attr, const char *value)

Please do not call something that is not part of the attribute
infrastructure as "attr_item"; I wasted time looking for the
structure definition for "attr_item" in <attr.h>.

> +static int match_attrs(const char *name, int namelen,
> +       const struct pathspec_item *item)
> +{
> + char *path;
> + int i;
> +
> + if (!check) {
> + check = git_attr_check_alloc();
> + for (i = 0; i < item->attr_nr; i++)
> + git_attr_check_append(check, item->attrs[i].attr);
> + }
> +
> + path = xmemdupz(name, namelen);
> + git_all_attrs(path, check);

PLEASE DON'T.  git_all_attrs() asks for all the attribute under the
sun and has no hope to perform sensibly, especially at the very leaf
level of the pathspec logic where one call to this function is made
for each and every path in the tree.

Instead, have a pointer to "struct git_attr_check" in pathspec_item
and make a call to git_check_attr(path, item->check) here.

Which means that you would need to prepare git_attr_check around ...

> + if (skip_prefix(copyfrom, "attr:", &body)) {
> + ALLOC_GROW(item->attrs, item->attr_nr + 1, item->attr_alloc);
> + parse_attr_item(&item->attrs[item->attr_nr++], body);

... HERE.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Junio C Hamano
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:

> +static struct git_attr_check *check;
> +static int match_attrs(const char *name, int namelen,
> +       const struct pathspec_item *item)
> +{
> + char *path;
> + int i;
> +
> + if (!check) {
> + check = git_attr_check_alloc();
> + for (i = 0; i < item->attr_nr; i++)
> + git_attr_check_append(check, item->attrs[i].attr);

This is simply wrong; you may have two pathspec elements with
attribute match magic, the first one may ask for one attribute while
the second one may ask for seven.  The first time around you
allocate and append one attribute.  The second time around you don't
do anything useful, and send a git_attr_check with one element to
deal with 7 attributes.

> + }
> + path = xmemdupz(name, namelen);
> + git_all_attrs(path, check);

However, the above "This is simply wrong" bogosity is covered
because git_all_attrs() is used here, ignoring what is in check.

The loop we see above is an expensive no-op, as the first thing
all_attrs() does is to empty check() and instead stuff it with every
attribute under the sun, not necessarily limited to attributes in
item->attrs[].

By the way, do not call an array as plural.  item->attr[i] is a good
name to call a single ith element in an array.  item->attrs[i] isn't.

> + for (i = 0; i < item->attr_nr; i++) {
> + int matched;
> + const char *value = check->check[i].value;

check[i] has no relevance to item->attrs[i] here.  I do not think
the code after this point is computing anything sensible.

> diff --git a/pathspec.h b/pathspec.h
> index 0c11262..89d73db 100644
> --- a/pathspec.h
> +++ b/pathspec.h
> @@ -32,6 +32,21 @@ struct pathspec {
>   int len, prefix;
>   int nowildcard_len;
>   int flags;
> + int attr_nr;
> + int attr_alloc;
> + struct attr_item {
> + char *attr;
> + char *value;
> + enum attr_match_mode {
> + NOT_INIT,
> + MATCH_SET,
> + MATCH_UNSET,
> + MATCH_VALUE,
> + MATCH_UNSPECIFIED,
> + MATCH_NOT_UNSPECIFIED,
> + INVALID_ATTR
> + } mode;
> + } *attrs;

I'd think the above addition that is in line with the updated API
would look more like this [*1*]:

        int attr_match_nr;
        int attr_match_alloc;
        struct attr_match {
        struct git_attr *attr;
                char *value;
                enum attr_match_mode {
                ...
                } match_mode;
        } *attr_match;
        struct git_attr_check *attr_check;

Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/",
you would first do:

        p->attr_check = git_attr_check_alloc();

once, and then send each of VAR1=VAL2, -VAR2, VAR3... to your
parse_one_item() helper function which would:

 * parse the match-mode like your code does;

 * parse out the attribute name (i.e. VAR1, VAR2 and VAR3), and
   instead of keeping it as a "(const) char *", call git_attr()
   to intern it (and keep it in local variable "attr"), and save
   it in p->attr_match[p->attr_nr].attr;

 * call git_attr_check_append(p->attr_check, git_attr_name(attr))

After the above finishes, you would end up with something like:

        .attr_match = {
            { .attr = git_attr("VAR1"), .value = "VAL2",
              .match_mode = MATCH_VALUE },
            { .attr = git_attr("VAR2"), .value = <does not matter>,
              .match_mode = MATCH_UNSET },
            ...
        },
        .attr_check = {
            .check = {
            { .attr = git_attr("VAR1"), .value = <does not matter> },
            { .attr = git_attr("VAR2"), .value = <does not matter> },
            { .attr = git_attr("VAR3"), .value = <does not matter> },
            }
           
When matching (i.e. the match_attrs() function), you would instead
do

        path = xmemdupz(name, namelen);
        git_check_attr(path, item->attr_check);

to grab values for only attributes that matter to you, instead of
calling git_all_attrs() [*2*].

After git_check_attr() returns, item->attr_check.check[0].attr would
be git_attr("VAR1") and item->attr_check.check[0].value would be
whatever setting the path has for the VAR1 attribute.  You can use
your match_mode logic to compare it with the values .attr_match
expects.

You do not necessarily have to have the same number of elements in
.attr_match and .attr_check.check by the way.  .attr_match might say

        VAR1=VAL2 !VAR1 -VAR1

which may be always false if these are ANDed together, but in order
to evaluate it, you need only one git_attr_check_elem for VAR1.


[Footnote]

*1* With the old API, things would not be that much different.
    Instead of single structure .attr_check, you would make an
    array of git_attr_check structure, exactly like the array
    at .attr_check.check[] in the new API by hand.  The new API
    makes this preparation simpler by managing the array on the API
    implementation side.

*2* Please do not use that silly function especially in performance
    sensitive codepath.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Stefan Beller-4
In reply to this post by Junio C Hamano
On Mon, May 16, 2016 at 9:23 PM, Junio C Hamano <[hidden email]> wrote:

> Stefan Beller <[hidden email]> writes:
>
>> + * attr:+val to find value set to true
>> + * attr:-val to find a value set to false
>> + * attr:!val to find a value that is not set
>> + *     (i.e. it is neither set as "val", "val=<empty>", nor unset as "-val")
>> + * attr:val=value: to find value that have at least a and b set.
>
> I would have expected that there won't be "attr:+val", but it is
> spelled as "attr:val" instead.

"val" matches if the attr is not unspecified, i.e. one of {true, false, value}
"+val" matches {true} only.

Maybe we want to redo that to

"val" matches {true} only.
"?val" matches {true, false, value}. (I can leave this case out in the
first series, too)

>
>> +static void parse_attr_item(struct attr_item *attr, const char *value)
>
> Please do not call something that is not part of the attribute
> infrastructure as "attr_item"; I wasted time looking for the
> structure definition for "attr_item" in <attr.h>.

So "parse_pathspec_attr_match" instead?

>
>> +static int match_attrs(const char *name, int namelen,
>> +                    const struct pathspec_item *item)
>> +{
>> +     char *path;
>> +     int i;
>> +
>> +     if (!check) {
>> +             check = git_attr_check_alloc();
>> +             for (i = 0; i < item->attr_nr; i++)
>> +                     git_attr_check_append(check, item->attrs[i].attr);
>> +     }
>> +
>> +     path = xmemdupz(name, namelen);
>> +     git_all_attrs(path, check);
>
> PLEASE DON'T.  git_all_attrs() asks for all the attribute under the
> sun and has no hope to perform sensibly, especially at the very leaf
> level of the pathspec logic where one call to this function is made
> for each and every path in the tree.

This is executed only once, as check is static? From a users perception
it doesn't matter if it is executed once just after parsing all pathspec
items or at the first path to check for a match, no?

The mistake is using the API wrong. So inside the '!check', after the
preparation loop of git_attr_check_append, we'd need to
hand over the "check" to git_check_attr instead?

>
> Instead, have a pointer to "struct git_attr_check" in pathspec_item
> and make a call to git_check_attr(path, item->check) here.

I see, then we have multiple `check` structs. Makes sense.

>
> Which means that you would need to prepare git_attr_check around ...
>
>> +             if (skip_prefix(copyfrom, "attr:", &body)) {
>> +                     ALLOC_GROW(item->attrs, item->attr_nr + 1, item->attr_alloc);
>> +                     parse_attr_item(&item->attrs[item->attr_nr++], body);
>
> ... HERE.
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Stefan Beller-4
In reply to this post by Junio C Hamano
On Mon, May 16, 2016 at 10:03 PM, Junio C Hamano <[hidden email]> wrote:

>
>         int attr_match_nr;
>         int attr_match_alloc;
>         struct attr_match {
>                 struct git_attr *attr;
>                 char *value;
>                 enum attr_match_mode {
>                         ...
>                 } match_mode;
>         } *attr_match;
>         struct git_attr_check *attr_check;

ok, I'll use that structure and go from here.


>
> Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/",

This syntax is not pleasant to parse IMHO as it is not clear if the token
after white space (-VAR2 here) is another attribute or the next part of
the list of VAR1, i.e. this could also be:

    :(attr:VAR1=VAL1 VAL2)

I wonder if we'd want to use the colon to separate different VARs:

    :(attr:VAR1=VAL1 VAL2:-VAR2:VAR3)

which means:

   match all path, that
    * have VAR1 set to a value list containing at least
      VAL1 and VAL2
    * have VAR2 unset
    * have VAR3 set to true.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Junio C Hamano
Stefan Beller <[hidden email]> writes:

>> Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/",
>
> This syntax is not pleasant to parse IMHO as it is not clear if the token
> after white space (-VAR2 here) is another attribute or the next part of
> the list of VAR1, ...

Remove the ambiguity by declaring that the list is always whitespace
separated.  No whitespace in var, no whitespace in val, no quoting.

The set of attributes with values expected to be used in the
pathspec "attribute match" magic, I do not think there is anything
that wants such a random arbitrary string.  The value side of an
attribute with value, e.g. "eol=crlf", "conflict-marker-size=7", is
designed to be a token that our C code is prepared to parse.

In other words, if you match the parsing semantics of parse_attr()
in attr.c, you are OK.  The attribute subsystem will not give users
anything that is more complex than what that routine is prepared to
parse, and that is a "whitespace separated list, no whitespace in
attribute names, no whitespace in values, no quoting".

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Stefan Beller-4
On Tue, May 17, 2016 at 10:34 AM, Junio C Hamano <[hidden email]> wrote:

> Stefan Beller <[hidden email]> writes:
>
>>> Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/",
>>
>> This syntax is not pleasant to parse IMHO as it is not clear if the token
>> after white space (-VAR2 here) is another attribute or the next part of
>> the list of VAR1, ...
>
> Remove the ambiguity by declaring that the list is always whitespace
> separated.  No whitespace in var, no whitespace in val, no quoting.
>
> The set of attributes with values expected to be used in the
> pathspec "attribute match" magic, I do not think there is anything
> that wants such a random arbitrary string.  The value side of an
> attribute with value, e.g. "eol=crlf", "conflict-marker-size=7", is
> designed to be a token that our C code is prepared to parse.

I am not talking about crazy stuff here, but consider our own
.gitattributes file:

    * whitespace=!indent,trail,space
    *.[ch] whitespace=indent,trail,space
    *.sh whitespace=indent,trail,space

Now I want to search for

    "the whitespace attribute that is set to at least trail and space"

We cannot use commas for the specification as they are used in .gitattributes,
because that would make it even harder, so
I would imagine this could be used:

     :(attr:whitespace=space trail)

See the whitespace separates the values, not the next variable.

To add another variable, I would suggest using a ':', such as

     :(attr:whitespace=space trail:text)

might a viable thing.



>
> In other words, if you match the parsing semantics of parse_attr()
> in attr.c, you are OK.  The attribute subsystem will not give users
> anything that is more complex than what that routine is prepared to
> parse, and that is a "whitespace separated list, no whitespace in
> attribute names, no whitespace in values, no quoting".
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Junio C Hamano
Stefan Beller <[hidden email]> writes:

> I am not talking about crazy stuff here, but consider our own
> .gitattributes file:
>
>     * whitespace=!indent,trail,space
>     *.[ch] whitespace=indent,trail,space
>     *.sh whitespace=indent,trail,space
>
> Now I want to search for
>
>     "the whitespace attribute that is set to at least trail and space"

With :(attr:VAR=VAL) syntax, you can only look for whitespace
attribute exactly set to "!indent,trail,space", so you can't.  So
you are talking about crazy stuff after all, aren't you?  Are you
now extending it to do "whitespace~=indent" or something like that?

You do not even need VAR=VAL form for your main topic.  You only
need group-doc group-code etc. to look for "set to TRUE", no?

I do not want to see us wasting too much time over-engineering it
without having a concrete and useful use case, and "find path to
whom whitespace checks are set for 'trail' and 'space'" is not.
These comma-separated tokens augment WS_DEFAULT_RULE which has
'trail' already, so you do not look for 'trail' in the first place.

"I want submodules under subs/ that is in either group-doc or
group-code" is more reasonable and realistic use case, but that does
not need any crazy stuff.  Either two separate pathspec elements,

        ":(attr:group-doc)subs/" ":(attr:group-code)/subs/"

or if we are to do the "ORed collection of ANDed attrs", it would be
something like:

        ":(attr:group-doc):(attr:group-code)/subs/"

no?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Stefan Beller-4
On Tue, May 17, 2016 at 11:05 AM, Junio C Hamano <[hidden email]> wrote:

> Stefan Beller <[hidden email]> writes:
>
>> I am not talking about crazy stuff here, but consider our own
>> .gitattributes file:
>>
>>     * whitespace=!indent,trail,space
>>     *.[ch] whitespace=indent,trail,space
>>     *.sh whitespace=indent,trail,space
>>
>> Now I want to search for
>>
>>     "the whitespace attribute that is set to at least trail and space"
>
> With :(attr:VAR=VAL) syntax, you can only look for whitespace
> attribute exactly set to "!indent,trail,space", so you can't.  So
> you are talking about crazy stuff after all, aren't you?

I think that stuck mentally with me from the "label" series,
as then you had to have an exact specification of the value list.

We don't do that any more, so in a very first series I can omit the
value parsing at all.

> Are you
> now extending it to do "whitespace~=indent" or something like that?
>
> You do not even need VAR=VAL form for your main topic.  You only
> need group-doc group-code etc. to look for "set to TRUE", no?
>
> I do not want to see us wasting too much time over-engineering it
> without having a concrete and useful use case, and "find path to
> whom whitespace checks are set for 'trail' and 'space'" is not.
> These comma-separated tokens augment WS_DEFAULT_RULE which has
> 'trail' already, so you do not look for 'trail' in the first place.
>
> "I want submodules under subs/ that is in either group-doc or
> group-code" is more reasonable and realistic use case, but that does
> not need any crazy stuff.  Either two separate pathspec elements,
>
>         ":(attr:group-doc)subs/" ":(attr:group-code)/subs/"
>
> or if we are to do the "ORed collection of ANDed attrs", it would be
> something like:
>
>         ":(attr:group-doc):(attr:group-code)/subs/"
>
> no?

Thanks for bringing me back to realities. I'll just drop support for values
in the first series.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Junio C Hamano
Stefan Beller <[hidden email]> writes:

> I'll just drop support for values
> in the first series.

I do not think an exact string match to support :(attr:eol=crlf) is
so bad.  The "crazy stuff" aka over-engineering is when it goes
beyond that, e.g. 'eol is set to one of these values"
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Stefan Beller-4
In reply to this post by Junio C Hamano
On Mon, May 16, 2016 at 10:03 PM, Junio C Hamano <[hidden email]> wrote:

> When matching (i.e. the match_attrs() function), you would instead
> do
>
>         path = xmemdupz(name, namelen);
>         git_check_attr(path, item->attr_check);
>
> to grab values for only attributes that matter to you, instead of
> calling git_all_attrs() [*2*].
>
> After git_check_attr() returns, item->attr_check.check[0].attr would
> be git_attr("VAR1") and item->attr_check.check[0].value would be
> whatever setting the path has for the VAR1 attribute.  You can use
> your match_mode logic to compare it with the values .attr_match
> expects.
>
> You do not necessarily have to have the same number of elements in
> .attr_match and .attr_check.check by the way.  .attr_match might say
>
>         VAR1=VAL2 !VAR1 -VAR1
>
> which may be always false if these are ANDed together, but in order
> to evaluate it, you need only one git_attr_check_elem for VAR1.

So for the matching we would need to get the order right, i.e.

    const char *inspect_name = git_attr_name(item.attr_match[i].attr);
    for (j=0; j <  p.attr_check.check_nr; j++) {
        const char *cur_name = git_attr_name(p.attr_check.check[j].attr);
        if (!strcmp(inspect_name, cur_name))
            break;
    // now compare .attr_match[i] with attr_check.check[j]

This doesn't look cheap to me? Am I holding it wrong again?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Junio C Hamano
Stefan Beller <[hidden email]> writes:

> On Mon, May 16, 2016 at 10:03 PM, Junio C Hamano <[hidden email]> wrote:
>> When matching (i.e. the match_attrs() function), you would instead
>> do
>>
>>         path = xmemdupz(name, namelen);
>>         git_check_attr(path, item->attr_check);
>>
>> to grab values for only attributes that matter to you, instead of
>> calling git_all_attrs() [*2*].
>>
>> After git_check_attr() returns, item->attr_check.check[0].attr would
>> be git_attr("VAR1") and item->attr_check.check[0].value would be
>> whatever setting the path has for the VAR1 attribute.  You can use
>> your match_mode logic to compare it with the values .attr_match
>> expects.
>>
>> You do not necessarily have to have the same number of elements in
>> .attr_match and .attr_check.check by the way.  .attr_match might say
>>
>>         VAR1=VAL2 !VAR1 -VAR1
>>
>> which may be always false if these are ANDed together, but in order
>> to evaluate it, you need only one git_attr_check_elem for VAR1.
>

The key phrase in the message you are reacting to is "not
necessarily".  It is not a crime to ask for the same attribute twice
in a git_attr_check structure.

    $ git check-attr text text text -- path

would stuff three instances of "text" in there and ask them for
"path".  The simple in-code callers that uses git_attr_check_initl()
do rely on the order of the attributes it placed in attr_check
structure (see e.g. how ll_merge() uses check[0].value and
check[1].value to see the driver name and marker size), and that is
perfectly kosher.  Existing code is your friend.

The mention of the possibility is purely as a hint useful for a
possible enhancement in the far future.  If we ever want to support
something like this:

        ":(attr-expression (VAR1=VAL1 | VAR1=VAL2) & VAR2)"

you can remember that you can put VAR1 and VAR2 in attr_check to
grab values for VAR1 and VAR2 (even though VAR1 is mentioned twice
in the expression), and use them in the evaluation you will perform.

> So for the matching we would need to get the order right, i.e.
>
>     const char *inspect_name = git_attr_name(item.attr_match[i].attr);
>     for (j=0; j <  p.attr_check.check_nr; j++) {
>         const char *cur_name = git_attr_name(p.attr_check.check[j].attr);
>         if (!strcmp(inspect_name, cur_name))
>             break;

You do not strcmp() when you have attributes.  They are interned so
that you can compare their addresses.  That makes it somewhat
cheaper.

Once you start "expression over random attributes", you'd need to
map attr => value somehow.  The format attr_check structure gives
you, i.e. a list of <attr, value>, is aimed at compactness than
random hashmap-like access.  If the caller wants a hashmap-like
access for performance purposes, the caller does that itself.

Existing users do not need a hashmap-like access, because they know
at which index in attr_check they placed request for what attribute.
An array that can be indexed with a small integer is exactly what
they want.

> This doesn't look cheap to me? Am I holding it wrong again?

By the way, I do not think during the entire discussion on this
topic, you have never been in a situation to deserve the "holding it
wrong" label (which implies "a ware is broken, but somehow the end
user is blamed for using it incorrectly").  When you were wrong, you
were simply wrong.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [RFC-PATCHv6 4/4] pathspec: allow querying for attributes

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> The mention of the possibility is purely as a hint useful for a
> possible enhancement in the far future.  If we ever want to support
> something like this:
>
> ":(attr-expression (VAR1=VAL1 | VAR1=VAL2) & VAR2)"
>
> you can remember that you can put VAR1 and VAR2 in attr_check to
> grab values for VAR1 and VAR2 (even though VAR1 is mentioned twice
> in the expression), and use them in the evaluation you will perform.
>
>> So for the matching we would need to get the order right, i.e.
>>
>>     const char *inspect_name = git_attr_name(item.attr_match[i].attr);
>>     for (j=0; j <  p.attr_check.check_nr; j++) {
>>         const char *cur_name = git_attr_name(p.attr_check.check[j].attr);
>>         if (!strcmp(inspect_name, cur_name))
>>             break;
>
> You do not strcmp() when you have attributes.  They are interned so
> that you can compare their addresses.  That makes it somewhat
> cheaper.
>
> Once you start "expression over random attributes", you'd need to
> map attr => value somehow.  The format attr_check structure gives
> you, i.e. a list of <attr, value>, is aimed at compactness than
> random hashmap-like access.  If the caller wants a hashmap-like
> access for performance purposes, the caller does that itself.

To expand this a bit, I actually do not think hashmap-like access is
necessary even in such an application.

An implementation of the evaluator, at least a production-quality
one, for the attr expression example shown above is unlikely to keep
the expression in a single string "(VAR1=VAL1 | VAR1=VAL2) & VAR2".
Instead it would use a parse tree with nodes that represent
operators (e.g. OR, AND, "=", etc.)  and terms (e.g. attribute
reference VAR1, VAR2, and constatnts "VAL1", "VAL2") as its internal
representation.

And a node in such a parse tree that refers to an attribute VAR1
would unlikely to keep a "const char *" that is "VAR1".  The node
would have a field that stores an index into the attr_check.check[]
array when the textual expression is "compiled" into a parse tree
and the set of attributes the expression uses (hence it needs to
ask the attributes API about, to prepare attr_check array) is
identified.

When "evaluating" the parse-tree, a node that refers to an attribute
has an index into attr_check.check[] array, so there is no need for
a loop like the one shown above at all.

When "showing" the expression (for debugging purposes), it would
grab the "index into the attr_check.check[] array" from the node,
and the element in that array is an <attr, value> pair, so it can
use git_attr_name() to obtain the attribute name.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html