This goes on top of origin/jc/attr and is RFC as
I did not write tests nor documentation, yet. I wanted to understand Junios series, so I built on top. The meat is in the last patch, which allows for git ls-files :(attr:-text)path/pattern # (ATTR_FALSE) git ls-files :(attr:+text)path/pattern # (ATTR_TRUE) git ls-files :(attr:eol=input)path/pattern # values must match git ls-files :(attr:!text)path/pattern # find patterns with no "text" attribute, i.e. neither FALSE, TRUE or value git ls-files :(attr:text)path/pattern # opposite of !; find files which are TRUE, FALSE or value Of course you can chain them: git ls-files :(attr:text,attr:eol=lf)path/pattern # must match both attr specs. Feedback on the parsing and design welcome, Thanks, Stefan Stefan Beller (4): Documentation: fix a typo pathspec: move long magic parsing out of prefix_pathspec pathspec: move prefix check out of the inner loop pathspec: allow querying for attributes Documentation/gitattributes.txt | 2 +- attr.c | 2 +- attr.h | 2 + dir.c | 49 ++++++++++++ pathspec.c | 172 +++++++++++++++++++++++++++++++--------- pathspec.h | 16 ++++ 6 files changed, 203 insertions(+), 40 deletions(-) -- 2.8.2.401.g9c0faef -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
Signed-off-by: Stefan Beller <[hidden email]>
--- Documentation/gitattributes.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index e3b1de8..af2c682 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -86,7 +86,7 @@ is either not set or empty, $HOME/.config/git/attributes is used instead. Attributes for all users on a system should be placed in the `$(prefix)/etc/gitattributes` file. -Sometimes you would need to override an setting of an attribute +Sometimes you would need to override a setting of an attribute for a path to `Unspecified` state. This can be done by listing the name of the attribute prefixed with an exclamation point `!`. -- 2.8.2.401.g9c0faef -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Stefan Beller-4
`prefix_pathspec` is quite a lengthy function and we plan on adding more.
Split it up for better readability. As we want to add code into the inner loop of the long magic parsing, we also benefit from lower indentation. Signed-off-by: Stefan Beller <[hidden email]> --- pathspec.c | 84 +++++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 47 insertions(+), 37 deletions(-) diff --git a/pathspec.c b/pathspec.c index c9e9b6c..eba37c2 100644 --- a/pathspec.c +++ b/pathspec.c @@ -88,6 +88,52 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen, strbuf_addf(sb, ",prefix:%d)", prefixlen); } +static void eat_long_magic(struct pathspec_item *item, const char *elt, + unsigned *magic, int *pathspec_prefix, + const char **copyfrom_, const char **long_magic_end) +{ + int i; + const char *copyfrom = *copyfrom_; + /* longhand */ + const char *nextat; + for (copyfrom = elt + 2; + *copyfrom && *copyfrom != ')'; + copyfrom = nextat) { + size_t len = strcspn(copyfrom, ",)"); + if (copyfrom[len] == ',') + nextat = copyfrom + len + 1; + else + /* handle ')' and '\0' */ + nextat = copyfrom + len; + if (!len) + continue; + for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) { + if (strlen(pathspec_magic[i].name) == len && + !strncmp(pathspec_magic[i].name, copyfrom, len)) { + *magic |= pathspec_magic[i].bit; + break; + } + if (starts_with(copyfrom, "prefix:")) { + char *endptr; + *pathspec_prefix = strtol(copyfrom + 7, + &endptr, 10); + if (endptr - copyfrom != len) + die(_("invalid parameter for pathspec magic 'prefix'")); + /* "i" would be wrong, but it does not matter */ + break; + } + } + if (ARRAY_SIZE(pathspec_magic) <= i) + die(_("Invalid pathspec magic '%.*s' in '%s'"), + (int) len, copyfrom, elt); + } + if (*copyfrom != ')') + die(_("Missing ')' at the end of pathspec magic in '%s'"), elt); + *long_magic_end = copyfrom; + copyfrom++; + *copyfrom_ = copyfrom; +} + /* * Take an element of a pathspec and check for magic signatures. * Append the result to the prefix. Return the magic bitmap. @@ -150,43 +196,7 @@ static unsigned prefix_pathspec(struct pathspec_item *item, (flags & PATHSPEC_LITERAL_PATH)) { ; /* nothing to do */ } else if (elt[1] == '(') { - /* longhand */ - const char *nextat; - for (copyfrom = elt + 2; - *copyfrom && *copyfrom != ')'; - copyfrom = nextat) { - size_t len = strcspn(copyfrom, ",)"); - if (copyfrom[len] == ',') - nextat = copyfrom + len + 1; - else - /* handle ')' and '\0' */ - nextat = copyfrom + len; - if (!len) - continue; - for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) { - if (strlen(pathspec_magic[i].name) == len && - !strncmp(pathspec_magic[i].name, copyfrom, len)) { - magic |= pathspec_magic[i].bit; - break; - } - if (starts_with(copyfrom, "prefix:")) { - char *endptr; - pathspec_prefix = strtol(copyfrom + 7, - &endptr, 10); - if (endptr - copyfrom != len) - die(_("invalid parameter for pathspec magic 'prefix'")); - /* "i" would be wrong, but it does not matter */ - break; - } - } - if (ARRAY_SIZE(pathspec_magic) <= i) - die(_("Invalid pathspec magic '%.*s' in '%s'"), - (int) len, copyfrom, elt); - } - if (*copyfrom != ')') - die(_("Missing ')' at the end of pathspec magic in '%s'"), elt); - long_magic_end = copyfrom; - copyfrom++; + eat_long_magic(item, elt, &magic, &pathspec_prefix, ©from, &long_magic_end); } else { /* shorthand */ for (copyfrom = elt + 1; -- 2.8.2.401.g9c0faef -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Stefan Beller-4
The prefix check is not related the check of pathspec magic; also there
is no code that is relevant after we'd break the loop on a match for "prefix:". So move the check before the loop and shortcircuit the outer loop. Signed-off-by: Stefan Beller <[hidden email]> --- pathspec.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/pathspec.c b/pathspec.c index eba37c2..4dff252 100644 --- a/pathspec.c +++ b/pathspec.c @@ -107,21 +107,22 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt, nextat = copyfrom + len; if (!len) continue; + + if (starts_with(copyfrom, "prefix:")) { + char *endptr; + *pathspec_prefix = strtol(copyfrom + 7, + &endptr, 10); + if (endptr - copyfrom != len) + die(_("invalid parameter for pathspec magic 'prefix'")); + continue; + } + for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) { if (strlen(pathspec_magic[i].name) == len && !strncmp(pathspec_magic[i].name, copyfrom, len)) { *magic |= pathspec_magic[i].bit; break; } - if (starts_with(copyfrom, "prefix:")) { - char *endptr; - *pathspec_prefix = strtol(copyfrom + 7, - &endptr, 10); - if (endptr - copyfrom != len) - die(_("invalid parameter for pathspec magic 'prefix'")); - /* "i" would be wrong, but it does not matter */ - break; - } } if (ARRAY_SIZE(pathspec_magic) <= i) die(_("Invalid pathspec magic '%.*s' in '%s'"), -- 2.8.2.401.g9c0faef -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Stefan Beller-4
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it requires paths to not just match the given pattern but also have the specified attrs attached for them to be chosen. Signed-off-by: Stefan Beller <[hidden email]> --- attr.c | 2 +- attr.h | 2 ++ dir.c | 49 +++++++++++++++++++++++++++++++++ pathspec.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- pathspec.h | 16 +++++++++++ 5 files changed, 157 insertions(+), 5 deletions(-) diff --git a/attr.c b/attr.c index 5178eb8..2d68a3c 100644 --- a/attr.c +++ b/attr.c @@ -59,7 +59,7 @@ static unsigned hash_name(const char *name, int namelen) return val; } -static int invalid_attr_name(const char *name, int namelen) +int invalid_attr_name(const char *name, int namelen) { /* * Attribute name cannot begin with '-' and must consist of diff --git a/attr.h b/attr.h index 7dc49f8..7fd8b90 100644 --- a/attr.h +++ b/attr.h @@ -45,6 +45,8 @@ extern void git_attr_check_append(struct git_attr_check *, const char *); extern void git_attr_check_clear(struct git_attr_check *); extern void git_attr_check_free(struct git_attr_check *); +extern int invalid_attr_name(const char *name, int namelen); + /* * Return the name of the attribute represented by the argument. The * return value is a pointer to a null-delimited string that is part diff --git a/dir.c b/dir.c index 996653b..540ea1f 100644 --- a/dir.c +++ b/dir.c @@ -9,6 +9,7 @@ */ #include "cache.h" #include "dir.h" +#include "attr.h" #include "refs.h" #include "wildmatch.h" #include "pathspec.h" @@ -215,6 +216,51 @@ int within_depth(const char *name, int namelen, return 1; } +static struct git_attr_check *check; +static int match_attrs(const char *name, int namelen, + const struct pathspec_item *item) +{ + char *path; + int i; + + if (!check) { + check = git_attr_check_alloc(); + for (i = 0; i < item->attr_nr; i++) + git_attr_check_append(check, item->attrs[i].attr); + } + + path = xmemdupz(name, namelen); + git_all_attrs(path, check); + + for (i = 0; i < item->attr_nr; i++) { + int matched; + const char *value = check->check[i].value; + + if (ATTR_TRUE(value)) { + matched = (item->attrs[i].mode == MATCH_SET || + item->attrs[i].mode == MATCH_NOT_UNSPECIFIED); + } else if (ATTR_FALSE(value)) { + matched = (item->attrs[i].mode == MATCH_UNSET || + item->attrs[i].mode == MATCH_NOT_UNSPECIFIED); + } else if (ATTR_UNSET(value)) { + matched = (item->attrs[i].mode == MATCH_UNSPECIFIED); + } else { + if (item->attrs[i].mode == MATCH_NOT_UNSPECIFIED) { + matched = 1; + } else { + /* NEEDSWORK: better value matching */ + matched = !strcmp(item->attrs[i].value, value); + } + } + if (!matched) + return 0; + } + + free(path); + + return 1; +} + #define DO_MATCH_EXCLUDE 1 #define DO_MATCH_DIRECTORY 2 @@ -270,6 +316,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix, strncmp(item->match, name - prefix, item->prefix)) return 0; + if (item->attr_nr && !match_attrs(name, namelen, item)) + return 0; + /* If the match was just the prefix, we matched */ if (!*match) return MATCHED_RECURSIVELY; diff --git a/pathspec.c b/pathspec.c index 4dff252..523ac8c 100644 --- a/pathspec.c +++ b/pathspec.c @@ -1,6 +1,7 @@ #include "cache.h" #include "dir.h" #include "pathspec.h" +#include "attr.h" /* * Finds which of the given pathspecs match items in the index. @@ -88,12 +89,73 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen, strbuf_addf(sb, ",prefix:%d)", prefixlen); } +/* + * Check if the given string is a good specification to search for attributes. + * Accepted strings: + * [+-!] ATTRIBUTE_NAME [= <space separated list of values> ] + * + * Examples: + * attr:+val to find value set to true + * attr:-val to find a value set to false + * attr:!val to find a value that is not set + * (i.e. it is neither set as "val", "val=<empty>", nor unset as "-val") + * attr:val=value: to find value that have at least a and b set. + * + * This only checks one attr, multiple attrs need to be given as multiple + * strings. + */ + +static void parse_attr_item(struct attr_item *attr, const char *value) +{ + size_t val_len; + const char *val = value; + + if (!val || !strlen(val)) + goto out; + + if (val[0] == '+') + attr->mode = MATCH_SET; + else if (val[0] == '-') + attr->mode = MATCH_UNSET; + else if (val[0] == '!') + attr->mode = MATCH_UNSPECIFIED; + else + attr->mode = NOT_INIT; + + if (attr->mode != NOT_INIT) + val++; + + val_len = strcspn(val, "=,)"); + if (val[val_len] == '=') + attr->mode = MATCH_VALUE; + else + attr->mode = MATCH_NOT_UNSPECIFIED; + + if (invalid_attr_name(val, val_len)) + goto out; + + attr->attr = xmemdupz(val, val_len); + + if (attr->mode == MATCH_VALUE) { + const char *after_equal = val + val_len + 1; + size_t after_equal_end = strcspn(after_equal, ",)"); + attr->value = xmemdupz(after_equal, after_equal_end); + } else + attr->value = NULL; + return; +out: + attr->mode = INVALID_ATTR; + warning(_("attr spec '%s': attrs must not start with '-' and " + "be composed of [-A-Za-z0-9_.]."), value); +} + static void eat_long_magic(struct pathspec_item *item, const char *elt, unsigned *magic, int *pathspec_prefix, const char **copyfrom_, const char **long_magic_end) { int i; const char *copyfrom = *copyfrom_; + const char *body; /* longhand */ const char *nextat; for (copyfrom = elt + 2; @@ -108,15 +170,20 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt, if (!len) continue; - if (starts_with(copyfrom, "prefix:")) { + if (skip_prefix(copyfrom, "prefix:", &body)) { char *endptr; - *pathspec_prefix = strtol(copyfrom + 7, - &endptr, 10); + *pathspec_prefix = strtol(body, &endptr, 10); if (endptr - copyfrom != len) die(_("invalid parameter for pathspec magic 'prefix'")); continue; } + if (skip_prefix(copyfrom, "attr:", &body)) { + ALLOC_GROW(item->attrs, item->attr_nr + 1, item->attr_alloc); + parse_attr_item(&item->attrs[item->attr_nr++], body); + continue; + } + for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) { if (strlen(pathspec_magic[i].name) == len && !strncmp(pathspec_magic[i].name, copyfrom, len)) { @@ -425,7 +492,9 @@ void parse_pathspec(struct pathspec *pathspec, for (i = 0; i < n; i++) { unsigned short_magic; entry = argv[i]; - + item[i].attrs = NULL; + item[i].attr_nr = 0; + item[i].attr_alloc = 0; item[i].magic = prefix_pathspec(item + i, &short_magic, argv + i, flags, prefix, prefixlen, entry); @@ -447,6 +516,13 @@ void parse_pathspec(struct pathspec *pathspec, if (item[i].nowildcard_len < item[i].len) pathspec->has_wildcard = 1; pathspec->magic |= item[i].magic; + + if (item[i].attr_nr) { + int j; + for (j = 0; j < item[i].attr_nr; j++) + if (item[i].attrs[j].mode == INVALID_ATTR) + die(_("attribute spec in the wrong syntax are prohibited.")); + } } if (nr_exclude == n) @@ -502,6 +578,15 @@ void copy_pathspec(struct pathspec *dst, const struct pathspec *src) void free_pathspec(struct pathspec *pathspec) { + int i, j; + for (i = 0; i < pathspec->nr; i++) { + for (j = 0; j < pathspec->items[j].attr_nr; j++) { + free(pathspec->items[i].attrs[j].attr); + free(pathspec->items[i].attrs[j].value); + } + free(pathspec->items[i].attrs); + } + free(pathspec->items); pathspec->items = NULL; } diff --git a/pathspec.h b/pathspec.h index 0c11262..89d73db 100644 --- a/pathspec.h +++ b/pathspec.h @@ -32,6 +32,21 @@ struct pathspec { int len, prefix; int nowildcard_len; int flags; + int attr_nr; + int attr_alloc; + struct attr_item { + char *attr; + char *value; + enum attr_match_mode { + NOT_INIT, + MATCH_SET, + MATCH_UNSET, + MATCH_VALUE, + MATCH_UNSPECIFIED, + MATCH_NOT_UNSPECIFIED, + INVALID_ATTR + } mode; + } *attrs; } *items; }; @@ -98,5 +113,6 @@ extern char *find_pathspecs_matching_against_index(const struct pathspec *pathsp extern void add_pathspec_matches_against_index(const struct pathspec *pathspec, char *seen); extern const char *check_path_for_gitlink(const char *path); extern void die_if_path_beyond_symlink(const char *path, const char *prefix); +extern int validate_label_name(const char *label); #endif /* PATHSPEC_H */ -- 2.8.2.401.g9c0faef -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:
> I wanted to understand Junios series, so I built on top. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:
> + * attr:+val to find value set to true > + * attr:-val to find a value set to false > + * attr:!val to find a value that is not set > + * (i.e. it is neither set as "val", "val=<empty>", nor unset as "-val") > + * attr:val=value: to find value that have at least a and b set. I would have expected that there won't be "attr:+val", but it is spelled as "attr:val" instead. > +static void parse_attr_item(struct attr_item *attr, const char *value) Please do not call something that is not part of the attribute infrastructure as "attr_item"; I wasted time looking for the structure definition for "attr_item" in <attr.h>. > +static int match_attrs(const char *name, int namelen, > + const struct pathspec_item *item) > +{ > + char *path; > + int i; > + > + if (!check) { > + check = git_attr_check_alloc(); > + for (i = 0; i < item->attr_nr; i++) > + git_attr_check_append(check, item->attrs[i].attr); > + } > + > + path = xmemdupz(name, namelen); > + git_all_attrs(path, check); PLEASE DON'T. git_all_attrs() asks for all the attribute under the sun and has no hope to perform sensibly, especially at the very leaf level of the pathspec logic where one call to this function is made for each and every path in the tree. Instead, have a pointer to "struct git_attr_check" in pathspec_item and make a call to git_check_attr(path, item->check) here. Which means that you would need to prepare git_attr_check around ... > + if (skip_prefix(copyfrom, "attr:", &body)) { > + ALLOC_GROW(item->attrs, item->attr_nr + 1, item->attr_alloc); > + parse_attr_item(&item->attrs[item->attr_nr++], body); ... HERE. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:
> +static struct git_attr_check *check; > +static int match_attrs(const char *name, int namelen, > + const struct pathspec_item *item) > +{ > + char *path; > + int i; > + > + if (!check) { > + check = git_attr_check_alloc(); > + for (i = 0; i < item->attr_nr; i++) > + git_attr_check_append(check, item->attrs[i].attr); This is simply wrong; you may have two pathspec elements with attribute match magic, the first one may ask for one attribute while the second one may ask for seven. The first time around you allocate and append one attribute. The second time around you don't do anything useful, and send a git_attr_check with one element to deal with 7 attributes. > + } > + path = xmemdupz(name, namelen); > + git_all_attrs(path, check); However, the above "This is simply wrong" bogosity is covered because git_all_attrs() is used here, ignoring what is in check. The loop we see above is an expensive no-op, as the first thing all_attrs() does is to empty check() and instead stuff it with every attribute under the sun, not necessarily limited to attributes in item->attrs[]. By the way, do not call an array as plural. item->attr[i] is a good name to call a single ith element in an array. item->attrs[i] isn't. > + for (i = 0; i < item->attr_nr; i++) { > + int matched; > + const char *value = check->check[i].value; check[i] has no relevance to item->attrs[i] here. I do not think the code after this point is computing anything sensible. > diff --git a/pathspec.h b/pathspec.h > index 0c11262..89d73db 100644 > --- a/pathspec.h > +++ b/pathspec.h > @@ -32,6 +32,21 @@ struct pathspec { > int len, prefix; > int nowildcard_len; > int flags; > + int attr_nr; > + int attr_alloc; > + struct attr_item { > + char *attr; > + char *value; > + enum attr_match_mode { > + NOT_INIT, > + MATCH_SET, > + MATCH_UNSET, > + MATCH_VALUE, > + MATCH_UNSPECIFIED, > + MATCH_NOT_UNSPECIFIED, > + INVALID_ATTR > + } mode; > + } *attrs; I'd think the above addition that is in line with the updated API would look more like this [*1*]: int attr_match_nr; int attr_match_alloc; struct attr_match { struct git_attr *attr; char *value; enum attr_match_mode { ... } match_mode; } *attr_match; struct git_attr_check *attr_check; Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/", you would first do: p->attr_check = git_attr_check_alloc(); once, and then send each of VAR1=VAL2, -VAR2, VAR3... to your parse_one_item() helper function which would: * parse the match-mode like your code does; * parse out the attribute name (i.e. VAR1, VAR2 and VAR3), and instead of keeping it as a "(const) char *", call git_attr() to intern it (and keep it in local variable "attr"), and save it in p->attr_match[p->attr_nr].attr; * call git_attr_check_append(p->attr_check, git_attr_name(attr)) After the above finishes, you would end up with something like: .attr_match = { { .attr = git_attr("VAR1"), .value = "VAL2", .match_mode = MATCH_VALUE }, { .attr = git_attr("VAR2"), .value = <does not matter>, .match_mode = MATCH_UNSET }, ... }, .attr_check = { .check = { { .attr = git_attr("VAR1"), .value = <does not matter> }, { .attr = git_attr("VAR2"), .value = <does not matter> }, { .attr = git_attr("VAR3"), .value = <does not matter> }, } When matching (i.e. the match_attrs() function), you would instead do path = xmemdupz(name, namelen); git_check_attr(path, item->attr_check); to grab values for only attributes that matter to you, instead of calling git_all_attrs() [*2*]. After git_check_attr() returns, item->attr_check.check[0].attr would be git_attr("VAR1") and item->attr_check.check[0].value would be whatever setting the path has for the VAR1 attribute. You can use your match_mode logic to compare it with the values .attr_match expects. You do not necessarily have to have the same number of elements in .attr_match and .attr_check.check by the way. .attr_match might say VAR1=VAL2 !VAR1 -VAR1 which may be always false if these are ANDed together, but in order to evaluate it, you need only one git_attr_check_elem for VAR1. [Footnote] *1* With the old API, things would not be that much different. Instead of single structure .attr_check, you would make an array of git_attr_check structure, exactly like the array at .attr_check.check[] in the new API by hand. The new API makes this preparation simpler by managing the array on the API implementation side. *2* Please do not use that silly function especially in performance sensitive codepath. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Junio C Hamano
On Mon, May 16, 2016 at 9:23 PM, Junio C Hamano <[hidden email]> wrote:
> Stefan Beller <[hidden email]> writes: > >> + * attr:+val to find value set to true >> + * attr:-val to find a value set to false >> + * attr:!val to find a value that is not set >> + * (i.e. it is neither set as "val", "val=<empty>", nor unset as "-val") >> + * attr:val=value: to find value that have at least a and b set. > > I would have expected that there won't be "attr:+val", but it is > spelled as "attr:val" instead. "val" matches if the attr is not unspecified, i.e. one of {true, false, value} "+val" matches {true} only. Maybe we want to redo that to "val" matches {true} only. "?val" matches {true, false, value}. (I can leave this case out in the first series, too) > >> +static void parse_attr_item(struct attr_item *attr, const char *value) > > Please do not call something that is not part of the attribute > infrastructure as "attr_item"; I wasted time looking for the > structure definition for "attr_item" in <attr.h>. So "parse_pathspec_attr_match" instead? > >> +static int match_attrs(const char *name, int namelen, >> + const struct pathspec_item *item) >> +{ >> + char *path; >> + int i; >> + >> + if (!check) { >> + check = git_attr_check_alloc(); >> + for (i = 0; i < item->attr_nr; i++) >> + git_attr_check_append(check, item->attrs[i].attr); >> + } >> + >> + path = xmemdupz(name, namelen); >> + git_all_attrs(path, check); > > PLEASE DON'T. git_all_attrs() asks for all the attribute under the > sun and has no hope to perform sensibly, especially at the very leaf > level of the pathspec logic where one call to this function is made > for each and every path in the tree. This is executed only once, as check is static? From a users perception it doesn't matter if it is executed once just after parsing all pathspec items or at the first path to check for a match, no? The mistake is using the API wrong. So inside the '!check', after the preparation loop of git_attr_check_append, we'd need to hand over the "check" to git_check_attr instead? > > Instead, have a pointer to "struct git_attr_check" in pathspec_item > and make a call to git_check_attr(path, item->check) here. I see, then we have multiple `check` structs. Makes sense. > > Which means that you would need to prepare git_attr_check around ... > >> + if (skip_prefix(copyfrom, "attr:", &body)) { >> + ALLOC_GROW(item->attrs, item->attr_nr + 1, item->attr_alloc); >> + parse_attr_item(&item->attrs[item->attr_nr++], body); > > ... HERE. > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Junio C Hamano
On Mon, May 16, 2016 at 10:03 PM, Junio C Hamano <[hidden email]> wrote:
> > int attr_match_nr; > int attr_match_alloc; > struct attr_match { > struct git_attr *attr; > char *value; > enum attr_match_mode { > ... > } match_mode; > } *attr_match; > struct git_attr_check *attr_check; ok, I'll use that structure and go from here. > > Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/", This syntax is not pleasant to parse IMHO as it is not clear if the token after white space (-VAR2 here) is another attribute or the next part of the list of VAR1, i.e. this could also be: :(attr:VAR1=VAL1 VAL2) I wonder if we'd want to use the colon to separate different VARs: :(attr:VAR1=VAL1 VAL2:-VAR2:VAR3) which means: match all path, that * have VAR1 set to a value list containing at least VAL1 and VAL2 * have VAR2 unset * have VAR3 set to true. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
Stefan Beller <[hidden email]> writes:
>> Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/", > > This syntax is not pleasant to parse IMHO as it is not clear if the token > after white space (-VAR2 here) is another attribute or the next part of > the list of VAR1, ... Remove the ambiguity by declaring that the list is always whitespace separated. No whitespace in var, no whitespace in val, no quoting. The set of attributes with values expected to be used in the pathspec "attribute match" magic, I do not think there is anything that wants such a random arbitrary string. The value side of an attribute with value, e.g. "eol=crlf", "conflict-marker-size=7", is designed to be a token that our C code is prepared to parse. In other words, if you match the parsing semantics of parse_attr() in attr.c, you are OK. The attribute subsystem will not give users anything that is more complex than what that routine is prepared to parse, and that is a "whitespace separated list, no whitespace in attribute names, no whitespace in values, no quoting". -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
On Tue, May 17, 2016 at 10:34 AM, Junio C Hamano <[hidden email]> wrote:
> Stefan Beller <[hidden email]> writes: > >>> Then while parsing ":(attr:VAR1=VAL1 -VAR2 VAR3...)path/to/dir/", >> >> This syntax is not pleasant to parse IMHO as it is not clear if the token >> after white space (-VAR2 here) is another attribute or the next part of >> the list of VAR1, ... > > Remove the ambiguity by declaring that the list is always whitespace > separated. No whitespace in var, no whitespace in val, no quoting. > > The set of attributes with values expected to be used in the > pathspec "attribute match" magic, I do not think there is anything > that wants such a random arbitrary string. The value side of an > attribute with value, e.g. "eol=crlf", "conflict-marker-size=7", is > designed to be a token that our C code is prepared to parse. I am not talking about crazy stuff here, but consider our own .gitattributes file: * whitespace=!indent,trail,space *.[ch] whitespace=indent,trail,space *.sh whitespace=indent,trail,space Now I want to search for "the whitespace attribute that is set to at least trail and space" We cannot use commas for the specification as they are used in .gitattributes, because that would make it even harder, so I would imagine this could be used: :(attr:whitespace=space trail) See the whitespace separates the values, not the next variable. To add another variable, I would suggest using a ':', such as :(attr:whitespace=space trail:text) might a viable thing. > > In other words, if you match the parsing semantics of parse_attr() > in attr.c, you are OK. The attribute subsystem will not give users > anything that is more complex than what that routine is prepared to > parse, and that is a "whitespace separated list, no whitespace in > attribute names, no whitespace in values, no quoting". > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
Stefan Beller <[hidden email]> writes:
> I am not talking about crazy stuff here, but consider our own > .gitattributes file: > > * whitespace=!indent,trail,space > *.[ch] whitespace=indent,trail,space > *.sh whitespace=indent,trail,space > > Now I want to search for > > "the whitespace attribute that is set to at least trail and space" With :(attr:VAR=VAL) syntax, you can only look for whitespace attribute exactly set to "!indent,trail,space", so you can't. So you are talking about crazy stuff after all, aren't you? Are you now extending it to do "whitespace~=indent" or something like that? You do not even need VAR=VAL form for your main topic. You only need group-doc group-code etc. to look for "set to TRUE", no? I do not want to see us wasting too much time over-engineering it without having a concrete and useful use case, and "find path to whom whitespace checks are set for 'trail' and 'space'" is not. These comma-separated tokens augment WS_DEFAULT_RULE which has 'trail' already, so you do not look for 'trail' in the first place. "I want submodules under subs/ that is in either group-doc or group-code" is more reasonable and realistic use case, but that does not need any crazy stuff. Either two separate pathspec elements, ":(attr:group-doc)subs/" ":(attr:group-code)/subs/" or if we are to do the "ORed collection of ANDed attrs", it would be something like: ":(attr:group-doc):(attr:group-code)/subs/" no? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
On Tue, May 17, 2016 at 11:05 AM, Junio C Hamano <[hidden email]> wrote:
> Stefan Beller <[hidden email]> writes: > >> I am not talking about crazy stuff here, but consider our own >> .gitattributes file: >> >> * whitespace=!indent,trail,space >> *.[ch] whitespace=indent,trail,space >> *.sh whitespace=indent,trail,space >> >> Now I want to search for >> >> "the whitespace attribute that is set to at least trail and space" > > With :(attr:VAR=VAL) syntax, you can only look for whitespace > attribute exactly set to "!indent,trail,space", so you can't. So > you are talking about crazy stuff after all, aren't you? I think that stuck mentally with me from the "label" series, as then you had to have an exact specification of the value list. We don't do that any more, so in a very first series I can omit the value parsing at all. > Are you > now extending it to do "whitespace~=indent" or something like that? > > You do not even need VAR=VAL form for your main topic. You only > need group-doc group-code etc. to look for "set to TRUE", no? > > I do not want to see us wasting too much time over-engineering it > without having a concrete and useful use case, and "find path to > whom whitespace checks are set for 'trail' and 'space'" is not. > These comma-separated tokens augment WS_DEFAULT_RULE which has > 'trail' already, so you do not look for 'trail' in the first place. > > "I want submodules under subs/ that is in either group-doc or > group-code" is more reasonable and realistic use case, but that does > not need any crazy stuff. Either two separate pathspec elements, > > ":(attr:group-doc)subs/" ":(attr:group-code)/subs/" > > or if we are to do the "ORed collection of ANDed attrs", it would be > something like: > > ":(attr:group-doc):(attr:group-code)/subs/" > > no? Thanks for bringing me back to realities. I'll just drop support for values in the first series. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
Stefan Beller <[hidden email]> writes:
> I'll just drop support for values > in the first series. I do not think an exact string match to support :(attr:eol=crlf) is so bad. The "crazy stuff" aka over-engineering is when it goes beyond that, e.g. 'eol is set to one of these values" -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
In reply to this post by Junio C Hamano
On Mon, May 16, 2016 at 10:03 PM, Junio C Hamano <[hidden email]> wrote:
> When matching (i.e. the match_attrs() function), you would instead > do > > path = xmemdupz(name, namelen); > git_check_attr(path, item->attr_check); > > to grab values for only attributes that matter to you, instead of > calling git_all_attrs() [*2*]. > > After git_check_attr() returns, item->attr_check.check[0].attr would > be git_attr("VAR1") and item->attr_check.check[0].value would be > whatever setting the path has for the VAR1 attribute. You can use > your match_mode logic to compare it with the values .attr_match > expects. > > You do not necessarily have to have the same number of elements in > .attr_match and .attr_check.check by the way. .attr_match might say > > VAR1=VAL2 !VAR1 -VAR1 > > which may be always false if these are ANDed together, but in order > to evaluate it, you need only one git_attr_check_elem for VAR1. So for the matching we would need to get the order right, i.e. const char *inspect_name = git_attr_name(item.attr_match[i].attr); for (j=0; j < p.attr_check.check_nr; j++) { const char *cur_name = git_attr_name(p.attr_check.check[j].attr); if (!strcmp(inspect_name, cur_name)) break; // now compare .attr_match[i] with attr_check.check[j] This doesn't look cheap to me? Am I holding it wrong again? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
Stefan Beller <[hidden email]> writes:
> On Mon, May 16, 2016 at 10:03 PM, Junio C Hamano <[hidden email]> wrote: >> When matching (i.e. the match_attrs() function), you would instead >> do >> >> path = xmemdupz(name, namelen); >> git_check_attr(path, item->attr_check); >> >> to grab values for only attributes that matter to you, instead of >> calling git_all_attrs() [*2*]. >> >> After git_check_attr() returns, item->attr_check.check[0].attr would >> be git_attr("VAR1") and item->attr_check.check[0].value would be >> whatever setting the path has for the VAR1 attribute. You can use >> your match_mode logic to compare it with the values .attr_match >> expects. >> >> You do not necessarily have to have the same number of elements in >> .attr_match and .attr_check.check by the way. .attr_match might say >> >> VAR1=VAL2 !VAR1 -VAR1 >> >> which may be always false if these are ANDed together, but in order >> to evaluate it, you need only one git_attr_check_elem for VAR1. > The key phrase in the message you are reacting to is "not necessarily". It is not a crime to ask for the same attribute twice in a git_attr_check structure. $ git check-attr text text text -- path would stuff three instances of "text" in there and ask them for "path". The simple in-code callers that uses git_attr_check_initl() do rely on the order of the attributes it placed in attr_check structure (see e.g. how ll_merge() uses check[0].value and check[1].value to see the driver name and marker size), and that is perfectly kosher. Existing code is your friend. The mention of the possibility is purely as a hint useful for a possible enhancement in the far future. If we ever want to support something like this: ":(attr-expression (VAR1=VAL1 | VAR1=VAL2) & VAR2)" you can remember that you can put VAR1 and VAR2 in attr_check to grab values for VAR1 and VAR2 (even though VAR1 is mentioned twice in the expression), and use them in the evaluation you will perform. > So for the matching we would need to get the order right, i.e. > > const char *inspect_name = git_attr_name(item.attr_match[i].attr); > for (j=0; j < p.attr_check.check_nr; j++) { > const char *cur_name = git_attr_name(p.attr_check.check[j].attr); > if (!strcmp(inspect_name, cur_name)) > break; You do not strcmp() when you have attributes. They are interned so that you can compare their addresses. That makes it somewhat cheaper. Once you start "expression over random attributes", you'd need to map attr => value somehow. The format attr_check structure gives you, i.e. a list of <attr, value>, is aimed at compactness than random hashmap-like access. If the caller wants a hashmap-like access for performance purposes, the caller does that itself. Existing users do not need a hashmap-like access, because they know at which index in attr_check they placed request for what attribute. An array that can be indexed with a small integer is exactly what they want. > This doesn't look cheap to me? Am I holding it wrong again? By the way, I do not think during the entire discussion on this topic, you have never been in a situation to deserve the "holding it wrong" label (which implies "a ware is broken, but somehow the end user is blamed for using it incorrectly"). When you were wrong, you were simply wrong. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
Junio C Hamano <[hidden email]> writes:
> The mention of the possibility is purely as a hint useful for a > possible enhancement in the far future. If we ever want to support > something like this: > > ":(attr-expression (VAR1=VAL1 | VAR1=VAL2) & VAR2)" > > you can remember that you can put VAR1 and VAR2 in attr_check to > grab values for VAR1 and VAR2 (even though VAR1 is mentioned twice > in the expression), and use them in the evaluation you will perform. > >> So for the matching we would need to get the order right, i.e. >> >> const char *inspect_name = git_attr_name(item.attr_match[i].attr); >> for (j=0; j < p.attr_check.check_nr; j++) { >> const char *cur_name = git_attr_name(p.attr_check.check[j].attr); >> if (!strcmp(inspect_name, cur_name)) >> break; > > You do not strcmp() when you have attributes. They are interned so > that you can compare their addresses. That makes it somewhat > cheaper. > > Once you start "expression over random attributes", you'd need to > map attr => value somehow. The format attr_check structure gives > you, i.e. a list of <attr, value>, is aimed at compactness than > random hashmap-like access. If the caller wants a hashmap-like > access for performance purposes, the caller does that itself. To expand this a bit, I actually do not think hashmap-like access is necessary even in such an application. An implementation of the evaluator, at least a production-quality one, for the attr expression example shown above is unlikely to keep the expression in a single string "(VAR1=VAL1 | VAR1=VAL2) & VAR2". Instead it would use a parse tree with nodes that represent operators (e.g. OR, AND, "=", etc.) and terms (e.g. attribute reference VAR1, VAR2, and constatnts "VAL1", "VAL2") as its internal representation. And a node in such a parse tree that refers to an attribute VAR1 would unlikely to keep a "const char *" that is "VAR1". The node would have a field that stores an index into the attr_check.check[] array when the textual expression is "compiled" into a parse tree and the set of attributes the expression uses (hence it needs to ask the attributes API about, to prepare attr_check array) is identified. When "evaluating" the parse-tree, a node that refers to an attribute has an index into attr_check.check[] array, so there is no need for a loop like the one shown above at all. When "showing" the expression (for debugging purposes), it would grab the "index into the attr_check.check[] array" from the node, and the element in that array is an <attr, value> pair, so it can use git_attr_name() to obtain the attribute name. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
Free forum by Nabble | Edit this page |