[PATCH 0/7] submodule groups

classic Classic list List threaded Threaded
30 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Junio C Hamano
Stefan Beller <[hidden email]> writes:

> So I wonder if we rather want to extend the pathspec magic to
> include properties of blobs (i.e. submodules):
>
>     git <command> . :(sub-label:label-sub0) :(exclude)*0
>
> would look much more powerful too me. Properties of blobs
> may also be interesting for otherwise. Imagine looking for huge files
> (in a bare repo, so you have to use Git and not your shell tools):
>
>   git ls-files . :(file-size:>1024k)

I somehow do not think this is a way normal people (read: end users)
would want to interact with Git.  Pathspec is about "paths" and
various ways to match them.  It is not about contents that happens
to be currently named by that path.  Don't tie types or sizes to it.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

>>   git ls-files . :(file-size:>1024k)
>
> I somehow do not think this is a way normal people (read: end users)
> would want to interact with Git.  Pathspec is about "paths" and
> various ways to match them.  It is not about contents that happens
> to be currently named by that path.  Don't tie types or sizes to it.

To clarify, think what that non-pathspec means when used like this:

    $ git diff :(size>1M)
    $ git log --follow :(size>1M)

Which side of comparison does the "size" thing apply?  Either, both,
randomly?  More importantly, what use case of users do these
commands serve?

That is why I said that pathspec should never consider anything but
the pathname string you see.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Stefan Beller-4
On Wed, May 11, 2016 at 4:48 PM, Junio C Hamano <[hidden email]> wrote:

> Junio C Hamano <[hidden email]> writes:
>
>>>   git ls-files . :(file-size:>1024k)
>>
>> I somehow do not think this is a way normal people (read: end users)
>> would want to interact with Git.  Pathspec is about "paths" and
>> various ways to match them.  It is not about contents that happens
>> to be currently named by that path.  Don't tie types or sizes to it.
>
> To clarify, think what that non-pathspec means when used like this:
>
>     $ git diff :(size>1M)
>     $ git log --follow :(size>1M)
>
> Which side of comparison does the "size" thing apply?  Either, both,
> randomly?  More importantly, what use case of users do these
> commands serve?
>
> That is why I said that pathspec should never consider anything but
> the pathname string you see.

I see. That is bad indeed. So I'll go back and finish the submodulespec
to present.

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Junio C Hamano
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:

>     git submodule--helper matches-submodulespec sub0 ./.
> ./:(exclude)*0 *label-sub0
>
> which should test if the first argument (sub0) matches the submodulespec
> which follows.

This, according to that "OR'ed together" definition, asks to find a
submodule

    - whose path matches pathspec ./. ./:(exclude)*0; or
    - is labeled with label-sub0.

So I'd say it is natural sub0 matches if its path is at sub0 and has
a label label-sub0.

You could instead choose to use "AND'ed together" semantics, but
that would break expectation by those who expect "This OR that"
behaviour.  Unless you are willing to support --and/--or/(/) like
"git grep" does to express a way to combine hits from individual
terms, that is an inherent limitation.

I'd suggest not to over-engineer this.  Go back and imagine how
"/bin/ls" would work is a good sanity check to gauge what complexity
levels ordinary users would feel comfortable to handle.

"ls a b" would give union of what "ls a" and "ls b" would output,
there is no negation, and the users won't die from having to say "ls
[A-Za-qs-z]*" to exclude files whose names begin with 'r'.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> I'd suggest not to over-engineer this.  Go back and imagine how
> "/bin/ls" would work is a good sanity check to gauge what complexity
> levels ordinary users would feel comfortable to handle.
>
> "ls a b" would give union of what "ls a" and "ls b" would output,
> there is no negation, and the users won't die from having to say "ls
> [A-Za-qs-z]*" to exclude files whose names begin with 'r'.

Having said all that, there is one thing we may want to consider.

For negative pathspecs, we deliberately defined its semantics to
"see if the path matches with any positive pathspec element (and it
is a non-match if no positive one matches), and then exclude if it
matches with any negative pathspec element".

That is, when you are matching 'x.o' against three-element pathspec

        '*.c' ':(exclude)*.o' 'x.?'

you do not say "x.o does not match *.c, but it matches *.o so it is
to be excluded, ah, wait, x.? matches to revive it so it is a
match".  Instead "*.c does not and x.? does match, but *.o matches
so it is excluded".  IOW, the order of the pathspec elements given
on the command line does not matter.

Now we are adding two different criteria to specify inclusion based
on labels and names.

As implemented in the patch series under discussion, we are saying
that a submodule is included if

    - its path matches the pathspec (using the "matches one or more
      positive pathspec elements, and does not match any negaative
      pathspec element" as the definition of "matches"), OR

    - it is included in the specified set of *labels, OR

    - its name is specified by :name

There is no reason not to consider future enhancements to specify
exclusion base on these two new criretia.  A naive and easier to
implement enhancement would be to rewrite the above to (the latter
two item changes):


    - its path matches the pathspec (using the "matches one or more
      positive pathspec elements, and does not match any negaative
      pathspec element" as the definition of "matches"), OR

    - it is included in the specified set of *labels, and does not
      have any excluded *!labels, OR

    - its name is specified by :name, and is not among any excluded
      :!name.

but it does not have to be that way.  I suspect that it may make it
easier to understand and use if we OR'ed together only the positive
ones from three classes of criteria together and require at least
one of them to match, and then requiring none of the negative ones
to match.  That is, a module-spec with three elements:

     'sub*' '*label0' './:(exclude)*0'

with the implemented algorithm would choose submodules whose name
begins with 'sub' except the ones whose name ends with '0', OR those
with label0, but if we redefine the behaviour to "positive ones
together, and then exclude negative ones", then we would choose ones
whose name begins with 'sub' or ones with label0, and among these,
exclude those whose name ends with '0' (which is what your "test"
wanted to express).

The implementation of match_pathspec() does the "first positive ones
only and then negative ones" two-step process already, so I think
you could update its "int is_dir" argument to "unsigned flags",
introduce a "negative only" flag, and then do something like:

        for each path
                if (!(match_pathspec(ps, name, ..., 0) ||
                      has a "positive" label specified ||
                      has its name specified as a "postiive")
                        /* does not match any positive criteria */
                        continue;
                if (match_pathspec(ps, name, ..., NEGATIVE_ONLY) ||
                    has a "negative" label specified ||
                    has its name specified as a "negative")
                        /* explicitly excluded */
                        continue;
                /* included! */

That would open door to something like

     'sub*' '*label0' './:(exclude)*0' '*!label1'

to allow you to say "(those with label0 or whose path begins with
sub) minus (those with label1 or whose path ends with 0)".

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Junio C Hamano
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:

> If you have lots of submodules, you probably don't need all of them at once,
> but you have functional units. Some submodules are absolutely required,
> some are optional and only for very specific purposes.
>
> This patch series adds labels to submodules in the .gitmodules file.

I hate to bring this up in this thread, primarily because I cannot
see how to make it mesh well with the "submodule spec lets you
specify a group of submodule with labels", but for completeness of
discussion, I'll mention it anyway.

Instead of specifying "all files written in Perl" to work on by
giving a pathspec with three elements, e.g.

    git cmd -- \*.perl \*.pl \*.pm

I've often wondered if it would be a good idea to let attributes
file to specify "these paths form the group Perl" with something
like:

    *.pm        group=perl
    *.pl        group=perl
    *.perl      group=perl
    *.h         group=c
    *.c         group=c

and say

    git cmd -- ':(group=perl)'

instead.

The reason why I suspect that this may not work well with submodule
labels is because submodule labels (or any attribute we give via
.gitmodules to a submodule) are keyed by a submodule name, which is
the primary unchanging key (so that people can "mv" a submodule in
the context of the toplevel superproject without losing track of
submodule identity), not by paths to submodules, while the "group"
thing I want is merely a short-hand for pathspec elements and wants
to be keyed by paths.

But there may be somebody more clever than I who can come up with a
way to unify these two similar concepts without confusing end users.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Stefan Beller-4
On Wed, May 11, 2016 at 10:50 PM, Junio C Hamano <[hidden email]> wrote:

>
>     git cmd -- \*.perl \*.pl \*.pm
>
> I've often wondered if it would be a good idea to let attributes
> file to specify "these paths form the group Perl" with something
> like:
>
>     *.pm        group=perl
>     *.pl        group=perl
>     *.perl      group=perl
>     *.h         group=c
>     *.c         group=c
>
> and say
>
>     git cmd -- ':(group=perl)'
>
> instead.

How is that different to the file size example I gave earlier?
You could have a change which includes:

    -    *.pl    group=prolog
    +    *.pl   group=perl

What happens in the diff/log case then (just as a file can pass an
arbitrary file size within that change) ? These file attributes are as
arbitrary as sizes as well as submodule labels or names.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Junio C Hamano
In reply to this post by Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> The reason why I suspect that this may not work well with submodule
> labels is because submodule labels (or any attribute we give via
> .gitmodules to a submodule) are keyed by a submodule name, which is
> the primary unchanging key (so that people can "mv" a submodule in
> the context of the toplevel superproject without losing track of
> submodule identity), not by paths to submodules, while the "group"
> thing I want is merely a short-hand for pathspec elements and wants
> to be keyed by paths.
>
> But there may be somebody more clever than I who can come up with a
> way to unify these two similar concepts without confusing end users.

Thinking about this even more, if there is no requirement that
labels must be tied to submodule names, we just can scrap the idea
of "submodule labels" to group things and instead just use "path
labels", i.e. write the full path to the submodule and assign it a
label in .gitattributes and use it in place of where we used *label
in the patch series.  After all, an easy way to choose some among
many submodules logically is a subset of an easy way to choose some
among many paths.

The only reason why we added the submodule label to .gitmodules is
because we viewed it as submodule-specific thing and the "keyed by
name, not path" came as a consequence, not because any real "we must
key it by name because..." reason, I would think.

I know this is a huge departure from the design presented both at
the conceptual level and also at the implementation level, and that
is one of the reasons why I do not particularly want to like it, but
on the other hand, I am not bold enough to say that I will have a
good answer when later somebody asks "Why can we group only
submodules with labels, but not random group of paths (e.g. 'these
directories are about documentation')?"  And then, if we add path
labels to allow expressing groups of paths later, the follow-up
question would be "When should I use submodule labels and when
should I use path labels?  I can use path labels to group submodules
and say 'git submodule update -- :(group=docs)' can't I?".

And that makes me pause and step back from the "submodule labels"
idea.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Stefan Beller-4
On Thu, May 12, 2016 at 8:58 AM, Junio C Hamano <[hidden email]> wrote:

> Junio C Hamano <[hidden email]> writes:
>
>> The reason why I suspect that this may not work well with submodule
>> labels is because submodule labels (or any attribute we give via
>> .gitmodules to a submodule) are keyed by a submodule name, which is
>> the primary unchanging key (so that people can "mv" a submodule in
>> the context of the toplevel superproject without losing track of
>> submodule identity), not by paths to submodules, while the "group"
>> thing I want is merely a short-hand for pathspec elements and wants
>> to be keyed by paths.
>>
>> But there may be somebody more clever than I who can come up with a
>> way to unify these two similar concepts without confusing end users.
>
> Thinking about this even more, if there is no requirement that
> labels must be tied to submodule names, we just can scrap the idea
> of "submodule labels" to group things and instead just use "path
> labels", i.e. write the full path to the submodule and assign it a
> label in .gitattributes and use it in place of where we used *label
> in the patch series.  After all, an easy way to choose some among
> many submodules logically is a subset of an easy way to choose some
> among many paths.
>
> The only reason why we added the submodule label to .gitmodules is
> because we viewed it as submodule-specific thing and the "keyed by
> name, not path" came as a consequence, not because any real "we must
> key it by name because..." reason, I would think.
>
> I know this is a huge departure from the design presented both at
> the conceptual level and also at the implementation level, and that
> is one of the reasons why I do not particularly want to like it, but
> on the other hand, I am not bold enough to say that I will have a
> good answer when later somebody asks "Why can we group only
> submodules with labels, but not random group of paths (e.g. 'these
> directories are about documentation')?"  And then, if we add path
> labels to allow expressing groups of paths later, the follow-up
> question would be "When should I use submodule labels and when
> should I use path labels?  I can use path labels to group submodules
> and say 'git submodule update -- :(group=docs)' can't I?".
>
> And that makes me pause and step back from the "submodule labels"
> idea.
>

It sounds better at first (and I haven't thought further).
So if we were to go with this idea:

Label paths (or even pathspecs?) in the .gitattributes file.

I think it is important to keep the property of defining the labeling
in the tree, so you can follow upstream easier.

I tried coming up with an example for labels for generic paths,
but it looks like most of the time it is a substitution for a pathspec,
I did not find a convincing example which makes it easier to use.

`:(group=docs)` in the non submodule case could be expressed
as `Documentation/**`. Well maybe we also want to include README
and some other files which need to stay outside the Documentation
directory, so I can see how it may be useful.

We do not need a special labeling command. We do not
ship with a command which writes the .gitattributes or .gitignore
for you, and labels don't require this. So I could drop the patch
for "submodule add --label".

We can still keep the submodule.defaultGroup. (In the WIP I renamed
it to updateGroup as its only feature is to have it set during clone
and remebered for `git submodule update`)

When we allow labels to be generic path labels instead of submodule
labels, the user might be tempted to ask, why the submodules can
be specified but not the individual paths, i.e.

    git clone --init-submodule="(:group=docs)" ...

may strongly hint at:

    git clone --narrow="(:group=docs)" ...

would only get parts of the repository.

For the submodule case, this may add confusion as the user would
need to configure some properties in the .gitmodules file and some
in the .gitattributes file.

I think I'll try implementing a mock and see how much code it is for
a more fundamental pathspec extension.

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/7] submodule groups

Junio C Hamano
Stefan Beller <[hidden email]> writes:

> We can still keep the submodule.defaultGroup. (In the WIP I renamed
> it to updateGroup as its only feature is to have it set during clone
> and remebered for `git submodule update`)

Yes, my idle speculation between "[submodule"x"] label=A" stored
in .gitmodules and "path/to/submodule  group=A" in .gitattributes
is completely orthogonal to the need for submodule.defaultGroup.

The only difference it brings in is how we evaluate what
submodule.defaultGroup names (i.e. via "submodule label" vs
"path label").

> When we allow labels to be generic path labels instead of submodule
> labels, the user might be tempted to ask, why the submodules can
> be specified but not the individual paths, i.e.
>
>     git clone --init-submodule="(:group=docs)" ...
>
> may strongly hint at:
>
>     git clone --narrow="(:group=docs)" ...
>
> would only get parts of the repository.

If we are to have a (proper) narrow clone, I do not think there is
any reason why the latter should not work as expected.

> For the submodule case, this may add confusion as the user would
> need to configure some properties in the .gitmodules file and some
> in the .gitattributes file.

I do not quite understand this comment.

A new reader who knows nothing about "grouping submodules" would not
have any trouble, I suspect, if all and the only grouping she learns
is "grouping paths" from the beginning.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
12