Strangeness with git-add and nested repositories

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Strangeness with git-add and nested repositories

Andrew J
I've noticed some strangeness with git-add and nested repos.

This tar.gz has a reproduction script, parent repo, and nested repo
that exhibits the issue:
https://drive.google.com/uc?export=download&id=0Bwjufq6oAZMfcGVxZ2dlMElEVlE

If you extract the archive and do the following (on linux please, I
haven't even tried this on Windows):
cd git-add-bug
./bad-git-add.sh
(Examine the script before running, please)

It will echo what I'm encountering to the screen, so it should be
straightforward to follow, but I will summarize here:
If I run a git-add command where one or more of the files live in
nested repositories, like so (command is shortened here, includes more
files in the script):
git add -v -f --
src/chromium/src/third_party/libFuzzer/src/FuzzerInterface.h testfile
As confirmed by git-status, this results in testfile being added, but
FuzzerInterface.h not being added.
FuzzerInterface.h is inside of a nested repository
(src/chromium/src/third_party/libFuzzer/src/), while testfile is in
the current main repository.

On the other hand, the following command:
git add -v -f -- src/chromium/src/third_party/libFuzzer/src/FuzzerInterface.h
Results in FuzzerInterface.h being added, as confirmed by git-status.
Excluding testfile from the git-add command seems to do the trick.

My expectation:
Both testfile and FuzzerInterface.h should be added if they are
specified in the git-add command, regardless of whether they are
specified along with another file that doesn't happen to live in a
nested repository ("testfile" in this example).

If someone could help me understand what's going on here, I'd appreciate it.

Thanks,

Andrew
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Stefan Beller-4
On Wed, Apr 27, 2016 at 1:31 AM, Andrew J <[hidden email]> wrote:

> I've noticed some strangeness with git-add and nested repos.
>
> This tar.gz has a reproduction script, parent repo, and nested repo
> that exhibits the issue:
> https://drive.google.com/uc?export=download&id=0Bwjufq6oAZMfcGVxZ2dlMElEVlE
>
> If you extract the archive and do the following (on linux please, I
> haven't even tried this on Windows):
> cd git-add-bug
> ./bad-git-add.sh
> (Examine the script before running, please)
>
> It will echo what I'm encountering to the screen, so it should be
> straightforward to follow, but I will summarize here:
> If I run a git-add command where one or more of the files live in
> nested repositories, like so (command is shortened here, includes more
> files in the script):
> git add -v -f --
> src/chromium/src/third_party/libFuzzer/src/FuzzerInterface.h testfile
> As confirmed by git-status, this results in testfile being added, but
> FuzzerInterface.h not being added.
> FuzzerInterface.h is inside of a nested repository
> (src/chromium/src/third_party/libFuzzer/src/), while testfile is in
> the current main repository.
>
> On the other hand, the following command:
> git add -v -f -- src/chromium/src/third_party/libFuzzer/src/FuzzerInterface.h
> Results in FuzzerInterface.h being added, as confirmed by git-status.
> Excluding testfile from the git-add command seems to do the trick.
>
> My expectation:
> Both testfile and FuzzerInterface.h should be added if they are
> specified in the git-add command, regardless of whether they are
> specified along with another file that doesn't happen to live in a
> nested repository ("testfile" in this example).

I would call it a bug. [This use case is interesting for working with submodules
(though your nested repositories do not seem to be submodules), so worth
looking at for me.]

>
> If someone could help me understand what's going on here, I'd appreciate it.

I think (pure speculation), that it the error is in the context
(repository) switching logic.
What happens if you alter the order, i.e. give testfile first and then
the files in the nested
repos?

    git add -- file path/to/subdir/file

should do internally IMHO:

    git add file
    git -C path-to-subdir add file

Thanks,
Stefan

>
> Thanks,
>
> Andrew
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Andrew J
Hi Stefan,

On Wed, Apr 27, 2016 at 9:08 AM, Stefan Beller <[hidden email]> wrote:
> I think (pure speculation), that it the error is in the context
> (repository) switching logic.
> What happens if you alter the order, i.e. give testfile first and then
> the files in the nested
> repos?

Interestingly, reversing the order produces the same result (the
testfile is added, the nested files are not).

I've also noticed that running something like 'git status testfile
nestedfiles' results in the nested files being omitted from the git
status output; I'd expect them to be printed by git-status as
untracked files. So it appears the problem is not isolated to git-add.

To give some context, my use case is that I have a parent project that
links to numerous chromium libraries, thus my parent project needs
access to many of chromium's headers at build time. I wanted to make
these headers available to other developers without them needing to
check out all of chromium.
So I add all the chromium headers to the parent project with something like:
find deps/chromium/src -name "*.h" | xargs git add --
I was weirded out to find that many of the header files weren't being
added, as I've already described.

I ultimately worked around this by doing:
find chromium/src -name "*.h" | xargs -n 1 git add
Since each file gets added separately, this is quite slow. So it'd be
nice if this little bug was fixed someday :)

As you probably know, Chromium is comprised of many hundreds of nested
repos, so that aided in manifesting this issue.

Thanks,
Andrew
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Stefan Beller-4
On Wed, Apr 27, 2016 at 11:10 PM, Andrew J <[hidden email]> wrote:

> Hi Stefan,
>
> On Wed, Apr 27, 2016 at 9:08 AM, Stefan Beller <[hidden email]> wrote:
>> I think (pure speculation), that it the error is in the context
>> (repository) switching logic.
>> What happens if you alter the order, i.e. give testfile first and then
>> the files in the nested
>> repos?
>
> Interestingly, reversing the order produces the same result (the
> testfile is added, the nested files are not).
>
> I've also noticed that running something like 'git status testfile
> nestedfiles' results in the nested files being omitted from the git
> status output; I'd expect them to be printed by git-status as
> untracked files. So it appears the problem is not isolated to git-add.

Looking at the code, git-add uses parse_pathspec, which is used my
most commands[https://github.com/git/git/blob/master/builtin/add.c#L364]
and I think that function needs to learn to cover paths for different
repositories.
Maybe we can pass in a flag, which allows parse_pathspec to get files from
the different repos and then it's up to each command how to deal with that
list of files.



>
> To give some context, my use case is that I have a parent project that
> links to numerous chromium libraries, thus my parent project needs
> access to many of chromium's headers at build time. I wanted to make
> these headers available to other developers without them needing to
> check out all of chromium.
> So I add all the chromium headers to the parent project with something like:
> find deps/chromium/src -name "*.h" | xargs git add --
> I was weirded out to find that many of the header files weren't being
> added, as I've already described.
>
> I ultimately worked around this by doing:
> find chromium/src -name "*.h" | xargs -n 1 git add
> Since each file gets added separately, this is quite slow. So it'd be
> nice if this little bug was fixed someday :)
>
> As you probably know, Chromium is comprised of many hundreds of nested
> repos, so that aided in manifesting this issue.
>
> Thanks,
> Andrew
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Junio C Hamano
In reply to this post by Stefan Beller-4
Stefan Beller <[hidden email]> writes:

> I think (pure speculation), that it the error is in the context
> (repository) switching logic.
> What happens if you alter the order, i.e. give testfile first and then
> the files in the nested
> repos?
>
>     git add -- file path/to/subdir/file
>
> should do internally IMHO:
>
>     git add file
>     git -C path-to-subdir add file

My undertanding of what _should_ happen in the world order as
currently defined (not necessarily implemented) is:

 * "git add -- A B" must work the same way as "git add -- B A" and
   "git add -- A; git add -- B"

 * "git add -- path/to/subdir/file", when any of path/, path/to/,
   path/to/subdir/ is a Git repository that is different from the
   current Git repository, must fail.

IOW, if 'path' is a repository (whether it is known as a submodule
to the repository whose working tree contains it, or it is an
untracked directory from the containing repository's point of view),
the index of the containing repository cannot get path/$anything in
it.  If you managed to do so, you found a bug [*1*].

path/.git/index can of course have "to/subdir/file" in it, and from
that point of view, "git -C path/to/subdir add file" may one day
become an improved world order.  It is just we haven't discussed
that possibility or reached concensus that it is a good idea.


[Footnote]

*1* Of course, some of the bugs in this class may fundamentally be
    unfixable and would fall into the same category as "doctor, it
    hurts when I do this--don't do it then".  For example, you may
    treat path/ as the top of the working tree of another repository
    whose git-dir is not at path/.git by arranging GIT_WORK_TREE and
    GIT_DIR environment variables, but you may do so only when you
    actually are accessing the contents of path/ as its own project.
    And when you are using the enclosing project (whose .git/ would
    sit next to path/), there is no way for "git add path/to/file"
    to know that everything under "path/" does not belong to the
    current repository and instead it is part of the project rooted
    at path/, which is an obvious example of "fundamentally
    unfixable" case.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

>  * "git add -- path/to/subdir/file", when any of path/, path/to/,
>    path/to/subdir/ is a Git repository that is different from the
>    current Git repository, must fail.

If any of the leading directories in that long path is actually a
git repository that is different from the current one, it shouldn't
have added it.  IOW, adding the path is a bug.

"git add A/B/C" usually enforces this rule by

 - check A; if A is a separate repository (A/.git exists, etc.),
   do not add this path;

   - check A/B; if A/B is a separate repository, ignore;

     - check A/B/C; if A/B/C is a separate repository, ignore.
       otherwise add it.

When given multiple paths, e.g. "git add A/B/C A/D", it tries to
"optimize" things by first finding common leading directory (in this
case "A/") and doing something slightly different, and I think the
bug Andrew saw lies in that codepath.  It is likely that the code is
forgetting to make sure that there is no top of enclosed working
tree in the common leading directory part of the path.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Stefan Beller-4
In reply to this post by Junio C Hamano
On Thu, Apr 28, 2016 at 9:39 AM, Junio C Hamano <[hidden email]> wrote:

> Stefan Beller <[hidden email]> writes:
>
>> I think (pure speculation), that it the error is in the context
>> (repository) switching logic.
>> What happens if you alter the order, i.e. give testfile first and then
>> the files in the nested
>> repos?
>>
>>     git add -- file path/to/subdir/file
>>
>> should do internally IMHO:
>>
>>     git add file
>>     git -C path-to-subdir add file
>
> My undertanding of what _should_ happen in the world order as
> currently defined (not necessarily implemented) is:
>
>  * "git add -- A B" must work the same way as "git add -- B A" and
>    "git add -- A; git add -- B"

I agree.

>
>  * "git add -- path/to/subdir/file", when any of path/, path/to/,
>    path/to/subdir/ is a Git repository that is different from the
>    current Git repository, must fail.

I agree that this is the current expectation for the world order.
However I would like to propose to change that eventually.
(Once the submodule groups are there and we can treat
submodules as a special form of narrow checkout, we want to
have the feature of adding across submodules and even committing
across submodules/repos, I would think)

>
> IOW, if 'path' is a repository (whether it is known as a submodule
> to the repository whose working tree contains it, or it is an
> untracked directory from the containing repository's point of view),
> the index of the containing repository cannot get path/$anything in
> it.  If you managed to do so, you found a bug [*1*].
>
> path/.git/index can of course have "to/subdir/file" in it, and from
> that point of view, "git -C path/to/subdir add file" may one day
> become an improved world order.  It is just we haven't discussed
> that possibility or reached concensus that it is a good idea.
>
>
> [Footnote]
>
> *1* Of course, some of the bugs in this class may fundamentally be
>     unfixable and would fall into the same category as "doctor, it
>     hurts when I do this--don't do it then".  For example, you may
>     treat path/ as the top of the working tree of another repository
>     whose git-dir is not at path/.git by arranging GIT_WORK_TREE and
>     GIT_DIR environment variables, but you may do so only when you
>     actually are accessing the contents of path/ as its own project.
>     And when you are using the enclosing project (whose .git/ would
>     sit next to path/), there is no way for "git add path/to/file"
>     to know that everything under "path/" does not belong to the
>     current repository and instead it is part of the project rooted
>     at path/, which is an obvious example of "fundamentally
>     unfixable" case.
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Andrew J
In reply to this post by Junio C Hamano
On Thu, Apr 28, 2016 at 9:48 AM, Junio C Hamano <[hidden email]> wrote:
> Junio C Hamano <[hidden email]> writes:
> When given multiple paths, e.g. "git add A/B/C A/D", it tries to
> "optimize" things by first finding common leading directory (in this
> case "A/") and doing something slightly different, and I think the
> bug Andrew saw lies in that codepath.  It is likely that the code is
> forgetting to make sure that there is no top of enclosed working
> tree in the common leading directory part of the path.

Actually, the very simplest case succeeds in adding a file from a nested repo:
==
$ mkdir parent
$ cd parent
$ git init
Initialized empty Git repository in /home/ajohnson/parent/.git/
$ mkdir subrepo
$ cd subrepo
$ git init
Initialized empty Git repository in /home/ajohnson/parent/subrepo/.git/
$ touch subfile
$ git add subfile
$ git commit -m test
[master (root-commit) 7480a2f] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 subfile
$ cd ..
$ git add subrepo/subfile
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

        new file:   subrepo/subfile
==

It is this more complex case that fails to add the file from the nested repo:
==
$ mkdir parent
$ cd parent
$ git init
Initialized empty Git repository in /home/ajohnson/parent/.git/
$ mkdir subrepo
$ cd subrepo
$ git init
Initialized empty Git repository in /home/ajohnson/parent/subrepo/.git/
$ touch subfile
$ git add subfile
$ git commit -m test
[master (root-commit) 36adec4] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 subfile
$ cd ..
$ touch parentfile
$ git add parentfile subrepo/subfile
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

        new file:   parentfile

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        subrepo/
==

Given that the very simplest case of adding a single nested file
succeeds, I had assumed that the intended world order was to be able
to add files from nested repos, and that the more edgy case of adding
a mixture of files from the parent and nested repos failing was the
bug.

For my part, I'd *much* prefer to be able to add files from nested
repos, so all this "adding any files from nested repos is a bug" talk
is bumming me out!

Thanks,

Andrew
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Junio C Hamano
In reply to this post by Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> My undertanding of what _should_ happen in the world order as
> currently defined (not necessarily implemented) is:
>
>  * "git add -- A B" must work the same way as "git add -- B A" and
>    "git add -- A; git add -- B"
>
>  * "git add -- path/to/subdir/file", when any of path/, path/to/,
>    path/to/subdir/ is a Git repository that is different from the
>    current Git repository, must fail.
>
> IOW, if 'path' is a repository (whether it is known as a submodule
> to the repository whose working tree contains it, or it is an
> untracked directory from the containing repository's point of view),
> the index of the containing repository cannot get path/$anything in
> it.  If you managed to do so, you found a bug [*1*].
>
> path/.git/index can of course have "to/subdir/file" in it, and from
> that point of view, "git -C path/to/subdir add file" may one day
> become an improved world order.  It is just we haven't discussed
> that possibility or reached concensus that it is a good idea.

A cursory bisection suggests that this used to work up to at least
v2.7.4; this must be a recent regression.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: Strangeness with git-add and nested repositories

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> A cursory bisection suggests that this used to work up to at least
> v2.7.4; this must be a recent regression.

No, false report.  This seems to be broken from ages ago.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html