Quantcast

Not going beyond symbolic links

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Not going beyond symbolic links

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> A symlink to us is just a different kind of blob, and by definition a blob
> is the leaf level of a tree structure that represents the working tree in
> the index. There won't be anything hanging below it, and when adding
> things to the index we should not dereference the symlink to see where it
> leads to.
>
> Traditionally we have been loose about this check, and the normal "git
> add" and "git update-index" codepath is still forever broken, and we
> allow:
>
>       $ mkdir dir
>       $ >dir/file
>       $ ln -s dir symlink
>       $ git add symlink/file
>
> but some codepaths that matter more (because they do larger damage
> unattended, as opposed to the above command sequence that can be avoided
> by user education a bit more easily), such as "git apply" and "git
> read-tree", have been corrected using has_symlink_leading_path() since mid
> 2007.  We would need to follow through c40641b (Optimize symlink/directory
> detection, 2008-05-09) and further fix "git add" and "git update-index"
> codepaths to forbid the above command sequence.

I started to revisit this issue and patched "git update-index --add"
and "git add" so far.  Patches follow.

I think we should also check the following and fix them if any of them is
broken.  Under the same "dir/file exists but symlink points at dir"
scenario:

 * "git rm symlink/file" --- should fail without removing dir/file;

 * "git ls-tree $commit -- symlink/file" should *not* fail if $commit does
   have "symlink/file" in it (iow, we cannot add the logic to get_pathspec());

 * "git ls-files --exclude-standard -o -- symlink/file" should not talk
   about "symlink/file".

If we reword the paragraph I quoted at the beginning of this message
slightly:

        A gitlink to is is just a leaf level of a tree structure that
        represents the working tree in the index.  There won't be anything
        hanging below it.

We would need a similar check to stop at module boundary, just like the
helper function we use for these patches, has_symlink_leading_path(),
stops at a symlink.

So without further ado...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 1/2] update-index: refuse to add working tree items beyond symlinks

Junio C Hamano
When "sym" is a symbolic link that is inside the working tree, and it
points at a directory "dir" that has "path" in it, "update-index --add
sym/path" used to mistakenly add "sym/path" as if "sym" were a normal
directory.

"git apply", "git diff" and "git merge" have been taught about this issue
some time ago, but "update-index" and "add" have been left ignorant for
too long.

Signed-off-by: Junio C Hamano <[hidden email]>
---
 builtin-update-index.c     |    5 ++++-
 t/t0055-beyond-symlinks.sh |   20 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletions(-)
 create mode 100755 t/t0055-beyond-symlinks.sh

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 38eb53c..434cb8e 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -194,6 +194,10 @@ static int process_path(const char *path)
  int len;
  struct stat st;
 
+ len = strlen(path);
+ if (has_symlink_leading_path(len, path))
+ return error("'%s' is beyond a symbolic link", path);
+
  /*
  * First things first: get the stat information, to decide
  * what to do about the pathname!
@@ -201,7 +205,6 @@ static int process_path(const char *path)
  if (lstat(path, &st) < 0)
  return process_lstat_error(path, errno);
 
- len = strlen(path);
  if (S_ISDIR(st.st_mode))
  return process_directory(path, len, &st);
 
diff --git a/t/t0055-beyond-symlinks.sh b/t/t0055-beyond-symlinks.sh
new file mode 100755
index 0000000..eb11dd7
--- /dev/null
+++ b/t/t0055-beyond-symlinks.sh
@@ -0,0 +1,20 @@
+#!/bin/sh
+
+test_description='update-index refuses to add beyond symlinks'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+ >a &&
+ mkdir b &&
+ ln -s b c &&
+ >c/d &&
+ git update-index --add a b/d
+'
+
+test_expect_success 'update-index --add beyond symlinks' '
+ test_must_fail git update-index --add c/d &&
+ ! ( git ls-files | grep c/d )
+'
+
+test_done
--
1.6.0.rc1.64.g61192

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 2/2] add: refuse to add working tree items beyond symlinks

Junio C Hamano
In reply to this post by Junio C Hamano
This is the same fix for the issue of adding "sym/path" when "sym" is a
symblic link that points at a directory "dir" with "path" in it.

Signed-off-by: Junio C Hamano <[hidden email]>
---
 builtin-add.c              |   12 +++++++++++-
 dir.c                      |    6 +++++-
 t/t0055-beyond-symlinks.sh |    7 ++++++-
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index fc3f96e..81b64d7 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -153,6 +153,16 @@ static const char **validate_pathspec(int argc, const char **argv, const char *p
 {
  const char **pathspec = get_pathspec(prefix, argv);
 
+ if (pathspec) {
+ const char **p;
+ for (p = pathspec; *p; p++) {
+ if (has_symlink_leading_path(strlen(*p), *p)) {
+ int len = prefix ? strlen(prefix) : 0;
+ die("'%s' is beyond a symbolic link", *p + len);
+ }
+ }
+ }
+
  return pathspec;
 }
 
@@ -278,7 +288,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
  fprintf(stderr, "Maybe you wanted to say 'git add .'?\n");
  return 0;
  }
- pathspec = get_pathspec(prefix, argv);
+ pathspec = validate_pathspec(argc, argv, prefix);
 
  /*
  * If we are adding new files, we need to scan the working
diff --git a/dir.c b/dir.c
index 29d1d5b..ae7046f 100644
--- a/dir.c
+++ b/dir.c
@@ -727,8 +727,12 @@ static void free_simplify(struct path_simplify *simplify)
 
 int read_directory(struct dir_struct *dir, const char *path, const char *base, int baselen, const char **pathspec)
 {
- struct path_simplify *simplify = create_simplify(pathspec);
+ struct path_simplify *simplify;
 
+ if (has_symlink_leading_path(strlen(path), path))
+ return dir->nr;
+
+ simplify = create_simplify(pathspec);
  read_directory_recursive(dir, path, base, baselen, 0, simplify);
  free_simplify(simplify);
  qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
diff --git a/t/t0055-beyond-symlinks.sh b/t/t0055-beyond-symlinks.sh
index eb11dd7..b29c37a 100755
--- a/t/t0055-beyond-symlinks.sh
+++ b/t/t0055-beyond-symlinks.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-test_description='update-index refuses to add beyond symlinks'
+test_description='update-index and add refuse to add beyond symlinks'
 
 . ./test-lib.sh
 
@@ -17,4 +17,9 @@ test_expect_success 'update-index --add beyond symlinks' '
  ! ( git ls-files | grep c/d )
 '
 
+test_expect_success 'add beyond symlinks' '
+ test_must_fail git add c/d &&
+ ! ( git ls-files | grep c/d )
+'
+
 test_done
--
1.6.0.rc1.64.g61192

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Linus Torvalds-3
In reply to this post by Junio C Hamano


On Mon, 4 Aug 2008, Junio C Hamano wrote:
>
> I started to revisit this issue and patched "git update-index --add"
> and "git add" so far.  Patches follow.

Patches look good to me, but did you check the performance impact?

The rewritten 'has_symlink_leading_path()' should do ok, but it migth
still be a huge performance downside to check all the paths for things
like "git add -u".

I didn't test, though.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Junio C Hamano
Linus Torvalds <[hidden email]> writes:

> On Mon, 4 Aug 2008, Junio C Hamano wrote:
>>
>> I started to revisit this issue and patched "git update-index --add"
>> and "git add" so far.  Patches follow.
>
> Patches look good to me, but did you check the performance impact?
>
> The rewritten 'has_symlink_leading_path()' should do ok, but it migth
> still be a huge performance downside to check all the paths for things
> like "git add -u".

Not yet.

I think this is a necessary "correctness" thing to do regardless of the
performance impact, and adding the logic to stop at submodule boundary
(aka gitlinks) should come before optimization.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Linus Torvalds-3


On Mon, 4 Aug 2008, Junio C Hamano wrote:

> >
> > The rewritten 'has_symlink_leading_path()' should do ok, but it migth
> > still be a huge performance downside to check all the paths for things
> > like "git add -u".
>
> Not yet.
>
> I think this is a necessary "correctness" thing to do regardless of the
> performance impact, and adding the logic to stop at submodule boundary
> (aka gitlinks) should come before optimization.

Well, "performance" is a feature too, and it's not correct to say that "X
should be fixed before optimization". If "X" slows things down, the
question should be whether it really needs fixing..

Yes, we find symlinks when we do _new_ files, but is it really so bad to
assume that existing directories that we have already added to the index
are stable? It can easily be seen as a feature too that you can force git
to ignore the symlink and see it as a real directory.

So that's why it would be interesting to hear about the performance
impact. Because this is definitely not a black-and-white "one behavior is
wrong and one behavior is right".

                        Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Johannes Schindelin
Hi,

On Mon, 4 Aug 2008, Linus Torvalds wrote:

> On Mon, 4 Aug 2008, Junio C Hamano wrote:
> > >
> > > The rewritten 'has_symlink_leading_path()' should do ok, but it
> > > migth still be a huge performance downside to check all the paths
> > > for things like "git add -u".
> >
> > Not yet.
> >
> > I think this is a necessary "correctness" thing to do regardless of
> > the performance impact, and adding the logic to stop at submodule
> > boundary (aka gitlinks) should come before optimization.
>
> Well, "performance" is a feature too, and it's not correct to say that
> "X should be fixed before optimization". If "X" slows things down, the
> question should be whether it really needs fixing..
>
> Yes, we find symlinks when we do _new_ files, but is it really so bad to
> assume that existing directories that we have already added to the index
> are stable? It can easily be seen as a feature too that you can force git
> to ignore the symlink and see it as a real directory.

Actually, whatever you want, it needs fixing.

I vividly remember being quite pissed by Git replacing a symbolic link in
my working directory with a directory, and instead of updating the files
which were technically outside of the repository, Git populated that newly
created directory.

However, please note that Junio's patch affects git-add, AFAIR, not
git-update-index.

Ciao,
Dscho

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Linus Torvalds-3


On Tue, 5 Aug 2008, Johannes Schindelin wrote:
>
> I vividly remember being quite pissed by Git replacing a symbolic link in
> my working directory with a directory, and instead of updating the files
> which were technically outside of the repository, Git populated that newly
> created directory.

Well, that can cut both ways. For example, I vividly remember a time in
the distant past when harddisks were tiny, and I didn't have insanely
high-end hardware, and I was building the X server, but had to split
things up over two partitions because each individual partition was
too full.

IOW, sometimes you may _want_ to use symlinks that way, even within one
project - with a symlink allowing you to move parts of it around
"transparently".

Of course, these days under Linux we can just use bind mounts, so the use
of symlinks to stitch together two or more different trees is fairly
old-fashioned, but is still the only option on some systems or if you
don't have root.

(These days harddisks are also generally so big that it never happens. But
on my EeePC laptop, I still end up with two filesystems, 4GB  and 8GB
each. So it's not inconceivable to be in that kind of situation even
today).

                        Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Junio C Hamano
In reply to this post by Linus Torvalds-3
Linus Torvalds <[hidden email]> writes:

> ... Because this is definitely not a black-and-white "one behavior is
> wrong and one behavior is right".

I wish I could agree with you that this is a feature, but 16a4c61
(read-tree -m -u: avoid getting confused by intermediate symlinks.,
2007-05-10) and 64cab59 (apply: do not get confused by symlinks in the
middle, 2007-05-11) came from real world breakage cases and the root cause
was that we were too lenient to allow such a "feature" that pretends the
symlink not to be there.

Right now, we are being careful only while branch switching and patch
application, but the codepaths that add directly to the index (add and
update-index) are not fixed (or "still has the feature").

I do not see a clean way to keep such a "feature" without hurting users
who suffered the bugs these two commits from May 2007 fixed.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

David Lang
On Mon, 4 Aug 2008, Junio C Hamano wrote:

> Linus Torvalds <[hidden email]> writes:
>
>> ... Because this is definitely not a black-and-white "one behavior is
>> wrong and one behavior is right".
>
> I wish I could agree with you that this is a feature, but 16a4c61
> (read-tree -m -u: avoid getting confused by intermediate symlinks.,
> 2007-05-10) and 64cab59 (apply: do not get confused by symlinks in the
> middle, 2007-05-11) came from real world breakage cases and the root cause
> was that we were too lenient to allow such a "feature" that pretends the
> symlink not to be there.
>
> Right now, we are being careful only while branch switching and patch
> application, but the codepaths that add directly to the index (add and
> update-index) are not fixed (or "still has the feature").
>
> I do not see a clean way to keep such a "feature" without hurting users
> who suffered the bugs these two commits from May 2007 fixed.

config option?

I think a command line is too much work for too little value, but if the
check could be ignored based on a config option without costing too much
it may be reasonable.

David Lang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Junio C Hamano
In reply to this post by Johannes Schindelin
Johannes Schindelin <[hidden email]> writes:

> However, please note that Junio's patch affects git-add, AFAIR, not
> git-update-index.

Really?  I thought I added a test case to cover the plumbing as well...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Junio C Hamano
In reply to this post by Linus Torvalds-3
Linus Torvalds <[hidden email]> writes:

> IOW, sometimes you may _want_ to use symlinks that way, even within one
> project - with a symlink allowing you to move parts of it around
> "transparently".

While I admit that I have managed a large directory split across
partitions grafted via symlinks in pre-git days myself, ever since you
started "tracking" symbolic links with 8ae0a8c (git and symlinks as
tracked content, 2005-05-05), you have pretty much been committed to
"track" symbolic links.

This goes even before that commit. The readdir() loop done in show-files.c
with 8695c8b (Add "show-files" command to show the list of managed (or
non-managed) files., 2005-04-11) does not dereference symbolic links
pointing at a directory elsewhere, which we still have as read_directory()
in dir.c without much change in the basic structure.

I would give some leeway to other people who made comments in this thread,
who may not be so familiar with the low-level git codebase, but I have to
say that if you claim that dereferencing symbolic links in the middle is a
feature, you are not being completely honest, you haven't thought through
the issues, and/or you simply forgot the details.  I'd suspect most likely
it is the last one ;-).

The thing is, the "feature" is not very well supported, even without the
fixes from last night.  If you have a symlink "sym" that points at "dir"
that has "file" in it, and if neither "sym" nor "dir/file" are tracked,
you can "git add sym/file" to add it (I called it a bug).

However:

 (1) starting from the same condition, "git add ." does _not_ add it (you
     get the symbolic link "sym" added to your index instead, as well as
     "dir/file");

 (2) after you add "sym/file" through the bug, if you say "git add .", it
     will be removed from the index and you will instead have "sym" and
     "dir/file" (with an ancient git before 1.5.0, you will get "unable to
     add sym" error instead).

 (3) after you add "sym/file", "git diff" will immediately notice that you
     have removed it (this is a fairly recent fix; 1.5.4.X doesn't notice
     it).

You cannot have it as a reliably usable feature without a major surgery,
and this is fundamental. You simply cannot have it both ways without
telling git which symlink is "tracked" and which are only there for
storage sizing.

If you seriously want to claim that we support such a feature, you would
at least need to:

 (0) have a way for the user to say, "the project tree may have a
     directory D, but I do not want to check it out as a directory because
     my partition is too small.  Whenever you need to create a directory
     there and hang a tree underneath, instead create a symlink that
     points at /export/large/D instead".  Most likely this information
     would go to .git/config;

 (1) whereever we run "create leading directories", we notice and honor
     the above configuration (mostly entry.c::create_directories() called
     from entry.c::checkout_entry());

 (2) whenever we need to check out a file to path D, instead of
     recursively remove everything under it, we remove the symlink and
     deposit the file there (mostly entry.c::remove_subtree() and
     unpack-trees.c::verify_absent());

 (3) whenever we traverse working tree using readdir(), notice that the
     symbolic link we are looking at is the funny "pointing elsewhere but
     this is really a directory" specified in (0) and recurse into the
     directory pointed by it (dir.c::read_directory_recursive() but there
     may be others)

 (4) and we teach has_symlink_leading_path() to special case such a path
     you configured in (0).

I personally do not think adding these to support such a "feature" is such
a high priority, and I do not think it is honest to claim we support such
a feature without doing any of the above.  The current reality is that our
symlink support is still broken in corner cases, and being able to easily
add "sym/path" via "git add" and "git update-index --add" is one of them.




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Johannes Schindelin
In reply to this post by Junio C Hamano
Hi,

On Mon, 4 Aug 2008, Junio C Hamano wrote:

> Johannes Schindelin <[hidden email]> writes:
>
> > However, please note that Junio's patch affects git-add, AFAIR, not
> > git-update-index.
>
> Really?  I thought I added a test case to cover the plumbing as well...

Right, I missed that one.  I kinda hoped that only porcelain is affected.

Ciao,
Dscho

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Dmitry Potapov
In reply to this post by Junio C Hamano
On Mon, Aug 04, 2008 at 11:11:11PM -0700, Junio C Hamano wrote:
>
> The thing is, the "feature" is not very well supported, even without the
> fixes from last night.  If you have a symlink "sym" that points at "dir"
> that has "file" in it, and if neither "sym" nor "dir/file" are tracked,
> you can "git add sym/file" to add it (I called it a bug).

Well, I actually used this feature less than a year ago and did not have
any problem with that. Not that it is very important for me now, but I had
makefiles in a separate build directory, and I used a symlink to move
building to a tmpfs partion, which sped up building slightly.

Obviously, if a symlink points to a directory inside of the repository
and then you use "git add sym/file", it is definitely a mistake. OTOH,
let's consider the following situation:

git init
mkdir newdir
touch newdir/foo
git add newdir/foo
git commit -m 'add foo'
mv newdir /tmp/
ln -s /tmp/newdir
touch newdir/bar
git add newdir/bar
git commit -m 'add bar'
git ls-files

And you can see:
newdir/bar
newdir/foo

Git does exactly what I want here!

Anyway, I more concern with performance impact of your patch. If it
is noticeable, then it would be nice to have an option to turn this
check off.

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Linus Torvalds-3
In reply to this post by Junio C Hamano


On Mon, 4 Aug 2008, Junio C Hamano wrote:
>
> While I admit that I have managed a large directory split across
> partitions grafted via symlinks in pre-git days myself, ever since you
> started "tracking" symbolic links with 8ae0a8c (git and symlinks as
> tracked content, 2005-05-05), you have pretty much been committed to
> "track" symbolic links.

Yes, but my point is:

 - IF the cost is exorbitant (which was my question that triggered this:
   it sure as h*ll _would_ have been too high back in the days of the
   original symlink-matching code) ...

 - ... then there really are valid cases that say that we could just call
   it a feature.

That's really my whole argument. Saying that "ok, we will assume that
existing paths that git already knows about are stable" is not really an
odd feature. It's one we have lived with for years, and it's one that is
actually pretty dang trivial to work with. Yes, it can cause unexpected
behaviour, but if you hit it, that unexpected behavior is actually not
_that_ hard to work around.

This is all I'm arguing for. People claim that there are 'correctness'
issues. I say that 'performance issues' are important, and we can and we
_should_ take performance issues very seriously. So seriously that we'll
make performance a _feature_ if required.

                        Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Junio C Hamano
In reply to this post by Dmitry Potapov
Dmitry Potapov <[hidden email]> writes:

> Obviously, if a symlink points to a directory inside of the repository
> and then you use "git add sym/file", it is definitely a mistake. OTOH,
> let's consider the following situation:
>
> git init
> mkdir newdir
> touch newdir/foo
> git add newdir/foo
> git commit -m 'add foo'
> mv newdir /tmp/
> ln -s /tmp/newdir
> touch newdir/bar
> git add newdir/bar
> git commit -m 'add bar'
> git ls-files
>
> And you can see:
> newdir/bar
> newdir/foo
>
> Git does exactly what I want here!

Now, do this after the above sequence:

  git diff
  git add . && git ls-files

and tell me what you see.  Does it show exactly what you want here?

I've already told you what kind of changes you would need to make if you
really want to claim this is a feature.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Junio C Hamano
In reply to this post by Linus Torvalds-3
Linus Torvalds <[hidden email]> writes:

> On Mon, 4 Aug 2008, Junio C Hamano wrote:
>>
>> I started to revisit this issue and patched "git update-index --add"
>> and "git add" so far.  Patches follow.
>
> Patches look good to me, but did you check the performance impact?
>
> The rewritten 'has_symlink_leading_path()' should do ok, but it migth
> still be a huge performance downside to check all the paths for things
> like "git add -u".

I couldn't quite measure much meaningful performance impact.

My test repository was the kernel tree at v2.6.27-rc2-20-g685d87f, without
any build products nor editor temporary crufts.

By the way, if anybody wants to reproduce this, be careful with the tests
that run "rm -f .git/index" before adding everything.  After doing so, if
you check the result with "git diff --stat HEAD", you will notice many
missing files --- I almost got a heart attack before inspecting this file:

        $ cat arch/powerpc/.gitignore
        include

Yes, it excludes 261 already tracked files.  Is it intended?  I dunno.

The file is there since 06f2138 ([POWERPC] Add files build to .gitignore,
2006-11-26).  Not that having tracked files in an entirely ignored
directory is a bug, but the .gitignore entry seems to me an Oops waiting
to happen.  There are three commits that affect this directory that is
entirely ignored after that entry is added:

    b5b9309 (remove unnecessary <linux/hdreg.h> includes, 2008-08-05)
    9c4cb82 (powerpc: Remove use of CONFIG_PPC_MERGE, 2008-08-02)
    b8b572e (powerpc: Move include files to arch/powerpc/include/asm, 2008-08-01)

Anyhow, back on topic.  Here are the numbers.

Test #1: With fully up-to-date .git/index, "git add .", best of 5 runs

* v1.6.0-rc2
0.20user 0.20system 0:00.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3622minor)pagefaults 0swaps

* v1.6.0-rc1-73-g725b060 (with patch)
0.22user 0.19system 0:00.41elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3635minor)pagefaults 0swaps


Test #2: After "rm -f .git/index", "git add .", best of 5 runs

* v1.6.0-rc2
1.81user 0.55system 0:02.51elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+92855minor)pagefaults 0swaps

* v1.6.0-rc1-73-g725b060 (with patch)
1.76user 0.56system 0:02.58elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+92852minor)pagefaults 0swaps


Test #3: same as #2, after dropping cache with "echo 3 > /proc/sys/vm/drop_caches".
(Yes, I have slow disks).

* v1.6.0-rc2
3.43user 2.29system 2:02.45elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (29611major+63244minor)pagefaults 0swaps

* v1.6.0-rc1-73-g725b060 (with patch)
3.55user 2.33system 1:54.77elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (29616major+63236minor)pagefaults 0swaps

Test #4: same as #1, "strace -c -e lstat"

* v1.6.0-rc2
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000247           0     23993           lstat

* v1.6.0-rc1-73-g725b060 (with patch)
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000261           0     23993           lstat

Test #5: same as #2, "strace -c -e lstat"

* v1.6.0-rc2
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000639           0     23993           lstat

* v1.6.0-rc1-73-g725b060 (with patch)
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000427           0     23993           lstat

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Junio C Hamano
Junio C Hamano <[hidden email]> writes:

> Linus Torvalds <[hidden email]> writes:
>
>> On Mon, 4 Aug 2008, Junio C Hamano wrote:
>>>
>>> I started to revisit this issue and patched "git update-index --add"
>>> and "git add" so far.  Patches follow.
>>
>> Patches look good to me, but did you check the performance impact?
>>
>> The rewritten 'has_symlink_leading_path()' should do ok, but it migth
>> still be a huge performance downside to check all the paths for things
>> like "git add -u".
>
> I couldn't quite measure much meaningful performance impact.
>
> My test repository was the kernel tree at v2.6.27-rc2-20-g685d87f, without
> any build products nor editor temporary crufts.
>
> By the way, if anybody wants to reproduce this, be careful with the tests
> that run "rm -f .git/index" before adding everything.  After doing so, if
> you check the result with "git diff --stat HEAD", you will notice many
> missing files --- I almost got a heart attack before inspecting this file:
>
> $ cat arch/powerpc/.gitignore
>         include
>
> Yes, it excludes 261 already tracked files.  Is it intended?  I dunno.
> ...

Seeing that you applied my arch/powerpc/.gitignore patch to the kernel
(Yaay, I now have a short-log entry in the kernel history ;-), you have
seen the message with some benchmarks I am replying to as well?

I actually was expecting to see measurable slowdown by checking leading
symbolic link more often, but I couldn't, which means either there is not
much downside that the bugfix would hurt real-life usage, or I was too
incompetent to come up with an example to show that the bugfix actually
hurts the performance.

If the former, I think it is a very good news (If the latter, well, we
have a bigger problem :-<), as the affected codepaths are exactly the ones
that need to be fixed when somebody tries to add files in a submodule
directory by mistake, which is an issue (iirc) Dscho had a separate patch
to fix earlier.  Tracked symbolic links may be rare, but it seems that
many people are using submodules, and that case cannot be hand-waved
around by calling it is a feature.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Not going beyond symbolic links

Linus Torvalds-3


On Fri, 8 Aug 2008, Junio C Hamano wrote:
>
> Seeing that you applied my arch/powerpc/.gitignore patch to the kernel
> (Yaay, I now have a short-log entry in the kernel history ;-), you have
> seen the message with some benchmarks I am replying to as well?

Yes, I just felt that closed the discussion.

I only brought up the issue in the first place because I worried about the
performance impact. And I was unhappy about how that worry was dismissed
as being less important than some specious "correctness" issue (for the
last 3+ years, performance has mattered a _lot_, and the claimed big
"correctness" issue has not mattered one whit).

The thing is, sometimes "pi = 3.14" is (a) infinitely faster than the
"correct" answer and (b) the difference between the "correct" and the
"wrong" answer is meaningless. And this is why I get upset when somebody
dismisses performance issues based on "correctness".

The thing is, some specious value of "correctness" is often irrelevant
because it doesn't matter. While performance almost _always_ matters. And
I absolutely _detest_ the fact that people so often dismiss performance
concerns so readily.

But once the performance numbers are in and they don't show any issues, I
think that simply settles the original query, and I'm happy.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loading...