|
Junio C Hamano <[hidden email]> writes:
> A symlink to us is just a different kind of blob, and by definition a blob > is the leaf level of a tree structure that represents the working tree in > the index. There won't be anything hanging below it, and when adding > things to the index we should not dereference the symlink to see where it > leads to. > > Traditionally we have been loose about this check, and the normal "git > add" and "git update-index" codepath is still forever broken, and we > allow: > > $ mkdir dir > $ >dir/file > $ ln -s dir symlink > $ git add symlink/file > > but some codepaths that matter more (because they do larger damage > unattended, as opposed to the above command sequence that can be avoided > by user education a bit more easily), such as "git apply" and "git > read-tree", have been corrected using has_symlink_leading_path() since mid > 2007. We would need to follow through c40641b (Optimize symlink/directory > detection, 2008-05-09) and further fix "git add" and "git update-index" > codepaths to forbid the above command sequence. I started to revisit this issue and patched "git update-index --add" and "git add" so far. Patches follow. I think we should also check the following and fix them if any of them is broken. Under the same "dir/file exists but symlink points at dir" scenario: * "git rm symlink/file" --- should fail without removing dir/file; * "git ls-tree $commit -- symlink/file" should *not* fail if $commit does have "symlink/file" in it (iow, we cannot add the logic to get_pathspec()); * "git ls-files --exclude-standard -o -- symlink/file" should not talk about "symlink/file". If we reword the paragraph I quoted at the beginning of this message slightly: A gitlink to is is just a leaf level of a tree structure that represents the working tree in the index. There won't be anything hanging below it. We would need a similar check to stop at module boundary, just like the helper function we use for these patches, has_symlink_leading_path(), stops at a symlink. So without further ado... -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
When "sym" is a symbolic link that is inside the working tree, and it
points at a directory "dir" that has "path" in it, "update-index --add sym/path" used to mistakenly add "sym/path" as if "sym" were a normal directory. "git apply", "git diff" and "git merge" have been taught about this issue some time ago, but "update-index" and "add" have been left ignorant for too long. Signed-off-by: Junio C Hamano <[hidden email]> --- builtin-update-index.c | 5 ++++- t/t0055-beyond-symlinks.sh | 20 ++++++++++++++++++++ 2 files changed, 24 insertions(+), 1 deletions(-) create mode 100755 t/t0055-beyond-symlinks.sh diff --git a/builtin-update-index.c b/builtin-update-index.c index 38eb53c..434cb8e 100644 --- a/builtin-update-index.c +++ b/builtin-update-index.c @@ -194,6 +194,10 @@ static int process_path(const char *path) int len; struct stat st; + len = strlen(path); + if (has_symlink_leading_path(len, path)) + return error("'%s' is beyond a symbolic link", path); + /* * First things first: get the stat information, to decide * what to do about the pathname! @@ -201,7 +205,6 @@ static int process_path(const char *path) if (lstat(path, &st) < 0) return process_lstat_error(path, errno); - len = strlen(path); if (S_ISDIR(st.st_mode)) return process_directory(path, len, &st); diff --git a/t/t0055-beyond-symlinks.sh b/t/t0055-beyond-symlinks.sh new file mode 100755 index 0000000..eb11dd7 --- /dev/null +++ b/t/t0055-beyond-symlinks.sh @@ -0,0 +1,20 @@ +#!/bin/sh + +test_description='update-index refuses to add beyond symlinks' + +. ./test-lib.sh + +test_expect_success setup ' + >a && + mkdir b && + ln -s b c && + >c/d && + git update-index --add a b/d +' + +test_expect_success 'update-index --add beyond symlinks' ' + test_must_fail git update-index --add c/d && + ! ( git ls-files | grep c/d ) +' + +test_done -- 1.6.0.rc1.64.g61192 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Junio C Hamano
This is the same fix for the issue of adding "sym/path" when "sym" is a
symblic link that points at a directory "dir" with "path" in it. Signed-off-by: Junio C Hamano <[hidden email]> --- builtin-add.c | 12 +++++++++++- dir.c | 6 +++++- t/t0055-beyond-symlinks.sh | 7 ++++++- 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/builtin-add.c b/builtin-add.c index fc3f96e..81b64d7 100644 --- a/builtin-add.c +++ b/builtin-add.c @@ -153,6 +153,16 @@ static const char **validate_pathspec(int argc, const char **argv, const char *p { const char **pathspec = get_pathspec(prefix, argv); + if (pathspec) { + const char **p; + for (p = pathspec; *p; p++) { + if (has_symlink_leading_path(strlen(*p), *p)) { + int len = prefix ? strlen(prefix) : 0; + die("'%s' is beyond a symbolic link", *p + len); + } + } + } + return pathspec; } @@ -278,7 +288,7 @@ int cmd_add(int argc, const char **argv, const char *prefix) fprintf(stderr, "Maybe you wanted to say 'git add .'?\n"); return 0; } - pathspec = get_pathspec(prefix, argv); + pathspec = validate_pathspec(argc, argv, prefix); /* * If we are adding new files, we need to scan the working diff --git a/dir.c b/dir.c index 29d1d5b..ae7046f 100644 --- a/dir.c +++ b/dir.c @@ -727,8 +727,12 @@ static void free_simplify(struct path_simplify *simplify) int read_directory(struct dir_struct *dir, const char *path, const char *base, int baselen, const char **pathspec) { - struct path_simplify *simplify = create_simplify(pathspec); + struct path_simplify *simplify; + if (has_symlink_leading_path(strlen(path), path)) + return dir->nr; + + simplify = create_simplify(pathspec); read_directory_recursive(dir, path, base, baselen, 0, simplify); free_simplify(simplify); qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name); diff --git a/t/t0055-beyond-symlinks.sh b/t/t0055-beyond-symlinks.sh index eb11dd7..b29c37a 100755 --- a/t/t0055-beyond-symlinks.sh +++ b/t/t0055-beyond-symlinks.sh @@ -1,6 +1,6 @@ #!/bin/sh -test_description='update-index refuses to add beyond symlinks' +test_description='update-index and add refuse to add beyond symlinks' . ./test-lib.sh @@ -17,4 +17,9 @@ test_expect_success 'update-index --add beyond symlinks' ' ! ( git ls-files | grep c/d ) ' +test_expect_success 'add beyond symlinks' ' + test_must_fail git add c/d && + ! ( git ls-files | grep c/d ) +' + test_done -- 1.6.0.rc1.64.g61192 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Junio C Hamano
On Mon, 4 Aug 2008, Junio C Hamano wrote: > > I started to revisit this issue and patched "git update-index --add" > and "git add" so far. Patches follow. Patches look good to me, but did you check the performance impact? The rewritten 'has_symlink_leading_path()' should do ok, but it migth still be a huge performance downside to check all the paths for things like "git add -u". I didn't test, though. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
Linus Torvalds <[hidden email]> writes:
> On Mon, 4 Aug 2008, Junio C Hamano wrote: >> >> I started to revisit this issue and patched "git update-index --add" >> and "git add" so far. Patches follow. > > Patches look good to me, but did you check the performance impact? > > The rewritten 'has_symlink_leading_path()' should do ok, but it migth > still be a huge performance downside to check all the paths for things > like "git add -u". Not yet. I think this is a necessary "correctness" thing to do regardless of the performance impact, and adding the logic to stop at submodule boundary (aka gitlinks) should come before optimization. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
On Mon, 4 Aug 2008, Junio C Hamano wrote: > > > > The rewritten 'has_symlink_leading_path()' should do ok, but it migth > > still be a huge performance downside to check all the paths for things > > like "git add -u". > > Not yet. > > I think this is a necessary "correctness" thing to do regardless of the > performance impact, and adding the logic to stop at submodule boundary > (aka gitlinks) should come before optimization. Well, "performance" is a feature too, and it's not correct to say that "X should be fixed before optimization". If "X" slows things down, the question should be whether it really needs fixing.. Yes, we find symlinks when we do _new_ files, but is it really so bad to assume that existing directories that we have already added to the index are stable? It can easily be seen as a feature too that you can force git to ignore the symlink and see it as a real directory. So that's why it would be interesting to hear about the performance impact. Because this is definitely not a black-and-white "one behavior is wrong and one behavior is right". Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
Hi,
On Mon, 4 Aug 2008, Linus Torvalds wrote: > On Mon, 4 Aug 2008, Junio C Hamano wrote: > > > > > > The rewritten 'has_symlink_leading_path()' should do ok, but it > > > migth still be a huge performance downside to check all the paths > > > for things like "git add -u". > > > > Not yet. > > > > I think this is a necessary "correctness" thing to do regardless of > > the performance impact, and adding the logic to stop at submodule > > boundary (aka gitlinks) should come before optimization. > > Well, "performance" is a feature too, and it's not correct to say that > "X should be fixed before optimization". If "X" slows things down, the > question should be whether it really needs fixing.. > > Yes, we find symlinks when we do _new_ files, but is it really so bad to > assume that existing directories that we have already added to the index > are stable? It can easily be seen as a feature too that you can force git > to ignore the symlink and see it as a real directory. Actually, whatever you want, it needs fixing. I vividly remember being quite pissed by Git replacing a symbolic link in my working directory with a directory, and instead of updating the files which were technically outside of the repository, Git populated that newly created directory. However, please note that Junio's patch affects git-add, AFAIR, not git-update-index. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
On Tue, 5 Aug 2008, Johannes Schindelin wrote: > > I vividly remember being quite pissed by Git replacing a symbolic link in > my working directory with a directory, and instead of updating the files > which were technically outside of the repository, Git populated that newly > created directory. Well, that can cut both ways. For example, I vividly remember a time in the distant past when harddisks were tiny, and I didn't have insanely high-end hardware, and I was building the X server, but had to split things up over two partitions because each individual partition was too full. IOW, sometimes you may _want_ to use symlinks that way, even within one project - with a symlink allowing you to move parts of it around "transparently". Of course, these days under Linux we can just use bind mounts, so the use of symlinks to stitch together two or more different trees is fairly old-fashioned, but is still the only option on some systems or if you don't have root. (These days harddisks are also generally so big that it never happens. But on my EeePC laptop, I still end up with two filesystems, 4GB and 8GB each. So it's not inconceivable to be in that kind of situation even today). Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Linus Torvalds-3
Linus Torvalds <[hidden email]> writes:
> ... Because this is definitely not a black-and-white "one behavior is > wrong and one behavior is right". I wish I could agree with you that this is a feature, but 16a4c61 (read-tree -m -u: avoid getting confused by intermediate symlinks., 2007-05-10) and 64cab59 (apply: do not get confused by symlinks in the middle, 2007-05-11) came from real world breakage cases and the root cause was that we were too lenient to allow such a "feature" that pretends the symlink not to be there. Right now, we are being careful only while branch switching and patch application, but the codepaths that add directly to the index (add and update-index) are not fixed (or "still has the feature"). I do not see a clean way to keep such a "feature" without hurting users who suffered the bugs these two commits from May 2007 fixed. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
On Mon, 4 Aug 2008, Junio C Hamano wrote:
> Linus Torvalds <[hidden email]> writes: > >> ... Because this is definitely not a black-and-white "one behavior is >> wrong and one behavior is right". > > I wish I could agree with you that this is a feature, but 16a4c61 > (read-tree -m -u: avoid getting confused by intermediate symlinks., > 2007-05-10) and 64cab59 (apply: do not get confused by symlinks in the > middle, 2007-05-11) came from real world breakage cases and the root cause > was that we were too lenient to allow such a "feature" that pretends the > symlink not to be there. > > Right now, we are being careful only while branch switching and patch > application, but the codepaths that add directly to the index (add and > update-index) are not fixed (or "still has the feature"). > > I do not see a clean way to keep such a "feature" without hurting users > who suffered the bugs these two commits from May 2007 fixed. config option? I think a command line is too much work for too little value, but if the check could be ignored based on a config option without costing too much it may be reasonable. David Lang -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Johannes Schindelin
Johannes Schindelin <[hidden email]> writes:
> However, please note that Junio's patch affects git-add, AFAIR, not > git-update-index. Really? I thought I added a test case to cover the plumbing as well... -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Linus Torvalds-3
Linus Torvalds <[hidden email]> writes:
> IOW, sometimes you may _want_ to use symlinks that way, even within one > project - with a symlink allowing you to move parts of it around > "transparently". While I admit that I have managed a large directory split across partitions grafted via symlinks in pre-git days myself, ever since you started "tracking" symbolic links with 8ae0a8c (git and symlinks as tracked content, 2005-05-05), you have pretty much been committed to "track" symbolic links. This goes even before that commit. The readdir() loop done in show-files.c with 8695c8b (Add "show-files" command to show the list of managed (or non-managed) files., 2005-04-11) does not dereference symbolic links pointing at a directory elsewhere, which we still have as read_directory() in dir.c without much change in the basic structure. I would give some leeway to other people who made comments in this thread, who may not be so familiar with the low-level git codebase, but I have to say that if you claim that dereferencing symbolic links in the middle is a feature, you are not being completely honest, you haven't thought through the issues, and/or you simply forgot the details. I'd suspect most likely it is the last one ;-). The thing is, the "feature" is not very well supported, even without the fixes from last night. If you have a symlink "sym" that points at "dir" that has "file" in it, and if neither "sym" nor "dir/file" are tracked, you can "git add sym/file" to add it (I called it a bug). However: (1) starting from the same condition, "git add ." does _not_ add it (you get the symbolic link "sym" added to your index instead, as well as "dir/file"); (2) after you add "sym/file" through the bug, if you say "git add .", it will be removed from the index and you will instead have "sym" and "dir/file" (with an ancient git before 1.5.0, you will get "unable to add sym" error instead). (3) after you add "sym/file", "git diff" will immediately notice that you have removed it (this is a fairly recent fix; 1.5.4.X doesn't notice it). You cannot have it as a reliably usable feature without a major surgery, and this is fundamental. You simply cannot have it both ways without telling git which symlink is "tracked" and which are only there for storage sizing. If you seriously want to claim that we support such a feature, you would at least need to: (0) have a way for the user to say, "the project tree may have a directory D, but I do not want to check it out as a directory because my partition is too small. Whenever you need to create a directory there and hang a tree underneath, instead create a symlink that points at /export/large/D instead". Most likely this information would go to .git/config; (1) whereever we run "create leading directories", we notice and honor the above configuration (mostly entry.c::create_directories() called from entry.c::checkout_entry()); (2) whenever we need to check out a file to path D, instead of recursively remove everything under it, we remove the symlink and deposit the file there (mostly entry.c::remove_subtree() and unpack-trees.c::verify_absent()); (3) whenever we traverse working tree using readdir(), notice that the symbolic link we are looking at is the funny "pointing elsewhere but this is really a directory" specified in (0) and recurse into the directory pointed by it (dir.c::read_directory_recursive() but there may be others) (4) and we teach has_symlink_leading_path() to special case such a path you configured in (0). I personally do not think adding these to support such a "feature" is such a high priority, and I do not think it is honest to claim we support such a feature without doing any of the above. The current reality is that our symlink support is still broken in corner cases, and being able to easily add "sym/path" via "git add" and "git update-index --add" is one of them. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Junio C Hamano
Hi,
On Mon, 4 Aug 2008, Junio C Hamano wrote: > Johannes Schindelin <[hidden email]> writes: > > > However, please note that Junio's patch affects git-add, AFAIR, not > > git-update-index. > > Really? I thought I added a test case to cover the plumbing as well... Right, I missed that one. I kinda hoped that only porcelain is affected. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Junio C Hamano
On Mon, Aug 04, 2008 at 11:11:11PM -0700, Junio C Hamano wrote:
> > The thing is, the "feature" is not very well supported, even without the > fixes from last night. If you have a symlink "sym" that points at "dir" > that has "file" in it, and if neither "sym" nor "dir/file" are tracked, > you can "git add sym/file" to add it (I called it a bug). Well, I actually used this feature less than a year ago and did not have any problem with that. Not that it is very important for me now, but I had makefiles in a separate build directory, and I used a symlink to move building to a tmpfs partion, which sped up building slightly. Obviously, if a symlink points to a directory inside of the repository and then you use "git add sym/file", it is definitely a mistake. OTOH, let's consider the following situation: git init mkdir newdir touch newdir/foo git add newdir/foo git commit -m 'add foo' mv newdir /tmp/ ln -s /tmp/newdir touch newdir/bar git add newdir/bar git commit -m 'add bar' git ls-files And you can see: newdir/bar newdir/foo Git does exactly what I want here! Anyway, I more concern with performance impact of your patch. If it is noticeable, then it would be nice to have an option to turn this check off. Dmitry -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Junio C Hamano
On Mon, 4 Aug 2008, Junio C Hamano wrote: > > While I admit that I have managed a large directory split across > partitions grafted via symlinks in pre-git days myself, ever since you > started "tracking" symbolic links with 8ae0a8c (git and symlinks as > tracked content, 2005-05-05), you have pretty much been committed to > "track" symbolic links. Yes, but my point is: - IF the cost is exorbitant (which was my question that triggered this: it sure as h*ll _would_ have been too high back in the days of the original symlink-matching code) ... - ... then there really are valid cases that say that we could just call it a feature. That's really my whole argument. Saying that "ok, we will assume that existing paths that git already knows about are stable" is not really an odd feature. It's one we have lived with for years, and it's one that is actually pretty dang trivial to work with. Yes, it can cause unexpected behaviour, but if you hit it, that unexpected behavior is actually not _that_ hard to work around. This is all I'm arguing for. People claim that there are 'correctness' issues. I say that 'performance issues' are important, and we can and we _should_ take performance issues very seriously. So seriously that we'll make performance a _feature_ if required. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Dmitry Potapov
Dmitry Potapov <[hidden email]> writes:
> Obviously, if a symlink points to a directory inside of the repository > and then you use "git add sym/file", it is definitely a mistake. OTOH, > let's consider the following situation: > > git init > mkdir newdir > touch newdir/foo > git add newdir/foo > git commit -m 'add foo' > mv newdir /tmp/ > ln -s /tmp/newdir > touch newdir/bar > git add newdir/bar > git commit -m 'add bar' > git ls-files > > And you can see: > newdir/bar > newdir/foo > > Git does exactly what I want here! Now, do this after the above sequence: git diff git add . && git ls-files and tell me what you see. Does it show exactly what you want here? I've already told you what kind of changes you would need to make if you really want to claim this is a feature. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
In reply to this post by Linus Torvalds-3
Linus Torvalds <[hidden email]> writes:
> On Mon, 4 Aug 2008, Junio C Hamano wrote: >> >> I started to revisit this issue and patched "git update-index --add" >> and "git add" so far. Patches follow. > > Patches look good to me, but did you check the performance impact? > > The rewritten 'has_symlink_leading_path()' should do ok, but it migth > still be a huge performance downside to check all the paths for things > like "git add -u". I couldn't quite measure much meaningful performance impact. My test repository was the kernel tree at v2.6.27-rc2-20-g685d87f, without any build products nor editor temporary crufts. By the way, if anybody wants to reproduce this, be careful with the tests that run "rm -f .git/index" before adding everything. After doing so, if you check the result with "git diff --stat HEAD", you will notice many missing files --- I almost got a heart attack before inspecting this file: $ cat arch/powerpc/.gitignore include Yes, it excludes 261 already tracked files. Is it intended? I dunno. The file is there since 06f2138 ([POWERPC] Add files build to .gitignore, 2006-11-26). Not that having tracked files in an entirely ignored directory is a bug, but the .gitignore entry seems to me an Oops waiting to happen. There are three commits that affect this directory that is entirely ignored after that entry is added: b5b9309 (remove unnecessary <linux/hdreg.h> includes, 2008-08-05) 9c4cb82 (powerpc: Remove use of CONFIG_PPC_MERGE, 2008-08-02) b8b572e (powerpc: Move include files to arch/powerpc/include/asm, 2008-08-01) Anyhow, back on topic. Here are the numbers. Test #1: With fully up-to-date .git/index, "git add .", best of 5 runs * v1.6.0-rc2 0.20user 0.20system 0:00.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+3622minor)pagefaults 0swaps * v1.6.0-rc1-73-g725b060 (with patch) 0.22user 0.19system 0:00.41elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+3635minor)pagefaults 0swaps Test #2: After "rm -f .git/index", "git add .", best of 5 runs * v1.6.0-rc2 1.81user 0.55system 0:02.51elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+92855minor)pagefaults 0swaps * v1.6.0-rc1-73-g725b060 (with patch) 1.76user 0.56system 0:02.58elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+92852minor)pagefaults 0swaps Test #3: same as #2, after dropping cache with "echo 3 > /proc/sys/vm/drop_caches". (Yes, I have slow disks). * v1.6.0-rc2 3.43user 2.29system 2:02.45elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (29611major+63244minor)pagefaults 0swaps * v1.6.0-rc1-73-g725b060 (with patch) 3.55user 2.33system 1:54.77elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (29616major+63236minor)pagefaults 0swaps Test #4: same as #1, "strace -c -e lstat" * v1.6.0-rc2 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000247 0 23993 lstat * v1.6.0-rc1-73-g725b060 (with patch) % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000261 0 23993 lstat Test #5: same as #2, "strace -c -e lstat" * v1.6.0-rc2 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000639 0 23993 lstat * v1.6.0-rc1-73-g725b060 (with patch) % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000427 0 23993 lstat -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
Junio C Hamano <[hidden email]> writes:
> Linus Torvalds <[hidden email]> writes: > >> On Mon, 4 Aug 2008, Junio C Hamano wrote: >>> >>> I started to revisit this issue and patched "git update-index --add" >>> and "git add" so far. Patches follow. >> >> Patches look good to me, but did you check the performance impact? >> >> The rewritten 'has_symlink_leading_path()' should do ok, but it migth >> still be a huge performance downside to check all the paths for things >> like "git add -u". > > I couldn't quite measure much meaningful performance impact. > > My test repository was the kernel tree at v2.6.27-rc2-20-g685d87f, without > any build products nor editor temporary crufts. > > By the way, if anybody wants to reproduce this, be careful with the tests > that run "rm -f .git/index" before adding everything. After doing so, if > you check the result with "git diff --stat HEAD", you will notice many > missing files --- I almost got a heart attack before inspecting this file: > > $ cat arch/powerpc/.gitignore > include > > Yes, it excludes 261 already tracked files. Is it intended? I dunno. > ... Seeing that you applied my arch/powerpc/.gitignore patch to the kernel (Yaay, I now have a short-log entry in the kernel history ;-), you have seen the message with some benchmarks I am replying to as well? I actually was expecting to see measurable slowdown by checking leading symbolic link more often, but I couldn't, which means either there is not much downside that the bugfix would hurt real-life usage, or I was too incompetent to come up with an example to show that the bugfix actually hurts the performance. If the former, I think it is a very good news (If the latter, well, we have a bigger problem :-<), as the affected codepaths are exactly the ones that need to be fixed when somebody tries to add files in a submodule directory by mistake, which is an issue (iirc) Dscho had a separate patch to fix earlier. Tracked symbolic links may be rare, but it seems that many people are using submodules, and that case cannot be hand-waved around by calling it is a feature. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
On Fri, 8 Aug 2008, Junio C Hamano wrote: > > Seeing that you applied my arch/powerpc/.gitignore patch to the kernel > (Yaay, I now have a short-log entry in the kernel history ;-), you have > seen the message with some benchmarks I am replying to as well? Yes, I just felt that closed the discussion. I only brought up the issue in the first place because I worried about the performance impact. And I was unhappy about how that worry was dismissed as being less important than some specious "correctness" issue (for the last 3+ years, performance has mattered a _lot_, and the claimed big "correctness" issue has not mattered one whit). The thing is, sometimes "pi = 3.14" is (a) infinitely faster than the "correct" answer and (b) the difference between the "correct" and the "wrong" answer is meaningless. And this is why I get upset when somebody dismisses performance issues based on "correctness". The thing is, some specious value of "correctness" is often irrelevant because it doesn't matter. While performance almost _always_ matters. And I absolutely _detest_ the fact that people so often dismiss performance concerns so readily. But once the performance numbers are in and they don't show any issues, I think that simply settles the original query, and I'm happy. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
| Powered by Nabble | Edit this page |
