|
Hi,
This is prompted by a recent lkml discussion[1] which is also backed up by old stuff threads like[2] where Theodore Ts'o writes about spd_readdir. As explained at the links, ext3&4 seem to show suboptimal performance if you do the common pattern of lstat()'ing everything returned by readdir() and the corresponding inodes must be fetched from disk. They suggest (and spd_readdir does just this, in an LD_PRELOADable format) sorting the entries by inode. Let me emphasise: inodes fetched from disk. This does not matter at all for the cached case. Probably this also applies to us in the search for untracked files, but right now I don't care about that too much because I have git-status configured not to show them. However, a similar trick could be applied to lstat() across the index, which *is* a big part of the work required to accurately display repository status in git-status and git-diff. The dirty hack below uses the inode field in the index to sort the index entries prior to the real work of refresh_index. I have been able to measure a significant (in the statistical sense) speedup using measurements like ( for i in $(seq 1 30); do sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches' /usr/bin/time -f "time %U %S %E" ~/dev/git/git-status 2>&1 | grep time done ) | tee results-patched In numbers, across 30 trials for unpatched and patched git, my time needed to get a cache-cold 'git status' went from 4.24s to 3.97s on average, at p=0.006. Note that this only has an effect if your directory has the inodes all jumbled. I tried doing bigger trials, but I do not have a realistic work repository bigger than git.git. You can look at the lstat() order you are currently getting with strace -e lstat -v git status 2>&1 | egrep -o 'st_ino=[0-9]+' With the patch it should be in good sorted order. Note, however, that at the very end it also runs lstat() on .git/refs/*; those will still be out of order. So I'd be interested to hear success (or non-success) stories from people who are actively working with larger repos, perhaps linux-2.6 or your favourite corporate repo. The lkml posts also seem to say that it only matters on ext3&4, not on btrfs, but perhaps other FSes also suffer in this area. I'd also be interested to hear from the refresh_index experts whether my change works in principle, or is already flawed in some major way. You will note I sort by (inode,stage) to handle the search forward for the highest stage entry, but maybe this is not the only twist. (As written it is also not thread-safe.) I did run the test suite and it passes, so it can't be *that* bad. Footnotes: [1] https://lkml.org/lkml/2012/2/29/210 [2] http://lkml.indiana.edu/hypermail/linux/kernel/0802.2/1076.html ---- 8< ---- diff --git i/read-cache.c w/read-cache.c index 274e54b..b0d1942 100644 --- i/read-cache.c +++ w/read-cache.c @@ -1092,10 +1092,28 @@ static void show_file(const char * fmt, const char * name, int in_porcelain, printf(fmt, name); } +static struct index_state *istate_for_cmp; + +int cmp_by_inode(const void *a, const void *b) +{ + struct cache_entry *ca, *cb; + + ca = istate_for_cmp->cache[*(const int *)a]; + cb = istate_for_cmp->cache[*(const int *)b]; + + if (ca->ce_ino < cb->ce_ino) + return -1; + if (ca->ce_ino > cb->ce_ino) + return 1; + if (ce_stage(ca) < ce_stage(cb)) + return -1; + return 1; +} + int refresh_index(struct index_state *istate, unsigned int flags, const char **pathspec, char *seen, const char *header_msg) { - int i; + int i, j; int has_errors = 0; int really = (flags & REFRESH_REALLY) != 0; int allow_unmerged = (flags & REFRESH_UNMERGED) != 0; @@ -1110,18 +1128,28 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p const char *typechange_fmt; const char *added_fmt; const char *unmerged_fmt; + int *by_inode_idx; modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n"); deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n"); typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n"); added_fmt = (in_porcelain ? "A\t%s\n" : "%s needs update\n"); unmerged_fmt = (in_porcelain ? "U\t%s\n" : "%s: needs merge\n"); - for (i = 0; i < istate->cache_nr; i++) { + + by_inode_idx = xmalloc(istate->cache_nr * sizeof(int)); + for (i = 0; i < istate->cache_nr; i++) + by_inode_idx[i] = i; + istate_for_cmp = istate; + qsort(by_inode_idx, istate->cache_nr, sizeof(int), cmp_by_inode); + + for (j = 0; j < istate->cache_nr; j++) { struct cache_entry *ce, *new; int cache_errno = 0; int changed = 0; int filtered = 0; + i = by_inode_idx[j]; + ce = istate->cache[i]; if (ignore_submodules && S_ISGITLINK(ce->ce_mode)) continue; -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
Thomas Rast <[hidden email]> writes:
> The dirty hack below uses the inode field in the index to sort the index > entries prior to the real work of refresh_index. I think we do something similar in fsck to touch loose object files in clusters, and would not be surprised if you see performance boost by running lstat(2) in clusters for the purpose of refresh_index(). Right now, preload_index() partitions the index entries by offset and hands them out to worker threads, but it _might_ make sense to use fewer number of threads and give them entries sorted by the inum. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [hidden email] More majordomo info at http://vger.kernel.org/majordomo-info.html |
| Powered by Nabble | Edit this page |
