Consistent terminology: cached/staged/index

Re: Consistent terminology: cached/staged/index

Felipe Contreras
On Mon, Feb 14, 2011 at 6:04 PM, Michael J Gruber
<[hidden email]> wrote:

> Felipe Contreras venit, vidit, dixit 14.02.2011 17:00:
>> Except 'git branch', 'git tag', 'git remote', 'git stash', and 'git
>> submodule'. In fact, every logical object in git seems to have their
>> own command, except the stage.
>
> Yes, remote, stash and submodule are the ones with the different
> subcommand handling I mentioned: the subcommand is the verb, and
> specified undashed.
>
> We have other commands with double-dashed (i.e. option) subcommands,
> such as "branch --set-upstream", and others single-dashed, such as "tag -v".
>
> Note that branch, tag and stash are verbs as well as nouns.

So is stage.

--
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Consistent terminology: cached/staged/index

Junio C Hamano
In reply to this post by Miles Bader
Miles Bader <[hidden email]> writes:

> Michael J Gruber <[hidden email]> writes:
>> Short options should really not be "wasted" easily. "-s" named after "to
>> stage" is really problematic, as outlined in this thread.
>
> Er, but the point is that this is _such_ a common operation, that a
> short option for it would not be "wasted" at all.

True, but I am afraid "-c" is not it, as it would certainly be confusing
to users who know what "diff" does before they learn "git diff".

And I'd also like to keep "-i" open for "ignore case", which I actually
wished for the other day while reviewing a topic.  Unlike "-c", I might
implement it myself in the not-too-distant future when I find time.

Using "-I" (as an abbreviation for "index-only") is tempting, though.

Both "-i" and "-I" are GNU extensions, and the latter was traditionally
useful primarily for ignoring cruft left in files by the use of "$Id$", but
we actively discourage its use in git-controlled projects, so taking it
over might not be such a big issue.
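For readers who have not used it: GNU diff's "-I RE" ("--ignore-matching-lines=RE") ignores hunks whose changed lines all match RE. A minimal shell sketch, assuming GNU diff (or another diff that supports -I; BusyBox diff, for one, may not) is on PATH; the file names and "$Id$" strings are made up for the demo:

```shell
#!/bin/sh
# Sketch: diff -I skips hunks whose changed lines all match a regex.
# Assumes GNU diff (or another diff supporting -I); names are made up.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Two versions that differ only in the expanded RCS keyword line.
printf '%s\n' '/* $Id: foo.c,v 1.1 alice $ */' 'int x = 1;' > "$dir/old.c"
printf '%s\n' '/* $Id: foo.c,v 1.2 bob $ */' 'int x = 1;' > "$dir/new.c"

# Plain diff sees the keyword churn (non-zero exit status).
if diff "$dir/old.c" "$dir/new.c" > /dev/null; then plain=same; else plain=differ; fi

# "[$]Id" matches a literal "$Id"; both changed lines match, so the hunk is ignored.
if diff -I '[$]Id' "$dir/old.c" "$dir/new.c" > /dev/null; then ignored=same; else ignored=differ; fi

echo "diff:    $plain"
echo "diff -I: $ignored"
```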

Re: Consistent terminology: cached/staged/index

Miles Bader
On Tue, Feb 15, 2011 at 2:12 AM, Junio C Hamano <[hidden email]> wrote:

> Miles Bader <[hidden email]> writes:
>> Michael J Gruber <[hidden email]> writes:
>>> Short options should really not be "wasted" easily. "-s" named after "to
>>> stage" is really problematic, as outlined in this thread.
>>
>> Er, but the point is that this is _such_ a common operation, that a
>> short option for it would not be "wasted" at all.
>
> True, but I am afraid "-c" is not it, as it would certainly be confusing
> to users who know what "diff" does before they learn "git diff".

Er...?

Here we were talking about using "-s" (inspired by "--staged"), which
I suggested because you earlier objected to "-c"...

-miles

--
Cat is power.  Cat is peace.

Re: Consistent terminology: cached/staged/index

Piotr Krukowiecki
In reply to this post by Junio C Hamano
On Sun, Feb 13, 2011 at 11:58 PM, Junio C Hamano <[hidden email]> wrote:
> Jonathan Nieder <[hidden email]> writes:
[...]

Thanks for the explanation.

My point is:
1. using multiple terms is confusing
2. using non-descriptive terms is confusing (or at least steepens the
   learning curve)

Ideally only one term should be used; the rest should be deprecated or
hidden from the end user.

Example from git-status:

   - 'git status' outputs <<use "git reset HEAD <file>..." to unstage>>
     But in the man page there is nothing about staging!

   - the output does not mention "index" at all - only files tracked,
     untracked, to be committed

   - man page talks about index or index file exclusively, e.g.:
     "differences between the index file and the current HEAD commit",
     "updated in index", "added to index"


In other places "index" is called "staging area" and the act of updating the
index is called "staging in the index".

I ask: why do we need the "index" term at all?

   - instead of "index" use "staging" and "staging area"
   - instead of "listed in index" use "staged" or "tracked"

What is used internally is one thing, but what the end user (not the git
developer) sees does not have to be related.

(I'm not sure about "tracked" vs "staged"; maybe we should again get rid of
one of them, at least in some cases.)

In fact it's not that important what it is called, as long as it satisfies
the two points at the beginning of this mail.


As you can see I'm advocating for the use of the "staging" term after all.
I'm new to git and a non-native English speaker. "Staging" seems the clearest
of all the terms. You may see it differently, but please keep in mind that
you are already accustomed to the current terms.

"Staging" gives me the feeling of changing states - from working tree to
real commit - which I believe is the purpose of it.


"Caching" means something used e.g. to improve performance. You can read the
cache or update it from the original item, but the cache is just a function
of the original content.

Probably the most common place where users meet "cache" is the browser cache.
You clear the cache, you set a limit on its size, but you don't expect it to
be important. That is definitely unlike the "cache" in git.


I didn't like "index" at all. At first I could not understand why you had
chosen such a name. Additionally, in many places it's called the "index file",
which increased the confusion: why would I care whether it's a file or not?

Now I see it can be understood as indexing the files that should be managed
by git, or indexing the changes to be introduced. But I still like "staging"
better.


I've updated the docs for several basic commands to see how it would feel to
have a "staging area" instead of an "index file", and it's not bad IMO. It was
basically an automatic search & replace, so the result can be improved.



-- 8< --
From: Piotr Krukowiecki <[hidden email]>
Date: Mon, 14 Feb 2011 23:20:07 +0100
Subject: [PATCH] Changed index term to staging area

---
 Documentation/git-add.txt    |   66 +++++++++++++++++++++---------------------
 Documentation/git-apply.txt  |   40 ++++++++++++------------
 Documentation/git-commit.txt |   14 ++++----
 Documentation/git-diff.txt   |   22 +++++++-------
 Documentation/git-status.txt |   22 +++++++-------
 5 files changed, 82 insertions(+), 82 deletions(-)

diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index a03448f..54a50b7 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -3,7 +3,7 @@ git-add(1)

 NAME
 ----
-git-add - Add file contents to the index
+git-add - Add file contents to the staging area

 SYNOPSIS
 --------
@@ -15,23 +15,23 @@ SYNOPSIS

 DESCRIPTION
 -----------
-This command updates the index using the current content found in
+This command updates the staging area using the current content found in
 the working tree, to prepare the content staged for the next commit.
 It typically adds the current content of existing paths as a whole,
 but with some options it can also be used to add content with
 only part of the changes made to the working tree files applied, or
 remove paths that do not exist in the working tree anymore.

-The "index" holds a snapshot of the content of the working tree, and it
+The "staging area" holds a snapshot of the content of the working tree, and it
 is this snapshot that is taken as the contents of the next commit.  Thus
 after making any changes to the working directory, and before running
 the commit command, you must use the `add` command to add any new or
-modified files to the index.
+modified files to the staging area.

 This command can be performed multiple times before a commit.  It only
 adds the content of the specified file(s) at the time the add command is
 run; if you want subsequent changes included in the next commit, then
-you must run `git add` again to add the new content to the index.
+you must run `git add` again to add the new content to the staging area.

 The `git status` command can be used to obtain a summary of which
 files have changes that are staged for the next commit.
@@ -72,39 +72,39 @@ OPTIONS
 -i::
 --interactive::
  Add modified contents in the working tree interactively to
- the index. Optional path arguments may be supplied to limit
+ the staging area. Optional path arguments may be supplied to limit
  operation to a subset of the working tree. See ``Interactive
  mode'' for details.

 -p::
 --patch::
- Interactively choose hunks of patch between the index and the
- work tree and add them to the index. This gives the user a chance
+ Interactively choose hunks of patch between the staging area and the
+ work tree and add them to the staging area. This gives the user a chance
  to review the difference before adding modified contents to the
- index.
+ staging area.
 +
 This effectively runs `add --interactive`, but bypasses the
 initial command menu and directly jumps to the `patch` subcommand.
 See ``Interactive mode'' for details.

 -e, \--edit::
- Open the diff vs. the index in an editor and let the user
+ Open the diff vs. the staging area in an editor and let the user
  edit it.  After the editor was closed, adjust the hunk headers
- and apply the patch to the index.
+ and apply the patch to the staging area.
 +
 The intent of this option is to pick and choose lines of the patch to
 apply, or even to modify the contents of lines to be staged. This can be
 quicker and more flexible than using the interactive hunk selector.
 However, it is easy to confuse oneself and create a patch that does not
-apply to the index. See EDITING PATCHES below.
+apply to the staging area. See EDITING PATCHES below.

 -u::
 --update::
  Only match <filepattern> against already tracked files in
- the index rather than the working tree. That means that it
+ the staging area rather than the working tree. That means that it
  will never stage new files, but that it will stage modified
  new contents of tracked files and that it will remove files
- from the index if the corresponding files in the working tree
+ from the staging area if the corresponding files in the working tree
  have been removed.
 +
 If no <filepattern> is given, default to "."; in other words,
@@ -114,21 +114,21 @@ subdirectories.
 -A::
 --all::
  Like `-u`, but match <filepattern> against files in the
- working tree in addition to the index. That means that it
+ working tree in addition to the staging area. That means that it
  will find new files as well as staging modified content and
  removing files that are no longer in the working tree.

 -N::
 --intent-to-add::
  Record only the fact that the path will be added later. An entry
- for the path is placed in the index with no content. This is
+ for the path is placed in the staging area with no content. This is
  useful for, among other things, showing the unstaged content of
  such files with `git diff` and committing them with `git commit
  -a`.

 --refresh::
  Don't add the file(s), but only refresh their stat()
- information in the index.
+ information in the staging area.

 --ignore-errors::
  If some files could not be added because of errors indexing
@@ -205,8 +205,8 @@ The main command loop has 6 subcommands (plus help and quit).

 status::

-   This shows the change between HEAD and index (i.e. what will be
-   committed if you say `git commit`), and between index and
+   This shows the change between HEAD and staging area (i.e. what will be
+   committed if you say `git commit`), and between staging area and
    working tree files (i.e. what you could stage further before
    `git commit` using `git add`) for each path.  A sample output
    looks like this:
@@ -219,11 +219,11 @@ status::
 +
 It shows that foo.png has differences from HEAD (but that is
 binary so line count cannot be shown) and there is no
-difference between indexed copy and the working tree
+difference between staged copy and the working tree
 version (if the working tree version were also different,
 'binary' would have been shown in place of 'nothing').  The
 other file, git-add{litdd}interactive.perl, has 403 lines added
-and 35 lines deleted if you commit what is in the index, but
+and 35 lines deleted if you commit what is in the staging area, but
 working tree file has further modifications (one addition and
 one deletion).

@@ -254,7 +254,7 @@ Update>> -2
 ------------
 +
 After making the selection, answer with an empty line to stage the
-contents of working tree files for selected paths in the index.
+contents of working tree files for selected paths in the staging area.

 revert::

@@ -265,12 +265,12 @@ revert::
 add untracked::

   This has a very similar UI to 'update' and
-  'revert', and lets you add untracked paths to the index.
+  'revert', and lets you add untracked paths to the staging area.

 patch::

   This lets you choose one path out of a 'status' like selection.
-  After choosing the path, it presents the diff between the index
+  After choosing the path, it presents the diff between the staging area
   and the working tree file and asks you if you want to stage
   the change of each hunk.  You can say:

@@ -290,12 +290,12 @@ patch::
        ? - print help
 +
 After deciding the fate for all hunks, if there is any hunk
-that was chosen, the index is updated with the selected hunks.
+that was chosen, the staging area is updated with the selected hunks.

 diff::

   This lets you review what will be committed (i.e. between
-  HEAD and index).
+  HEAD and staging area).


 EDITING PATCHES
@@ -303,10 +303,10 @@ EDITING PATCHES

 Invoking `git add -e` or selecting `e` from the interactive hunk
 selector will open a patch in your editor; after the editor exits, the
-result is applied to the index. You are free to make arbitrary changes
+result is applied to the staging area. You are free to make arbitrary changes
 to the patch, but note that some changes may have confusing results, or
 even result in a patch that cannot be applied.  If you want to abort the
-operation entirely (i.e., stage nothing new in the index), simply delete
+operation entirely (i.e., stage nothing new in the staging area), simply delete
 all lines of the patch. The list below describes some common things you
 may see in a patch, and which editing operations make sense on them.

@@ -327,13 +327,13 @@ Modified content is represented by "-" lines (removing the old content)
 followed by "{plus}" lines (adding the replacement content). You can
 prevent staging the modification by converting "-" lines to " ", and
 removing "{plus}" lines. Beware that modifying only half of the pair is
-likely to introduce confusing changes to the index.
+likely to introduce confusing changes to the staging area.
 --

 There are also more complex operations that can be performed. But beware
-that because the patch is applied only to the index and not the working
-tree, the working tree will appear to "undo" the change in the index.
-For example, introducing a new line into the index that is in neither
+that because the patch is applied only to the staging area and not the working
+tree, the working tree will appear to "undo" the change in the staging area.
+For example, introducing a new line into the staging area that is in neither
 the HEAD nor the working tree will stage the new line for commit, but
 the line will appear to be reverted in the working tree.

@@ -342,7 +342,7 @@ Avoid using these constructs, or do so with extreme caution.
 --
 removing untouched content::

-Content which does not differ between the index and working tree may be
+Content which does not differ between the staging area and working tree may be
 shown on context lines, beginning with a " " (space).  You can stage
 context lines for removal by converting the space to a "-". The
 resulting working tree file will appear to re-add the content.
diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
index 881652f..9b5a037 100644
--- a/Documentation/git-apply.txt
+++ b/Documentation/git-apply.txt
@@ -3,16 +3,16 @@ git-apply(1)

 NAME
 ----
-git-apply - Apply a patch to files and/or to the index
+git-apply - Apply a patch to files and/or to the staging area


 SYNOPSIS
 --------
 [verse]
-'git apply' [--stat] [--numstat] [--summary] [--check] [--index]
+'git apply' [--stat] [--numstat] [--summary] [--check] [--staged]
   [--apply] [--no-add] [--build-fake-ancestor=<file>] [-R | --reverse]
   [--allow-binary-replacement | --binary] [--reject] [-z]
-  [-p<n>] [-C<n>] [--inaccurate-eof] [--recount] [--cached]
+  [-p<n>] [-C<n>] [--inaccurate-eof] [--recount] [--staged-only]
   [--ignore-space-change | --ignore-whitespace ]
   [--whitespace=(nowarn|warn|fix|error|error-all)]
   [--exclude=<path>] [--include=<path>] [--directory=<root>]
@@ -21,8 +21,8 @@ SYNOPSIS
 DESCRIPTION
 -----------
 Reads the supplied diff output (i.e. "a patch") and applies it to files.
-With the `--index` option the patch is also applied to the index, and
-with the `--cache` option the patch is only applied to the index.
+With the `--staged` option the patch is also applied to the staging area, and
+with the `--staged-only` option the patch is only applied to the staging area.
 Without these options, the command applies the patch only to files,
 and does not require them to be in a git repository.

@@ -55,32 +55,32 @@ OPTIONS

 --check::
  Instead of applying the patch, see if the patch is
- applicable to the current working tree and/or the index
- file and detects errors.  Turns off "apply".
+ applicable to the current working tree and/or the staging
+ area and detects errors.  Turns off "apply".

---index::
+--staged::
  When `--check` is in effect, or when applying the patch
  (which is the default when none of the options that
  disables it is in effect), make sure the patch is
- applicable to what the current index file records.  If
+ applicable to what the current staging area records.  If
  the file to be patched in the working tree is not
  up-to-date, it is flagged as an error.  This flag also
- causes the index file to be updated.
+ causes the staging area to be updated.

---cached::
+--staged-only::
  Apply a patch without touching the working tree. Instead take the
- cached data, apply the patch, and store the result in the index
- without using the working tree. This implies `--index`.
+ staged data, apply the patch, and store the result in the staging area
+ without using the working tree. This implies `--staged`.

 --build-fake-ancestor=<file>::
- Newer 'git diff' output has embedded 'index information'
+ Newer 'git diff' output has embedded 'staging area information'
  for each blob to help identify the original version that
  the patch applies to.  When this flag is given, and if
  the original versions of the blobs are available locally,
- builds a temporary index containing those blobs.
+ builds a temporary staging area containing those blobs.
 +
-When a pure mode change is encountered (which has no index information),
-the information is read from the current index instead.
+When a pure mode change is encountered (which has no staging area information),
+the information is read from the current staging area instead.

 -R::
 --reverse::
@@ -236,13 +236,13 @@ Submodules
 If the patch contains any changes to submodules then 'git apply'
 treats these changes as follows.

-If `--index` is specified (explicitly or implicitly), then the submodule
-commits must match the index exactly for the patch to apply.  If any
+If `--staged` is specified (explicitly or implicitly), then the submodule
+commits must match the staging area exactly for the patch to apply.  If any
 of the submodules are checked-out, then these check-outs are completely
 ignored, i.e., they are not required to be up-to-date or clean and they
 are not updated.

-If `--index` is not specified, then the submodule commits in the patch
+If `--staged` is not specified, then the submodule commits in the patch
 are ignored and only the absence or presence of the corresponding
 subdirectory is checked and (if possible) updated.

diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index b586c0f..728b2cf 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -16,26 +16,26 @@ SYNOPSIS

 DESCRIPTION
 -----------
-Stores the current contents of the index in a new commit along
+Stores the current contents of the staging area in a new commit along
 with a log message from the user describing the changes.

 The content to be added can be specified in several ways:

 1. by using 'git add' to incrementally "add" changes to the
-   index before using the 'commit' command (Note: even modified
+   staging area before using the 'commit' command (Note: even modified
    files must be "added");

 2. by using 'git rm' to remove files from the working tree
-   and the index, again before using the 'commit' command;
+   and the staging area, again before using the 'commit' command;

 3. by listing files as arguments to the 'commit' command, in which
-   case the commit will ignore changes staged in the index, and instead
+   case the commit will ignore changes staged in the staging area, and instead
    record the current content of the listed files (which must already
    be known to git);

 4. by using the -a switch with the 'commit' command to automatically
    "add" changes from all known files (i.e. all files that are already
-   listed in the index) and to automatically "rm" files in the index
+   tracked) and to automatically "rm" tracked files
    that have been removed from the working tree, and then perform the
    actual commit;

@@ -273,8 +273,8 @@ EXAMPLES
 --------
 When recording your own work, the contents of modified files in
 your working tree are temporarily stored to a staging area
-called the "index" with 'git add'.  A file can be
-reverted back, only in the index but not in the working tree,
+ with 'git add'.  A file can be
+reverted back, only in the staging area but not in the working tree,
 to that of the last commit with `git reset HEAD -- <file>`,
 which effectively reverts 'git add' and prevents the changes to
 this file from participating in the next commit.  After building
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 4910510..eab118a 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -10,29 +10,29 @@ SYNOPSIS
 --------
 [verse]
 'git diff' [options] [<commit>] [--] [<path>...]
-'git diff' [options] --cached [<commit>] [--] [<path>...]
+'git diff' [options] --staged [<commit>] [--] [<path>...]
 'git diff' [options] <commit> <commit> [--] [<path>...]
-'git diff' [options] [--no-index] [--] <path> <path>
+'git diff' [options] [--not-staged] [--] <path> <path>

 DESCRIPTION
 -----------
-Show changes between the working tree and the index or a tree, changes
-between the index and a tree, changes between two trees, or changes
+Show changes between the working tree and the staging area or a tree, changes
+between the staging area and a tree, changes between two trees, or changes
 between two files on disk.

 'git diff' [--options] [--] [<path>...]::

  This form is to view the changes you made relative to
- the index (staging area for the next commit).  In other
+ the staging area for the next commit.  In other
  words, the differences are what you _could_ tell git to
- further add to the index but you still haven't.  You can
+ further add to the staging area but you still haven't.  You can
  stage these changes by using linkgit:git-add[1].
 +
 If exactly two paths are given and at least one points outside
 the current repository, 'git diff' will compare the two files /
-directories. This behavior can be forced by --no-index.
+directories. This behavior can be forced by --not-staged.

-'git diff' [--options] --cached [<commit>] [--] [<path>...]::
+'git diff' [--options] --staged [<commit>] [--] [<path>...]::

  This form is to view the changes you staged for the next
  commit relative to the named <commit>.  Typically you
@@ -40,7 +40,7 @@ directories. This behavior can be forced by --no-index.
  do not give <commit>, it defaults to HEAD.
  If HEAD does not exist (e.g. unborned branches) and
  <commit> is not given, it shows all staged changes.
- --staged is a synonym of --cached.
+ --cached is a synonym of --staged, will be removed in version 2.0 (or whatever).

 'git diff' [--options] <commit> [--] [<path>...]::

@@ -102,12 +102,12 @@ Various ways to check your working tree::
 +
 ------------
 $ git diff            <1>
-$ git diff --cached   <2>
+$ git diff --staged   <2>
 $ git diff HEAD       <3>
 ------------
 +
 <1> Changes in the working tree not yet staged for the next commit.
-<2> Changes between the index and your last commit; what you
+<2> Changes between the staging area and your last commit; what you
 would be committing if you run "git commit" without "-a" option.
 <3> Changes in the working tree since your last commit; what you
 would be committing if you run "git commit -a"
diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index dae190a..65aa798 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -12,9 +12,9 @@ SYNOPSIS

 DESCRIPTION
 -----------
-Displays paths that have differences between the index file and the
+Displays paths that have differences between the staging area and the
 current HEAD commit, paths that have differences between the working
-tree and the index file, and paths in the working tree that are not
+tree and the staging area, and paths in the working tree that are not
 tracked by git (and are not ignored by linkgit:gitignore[5]). The first
 are what you _would_ commit by running `git commit`; the second and
 third are what you _could_ commit by running 'git add' before running
@@ -91,7 +91,7 @@ In short-format, the status of each path is shown as

 where `PATH1` is the path in the `HEAD`, and ` -> PATH2` part is
 shown only when `PATH1` corresponds to a different path in the
-index/worktree (i.e. the file is renamed). The 'XY' is a two-letter
+staging area/worktree (i.e. the file is renamed). The 'XY' is a two-letter
 status code.

 The fields (including the `->`) are separated from each other by a
@@ -102,7 +102,7 @@ interior special characters backslash-escaped.

 For paths with merge conflicts, `X` and 'Y' show the modification
 states of each side of the merge. For paths that do not have merge
-conflicts, `X` shows the status of the index, and `Y` shows the status
+conflicts, `X` shows the status of the staging area, and `Y` shows the status
 of the work tree.  For untracked paths, `XY` are `??`.  Other status
 codes can be interpreted as follows:

@@ -119,13 +119,13 @@ Ignored files are not listed.
     X          Y     Meaning
     -------------------------------------------------
               [MD]   not updated
-    M        [ MD]   updated in index
-    A        [ MD]   added to index
-    D         [ M]   deleted from index
-    R        [ MD]   renamed in index
-    C        [ MD]   copied in index
-    [MARC]           index and work tree matches
-    [ MARC]     M    work tree changed since index
+    M        [ MD]   updated in staging area
+    A        [ MD]   added to staging area
+    D         [ M]   deleted from staging area
+    R        [ MD]   renamed in staging area
+    C        [ MD]   copied in staging area
+    [MARC]           staging area and work tree matches
+    [ MARC]     M    work tree changed since staging area
     [ MARC]     D    deleted in work tree
     -------------------------------------------------
     D           D    unmerged, both deleted
--
1.7.4.1.26.g00e6e

Re: Consistent terminology: cached/staged/index

Junio C Hamano
In reply to this post by Miles Bader
Miles Bader <[hidden email]> writes:

> Er...?
>
> Here we were talking about using "-s" (inspired by "--staged"), which
> I suggested because you earlier objected to "-c"...

Not _we were_, but _you_ were.

I actually was hoping that it was obvious that -s is a non-starter from the
messages so far in this thread, as neither --cached nor its more
descriptive spelling --index-only has character 's' anywhere in it, and we
have been keeping --staged as a low-key synonym for a reason.

Re: Consistent terminology: cached/staged/index

Jonathan Nieder
In reply to this post by Piotr Krukowiecki
Hi again,

Piotr Krukowiecki wrote:

> In other places "index" is called "staging area" and act of updating the index
> is called "staging in the index".
>
> I ask: why do we need the "index" term at all?
>
>    - instead of "index" use "staging" and "staging area"
>    - instead of "listed in index" use "staged" or "tracked"

Unlike "staging area", the word "index" is unfamiliar and opaque.  So
there is a sense that there is something to learn.

When people talk about the staging area I tend to get confused.  I
think there's an idea that because it sounds more concrete, there is
less to explain --- or maybe I am just wired the wrong way.

There is a .git/index file, with a well-defined file format.  And
there is an in-core copy of the index, too.  It contains:

 - mode and blob name for paths as requested by the user with
   "git add"

 - competing versions for paths whose proposed content is
   uncertain during a merge

 - stat(2) information to speed up comparison with the worktree

There are some other pieces, too --- "intent-to-add" entries added
with "git add -N", cached tree names for unmodified subtrees to
speed up "git commit", and so on.  But the 3 pieces listed above are
the main thing.
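Those three pieces can be inspected with stock git plumbing. A minimal shell sketch, assuming git is on PATH; the throwaway repository, the branch name "side" and the file f.txt are hypothetical, and the "ls-files --debug" output format is documented as subject to change:

```shell
#!/bin/sh
# Sketch: inspect what .git/index records, using stock plumbing.
# Assumes git is on PATH; repo, branch "side" and f.txt are throwaway.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > f.txt
git add f.txt
git commit -q -m base

# Mode, blob name and stage number (0 = normal, unconflicted entry).
entry=$(git ls-files --stage f.txt)
echo "$entry"

# Cached stat(2) data (ctime, mtime, dev, ino, uid, gid, size).
statinfo=$(git ls-files --debug f.txt)
echo "$statinfo"

# Provoke a conflict: both branches change the same line.
main=$(git symbolic-ref --short HEAD)
git checkout -q -b side
echo side > f.txt
git commit -q -a -m side
git checkout -q "$main"
echo main > f.txt
git commit -q -a -m main
git merge side > /dev/null 2>&1 || true   # exits non-zero on conflict

# Competing versions: stage 1 (base), 2 (ours), 3 (theirs).
stages=$(git ls-files --unmerged f.txt | awk '{ s = s (NR > 1 ? " " : "") $3 } END { print s }')
echo "unmerged stages: $stages"
```

Stage 0 is the ordinary entry that "git add" writes; while a merge is unresolved the same path instead appears at stages 1, 2 and 3.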

"Staging area" only describes the first.

All that said, I am not against formulations like "content of the next
commit" that might be more concrete from a user's point of view.

[...]
>  --refresh::
>   Don't add the file(s), but only refresh their stat()
> - information in the index.
> + information in the staging area.

git add/update-index --refresh are precisely meant for _not_ changing
the content of the next commit, so this particular change seems
confusing.
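A minimal shell sketch of that point, assuming git is on PATH and using a throwaway repository with a hypothetical file f.txt: touching the file makes the cached stat data stale, "git update-index --refresh" rewrites it, and "git write-tree" reports the same tree before and after:

```shell
#!/bin/sh
# Sketch: --refresh rewrites only cached stat data, not the next commit's content.
# Assumes git is on PATH; the repository and f.txt are throwaway.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
echo hello > f.txt
git add f.txt

before=$(git write-tree)   # tree the next commit would record

touch -t 200001010000 f.txt          # new mtime, same content
stale=$(git diff-files --name-only)  # plumbing: stat mismatch makes f.txt look changed

git update-index --refresh > /dev/null
fresh=$(git diff-files --name-only)  # empty: stat data now matches
after=$(git write-tree)

echo "stat-dirty before refresh: $stale"
echo "stat-dirty after refresh:  ${fresh:-<none>}"
echo "tree before: $before"
echo "tree after:  $after"
```

"git diff-files" is used because, unlike the "git diff" porcelain, it does not silently refresh stat-only changes.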

Hoping that is clearer.  Thanks for caring.
Jonathan

Re: Consistent terminology: cached/staged/index

Miles Bader
In reply to this post by Junio C Hamano
On Tue, Feb 15, 2011 at 7:59 AM, Junio C Hamano <[hidden email]> wrote:
> I actually was hoping that it was obvious that -s is a no-starter from the
> messages so far in this thread, as neither --cached nor its more
> descriptive spelling --index-only has character 's' anywhere in it, and we
> have been keeping --staged as a low-key synonym for a reason.

It was not at all obvious.  Even if you like --cached more than
--staged, there's a difference between advocating "--staged", and
using "-s" as a short-option for the operation which --cached /
--staged invoke.

Short option names are often a compromise, because clearly there are
often conflicts.  That _doesn't_ mean that one should simply not have
a short option, when a "perfect" choice cannot be found.  If a
"perfect" short-option isn't available, then usually one turns to
somewhat less perfect choices, trying to at least find some heuristic
that can make them easier to memorize -- because in the end, short
options must be memorized (and if they are truly common operations,
this isn't generally difficult; it's memorizing _rarely_ used short
options that's hard).

Of the various choices, "-s" does at least have such a heuristic
connection to an appropriate long option ("-i" is arguably worse than
-s, because it doesn't have any such connection...).  Can you suggest
something better?

[BTW, isn't the name "--index-only" something of a misnomer?  If
something is called "--XXX-only", that implies that the default
operation uses "XXX + something else" instead of XXX, but that
otherwise they are the same.  However in fact the difference in
behavior resulting from --cached is more subtle: it changes _both_
sides of the diff (default: worktree<->index; --cached: index<->HEAD).
 The names --cached and --staged actually capture this well -- they
basically say "the default is worktree changes, and --cached/--staged
diffs cached/staged changes instead" -- but the name "--index-only"
does not.]

-Miles

--
Cat is power.  Cat is peace.

Re: Consistent terminology: cached/staged/index

Junio C Hamano
Miles Bader <[hidden email]> writes:

> [BTW, isn't the name "--index-only" something of a misnomer?  If
> something is called "--XXX-only", that implies that the default
> operation uses "XXX + something else" instead of XXX, but that
> otherwise they are the same.  However in fact the difference in
> behavior resulting from --cached is more subtle: it changes _both_
> sides of the diff (default: worktree<->index; --cached: index<->HEAD).

Not really.

There are three entities involved: a tree-ish, the index, and the working
tree.  Because the index is a singleton, when you say "compare the index
with...", you only have two choices, either compare it against a tree-ish,
or compare it with the working tree.  If you want to do the latter, you
just use the command without --cached or a tree-ish.

The --cached form defaults to HEAD only because --cached mode is about
comparing the index against a tree-ish (think about "diff --cached HEAD^").

The same thing for --index-only.  The moment you said "compare the index
with...", there are only two other things to compare it against and that
is the only reason why you do not have to write HEAD.

This is a tangent, but the natural patch-flow is for you to prepare your
change in the working tree, add the changes to the index, and then build a
tree out of the index into a commit.

That is why "diff" shows changes in the working tree relative to what is
in the index, "diff --cached [<tree-ish>]" shows changes in the index
relative to the tree-ish (defaulting to HEAD).  The natural flow of the
development determines the natural direction of comparison between these
entities.

It does not make sense to compare in the other direction (i.e. how is the
index different compared to the working tree) _unless_ you are
contemplating to revert some changes you have made, and -R is there
exactly for that reason (here I am responding to the idea some people had
in an earlier incarnation of this thread of saying "diff INDEX HEAD",
"diff HEAD WORKTREE" etc., using pseudo <ref> syntax, and explaining why
it is not such a good idea---and why this is a tangent).
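
A small experiment makes these directions concrete. This is a sketch under stated assumptions (a POSIX shell, git on PATH, a made-up file.txt): one version is committed, a second is added to the index, and a third is left in the working tree, so each form of "git diff" shows a different pair.

```shell
#!/bin/sh
# Sketch: which pair of (tree-ish, index, worktree) each "git diff" form compares.
# Assumes git is on PATH; file.txt and the demo identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'v1\n' > file.txt
git add file.txt
git commit -q -m v1          # HEAD holds v1

printf 'v2\n' > file.txt
git add file.txt             # the index holds v2
printf 'v3\n' > file.txt     # the worktree holds v3

# Keep only the changed lines, dropping the ---/+++ headers.
changes() { grep '^[-+][^-+]'; }

wt_vs_index=$(git diff file.txt | changes)                   # worktree relative to index
index_vs_head=$(git diff --cached file.txt | changes)        # index relative to HEAD
explicit_head=$(git diff --cached HEAD file.txt | changes)   # same, tree-ish spelled out
index_vs_wt=$(git diff -R file.txt | changes)                # -R reverses the direction

printf '%s\n' "$wt_vs_index" "$index_vs_head" "$index_vs_wt"
```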


Re: Consistent terminology: cached/staged/index

Pete Harlan
In reply to this post by Jonathan Nieder-2
On 02/14/2011 03:19 PM, Jonathan Nieder wrote:

> Hi again,
>
> Piotr Krukowiecki wrote:
>
>> In other places "index" is called "staging area" and act of updating the index
>> is called "staging in the index".
>>
>> I ask: why do we need the "index" term at all?
>>
>>    - instead of "index" use "staging" and "staging area"
>>    - instead of "listed in index" use "staged" or "tracked"
>
> Unlike "staging area", the word "index" is unfamiliar and opaque.  So
> there is a sense that there is something to learn.
>
> When people talk about the staging area I tend to get confused.  I
> think there's an idea that because it sounds more concrete, there is
> less to explain --- or maybe I am just wired the wrong way.
>
> There is a .git/index file, with a well defined file format.  And
> there is an in-core copy of the index, too.  It contains:
>
>  - mode and blob name for paths as requested by the user with
>    "git add"
>
>  - competing versions for paths whose proposed content is
>    uncertain during a merge
>
>  - stat(2) information to speed up comparison with the worktree
>
> There are some other pieces, too --- "intent-to-add" entries added
> with "git add -N", cached tree names for unmodified subtrees to
> speed up "git commit", and so on.  But the 3 pieces listed above are
> the main thing.

Thank you for that explanation.

> "Staging area" only describes the first.

...which to me means only that "staging area" isn't enough to fully
describe what Git can do.

From the user's perspective, merge conflict resolution is a separate
process from staging a commit; where does Git's usability benefit from
blending the two concepts by referring (in command syntax and
manpages) to their common internal data structure?

One of Git's charms is the simplicity of blobs, trees, commits and
tags and how those ingredients prove tremendously useful in developing
software.  And I don't think anyone can use Git well without fully
understanding what those structures are (and are not).

But I believe the rest of Git's internals are in a different category.
Regardless of how elegant the solution may be, as a user I can use Git
well without knowing _how_ Git can tell that foo.c contains staged and
unstaged changes.  Nor do I need to know how it knows that bar.c is in
conflict.  I don't need to know precisely how it implements its packed
object database to use it effectively.

Part of the issue could be that one intimately familiar with Git's
internals may find a process oriented interface irritating ("Why must
it say 'staging area' when it's just updating the index?"), while one
unfamiliar with the internals has the opposite reaction ("Why must it
make me use the internal name of the staging area?").

Someone suggested using a different top-level name for Git to allow
for completely rewriting the interface.  I expect that it's this
difference of perspective that makes that appear necessary.  I believe
that a rewrite is the wrong approach, but I believe that abstractions
like "staging area" move the user-interface a little more toward the
user and that there's value in that.

--Pete

> All that said, I am not against formulations like "content of the next
> commit" that might be more concrete from a user's point of view.
>
> [...]
>>  --refresh::
>>   Don't add the file(s), but only refresh their stat()
>> - information in the index.
>> + information in the staging area.
>
> git add/update-index --refresh are precisely meant for _not_ changing
> the content of the next commit, so this particular change seems
> confusing.
>
> Hoping that is clearer.  Thanks for caring.
> Jonathan

Re: Consistent terminology: cached/staged/index

Jonathan Nieder-2
Hi Pete,

Pete Harlan wrote:

> Part of the issue could be that one intimately familiar with Git's
> internals may find a process oriented interface irritating ("Why must
> it say 'staging area' when it's just updating the index?")

No, no.  I agree there's a problem to solve here.  The current
documentation for git (e.g., the user manual) has a nice, coherent,
user-oriented narrative about trees, commits, and blobs, and meanwhile
it is hard to find a clear story about the index.

Such a story would have to describe the conflict resolution process.
When you encounter a merge conflict, how do you resolve it?  The best
I can do for now is to point to the user manual[1].

http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#conflict-resolution

I even think it is okay to say "The index is a sort of staging area
for your next commit".  Because that is true.  But it is not the full
story, so if one wants to give the index a new name --- which is a
costly thing to do, anyway --- then I do not think "the staging area"
works.

I feel bad about only presenting complications instead of an alternate
solution.  I do consider workflow-oriented explanations very useful.
I've been giving technical explanations in this thread as background
for future storytelling, in the hope that someone more talented than I
am can digest it into a good narrative.

Jonathan

[1] Maybe the process is overdesigned.  After all, what would we lose
by saying

 - an unmerged path just gets an "unmerged" flag set, meaning that
   the path is not ready for commit yet
 - to get the copy from the common ancestor, use
        git show $(git merge-base HEAD MERGE_HEAD):path/to/file
 - to get the copy from HEAD, use
        git show HEAD:path/to/file
 - likewise to get the copy from MERGE_HEAD

And while I can give answers about why that is a bad interface
(recomputing the merge base is a waste of time; in a recursive merge
the merge base is not a real commit; if there were renames, the copy
from HEAD could be HEAD:other/path and it is hard to find what
other/path is), are those answers enough to justify learning this new
trick?

So we need a better story.
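
For comparison with the merge-base approach in the footnote, here is a minimal sketch of a two-branch conflict, assuming a POSIX shell and git on PATH (the branch "other" and the file f.txt are made up). While the merge is unresolved, the index holds stage 1 (common ancestor), stage 2 (ours) and stage 3 (theirs), readable with "git show :N:path"; "git add" collapses them back to stage 0.

```shell
#!/bin/sh
# Sketch: read the competing versions of a conflicted path from the index.
# Assumes git is on PATH; branch "other" and f.txt are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'base\n' > f.txt
git add f.txt
git commit -q -m base
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

git checkout -q -b other
printf 'theirs\n' > f.txt
git commit -q -a -m theirs

git checkout -q "$main"
printf 'ours\n' > f.txt
git commit -q -a -m ours

# Both sides changed the same line, so the merge stops with a conflict.
git merge other >/dev/null 2>&1 || true

stages=$(git ls-files --stage f.txt | awk '{print $3}' | tr '\n' ' ')
base=$(git show :1:f.txt)
ours=$(git show :2:f.txt)
theirs=$(git show :3:f.txt)

# The footnote's alternative: recompute the merge base and read from it.
base_again=$(git show "$(git merge-base HEAD MERGE_HEAD):f.txt")

# Resolving and running "git add" collapses the entries back to stage 0.
printf 'resolved\n' > f.txt
git add f.txt
collapsed=$(git ls-files --stage f.txt | awk '{print $3}')
```

In this simple case both routes give the same base; the index route avoids recomputing the merge base and still works when the base is not a real commit.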

Re: Consistent terminology: cached/staged/index

Piotr Krukowiecki-2
In reply to this post by Jonathan Nieder-2
On Tue, Feb 15, 2011 at 12:19 AM, Jonathan Nieder <[hidden email]> wrote:

> Hi again,
>
> Piotr Krukowiecki wrote:
>>  --refresh::
>>       Don't add the file(s), but only refresh their stat()
>> -     information in the index.
>> +     information in the staging area.
>
> git add/update-index --refresh are precisely meant for _not_ changing
> the content of the next commit, so this particular change seems
> confusing.

If there is no staging - no commit, then you're right. But then you don't
have to mention index at all:

  --refresh::
       Don't add the file(s), but only refresh their stat()
       information.

I completely agree with Pete Harlan - for a normal user, git internals are
not relevant - index is just part of git. How or where the stat information is
refreshed does not matter.

In the same way you don't write that it's done by function refresh_index().


> Hoping that is clearer.  Thanks for caring.
> Jonathan

Thanks for explanation.


--
Piotrek

Re: Consistent terminology: cached/staged/index

Jonathan Nieder-2
Piotr Krukowiecki wrote:
>> Piotr Krukowiecki wrote:

>>>  --refresh::
>>>       Don't add the file(s), but only refresh their stat()
>>> -     information in the index.
>>> +     information in the staging area.
[...]
> If there is no staging - no commit, then you're right. But then you don't
> have to mention index at all:
>
>   --refresh::
>        Don't add the file(s), but only refresh their stat()
>        information.

Yes, that sounds like an improvement.  Though I'd suggest something
like:

  --refresh::
        Don't add the files' content and mode, but refresh their stat(2)
        information if it is out of date.  For example, you'd want to
        do this after restoring a repository from backup, to link up
        the stat index details with the proper files.

The exact wording could use tweaking, but hopefully the idea is clear
(to explain what the option is actually used for).
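
What --refresh is for can be seen in a throwaway repository: change only a file's timestamp, and low-level commands that trust the cached stat data report the file as changed until the index is refreshed. A minimal sketch, assuming a POSIX shell with "touch -t" and git on PATH (f.txt is a made-up name); "git add --refresh" is the porcelain spelling discussed above.

```shell
#!/bin/sh
# Sketch: a stat-only change makes a file look modified until the index is refreshed.
# Assumes git is on PATH; f.txt is a hypothetical example file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'hello\n' > f.txt
git add f.txt

# Change the mtime only; the content stays byte-for-byte identical.
touch -t 200001010000 f.txt

# Plumbing "git diff-files" compares against the cached stat data without
# refreshing first, so the stat-dirty file is listed.
before=$(git diff-files --name-only)

# Re-stat the file, find its content unchanged, and record the new stat data.
git update-index --refresh
after=$(git diff-files --name-only)
```

The blob recorded for f.txt is the same before and after; only the cached stat information changes.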

> index is just part of git. How or where the stat information is
> refreshed does not matter.

I agree with that.  Still, two things are relevant: (1) the stat
information is specific to that index, so the operation needs to be
repeated if you use GIT_INDEX_FILE to work with a second index, and
(2) its only purpose is to speed up operations that compare the index
to the worktree.

Anyway, I don't want to argue.  Many of the places pointed out in
the manual could use help.  It could even involve inserting the
phrase "a staging area".

Hopefully I have made clear why excising the word "index" from git
vocabulary (as the term "current directory cache" was eliminated over
time) does not seem like a good idea when
we don't even have a good alternative for it.  As the original post
mentioned, using three terms in documentation for fundamentally the
same thing is going to get confusing after a while.  Why not just use
one ("the index")?

Sorry for the ramble.
Jonathan

Re: Consistent terminology: cached/staged/index

Drew Northup
In reply to this post by Pete Harlan

On Sun, 2011-02-13 at 19:09 -0800, Pete Harlan wrote:

> On 02/13/2011 02:58 PM, Junio C Hamano wrote:
> >> --staged
> >> ~~~~~~~~
> >> diff takes --staged, but that is only to support some people's habits.
> > The term "stage" comes from "staging area", a term people used to explain
> > the concept of the index by saying "The index holds set of contents to be
> > made into the next commit; it is _like_ the staging area".
> >
> > My feeling is that "to stage" is primarily used, outside "git" circle, as
> > a logistics term.  If you find it easier to visualize the concept of the
> > index with "staging area" ("an area where troops and equipment in transit
> > are assembled before a military operation", you may find it easier to say
> > "stage this path ('git add path')", instead of "adding to the set of
> > contents...".
>
> FWIW, when teaching Git I have found that users immediately understand
> "staging area", while "index" and "cache" confuse them.
>
> "Index" means to them a numerical index into a data structure.
> "Cache" is a local copy of something that exists remotely.  Neither
> word describes the concept correctly from a user's perspective.

According to the dictionary (actually, more than one) "cache" is a
hidden storage space. I'm pretty sure that's the sense most global and
therefore most appropriate to thinking about Git. (It certainly
correctly describes what web browser caches and on-CPU caches do.)
One would only think the definition you gave applied if they didn't know
that squirrels "cache" nuts. I don't think that the problem is the
idiom.

> I learned long ago to type "index" and "cached", but when talking (and
> thinking) about Git I find "the staging area" gets the point across
> very clearly and moves Git from interesting techie-tool to
> world-dominating SCM territory.  I'm surprised that that experience
> isn't universal.

Perhaps that helps you associate it with other SCM/VCS software, but it
didn't help me. When I realized that the "index" is called that BECAUSE
IT IS AN INDEX (of content/data states for a pending commit operation)
the sky cleared and the sun came out.

In all reality the closest thing Git has to an actual staging area is
all of the objects in .git/objects only recorded by the index itself.
Git-stored objects not compressed into pack files could technically be
described as "cached" using the standard definition--they aren't visible
in the working directory. Unfortunately this probably just muddies the
water for all too many users.

So, in summary--the index is real, objects "cached" pending
commit/cleanup/packing are real; any "staging area" is a rhetorical
combination of the two. Given that rhetorical device may not work in all
languages (as Junio mentioned earlier) I don't recommend that we rely on
it.

--
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59


Re: Consistent terminology: cached/staged/index

Felipe Contreras
On Thu, Feb 17, 2011 at 1:11 AM, Drew Northup <[hidden email]> wrote:

>
> On Sun, 2011-02-13 at 19:09 -0800, Pete Harlan wrote:
>> On 02/13/2011 02:58 PM, Junio C Hamano wrote:
>> >> --staged
>> >> ~~~~~~~~
>> >> diff takes --staged, but that is only to support some people's habits.
>> > The term "stage" comes from "staging area", a term people used to explain
>> > the concept of the index by saying "The index holds set of contents to be
>> > made into the next commit; it is _like_ the staging area".
>> >
>> > My feeling is that "to stage" is primarily used, outside "git" circle, as
>> > a logistics term.  If you find it easier to visualize the concept of the
>> > index with "staging area" ("an area where troops and equipment in transit
>> > are assembled before a military operation", you may find it easier to say
>> > "stage this path ('git add path')", instead of "adding to the set of
>> > contents...".
>>
>> FWIW, when teaching Git I have found that users immediately understand
>> "staging area", while "index" and "cache" confuse them.
>>
>> "Index" means to them a numerical index into a data structure.
>> "Cache" is a local copy of something that exists remotely.  Neither
>> word describes the concept correctly from a user's perspective.
>
> According to the dictionary (actually, more than one) "cache" is a
> hidden storage space. I'm pretty sure that's the sense most global and
> therefore most appropriate to thinking about Git. (It certainly
> describes correctly what web browser cache and on-CPU cache is doing.)
> One would only think the definition you gave applied if they didn't know
> that squirrels "cache" nuts. I don't think that the problem is the
> idiom.

Not really. If a squirrel "caches" nuts, it means a squirrel is
putting them in a hidden place to save them for future use. So, in the
future, if said squirrel wants a nut, it doesn't have to look for it
in the trees, just go to the cache. So the cache makes it easier to
access whatever you want.

IOW, if you don't cache something, you would have more trouble getting
it, but you still can.

That's not what Git is doing. Git is not putting changes in a place so
they can be more easily accessed in the future. It is using a temporary
device that allows the commit to be built through an extended period
of time. It's not a cache.

>> I learned long ago to type "index" and "cached", but when talking (and
>> thinking) about Git I find "the staging area" gets the point across
>> very clearly and moves Git from interesting techie-tool to
>> world-dominating SCM territory.  I'm surprised that that experience
>> isn't universal.
>
> Perhaps that helps you associate it with other SCM/VCS software, but it
> didn't help me. When I realized that the "index" is called that BECAUSE
> IT IS AN INDEX (of content/data states for a pending commit operation)
> the sky cleared and the sun came out.

That's not an index. An index is a guide of pointers to something
else. It allows you to find whatever you are looking for by looking in
a small table of pointers instead of looking through all the samples.

IOW, if you don't index something, you would have more trouble finding
it, but you still can.

That's not what Git is doing.

> In all reality the closest thing Git has to an actual staging area is
> all of the objects in .git/objects only recorded by the index itself.
> Git-stored objects not compressed into pack files could technically be
> described as "cached" using the standard definition--they aren't visible
> in the working directory. Unfortunately this probably just muddies the
> water for all too many users.

That's irrelevant. You can implement the same functionality in many
other ways. How it is implemented doesn't matter; what matters is what
the user experiences.

> So, in summary--the index is real, objects "cached" pending
> commit/cleanup/packing are real; any "staging area" is a rhetorical
> combination of the two. Given that rhetorical device may not work in all
> languages (as Junio mentioned earlier) I don't recommend that we rely on
> it.

Branches and tags are "rhetorical" devices as well. But behind the scenes
they are just refs. Shall we disregard 'branch' and 'tag'?

No. What Git does behind the scenes is irrelevant to the user. What
matters is what the device does, not how it is implemented; the
implementation might change. "Stage" is the perfect word: both a verb
and a noun, expressing a temporary space where things are prepared
for their final form.

--
Felipe Contreras

Re: Consistent terminology: cached/staged/index

Felipe Contreras
In reply to this post by Jonathan Nieder-2
On Tue, Feb 15, 2011 at 1:19 AM, Jonathan Nieder <[hidden email]> wrote:
> When people talk about the staging area I tend to get confused.  I
> think there's an idea that because it sounds more concrete, there is
> less to explain --- or maybe I am just wired the wrong way.

I don't like the phrase "staging area". A "stage" already has an area.
You put things on the stage. Sometimes there are multiple stages.

> There is a .git/index file, with a well defined file format.  And
> there is an in-core copy of the index, too.  It contains:
>
>  - mode and blob name for paths as requested by the user with
>   "git add"

A commit stage.

>  - competing versions for paths whose proposed content is
>   uncertain during a merge

Multiple commit stages.

>  - stat(2) information to speed up comparison with the worktree

If only a subset of the files are there, it's an 'index', if not, then
I'd say it's a 'registry'. Anyway, it's something the user shouldn't
care about.

Cheers.

--
Felipe Contreras

Re: Consistent terminology: cached/staged/index

Jonathan Nieder-2
Hi Felipe et al,

Felipe Contreras wrote:
> On Tue, Feb 15, 2011 at 1:19 AM, Jonathan Nieder <[hidden email]> wrote:

>>  - mode and blob name for paths as requested by the user with
>>   "git add"
>
> A commit stage.
>
>>  - competing versions for paths whose proposed content is
>>   uncertain during a merge
>
> Multiple commit stages.
>
>>  - stat(2) information to speed up comparison with the worktree
>
> If only a subset of the files are there, it's an 'index', if not, then
> I'd say it's a 'registry'.

These terms you suggest aren't the established ones (as I'm sure you
know).  Just as with everyday language, there is some resistance to
moving to new terms that have not been established for a while.  In
everyday language, many terms gained popularity by

 - appearing in some document that people read for another reason
 - describing the notion they are meant to describe clearly (or
   having some other feature that makes them likeable)

This is how "staging area" has been gaining popularity, I think ---
some (out-of-tree) documentation that is good for other reasons uses
it, and it really does seem to be a clearer term than "index" for
"place where the next commit is being prepared".  Unfortunately, I do
not think it is a clearer term than "index" for "the git index, which
contains stat() information and pointers to blobs that either belong
in the next commit or are participating in a merge conflict".  So it
does not seem to justify rewriting everything to use it.

Which suggests one way forward --- if you believe you have terms that
do describe those concepts clearly, one way to promote them is to
write some good, clear (out-of-tree, to begin with) documentation
using them.  Presumably this documentation would also mention that
other people use other terms to avoid confusing the reader.

Hope that helps,
Jonathan

Re: Consistent terminology: cached/staged/index

Miles Bader-2
Jonathan Nieder <[hidden email]> writes:
> This is how "staging area" has been gaining popularity, I think ---
> some (out-of-tree) documentation that is good for other reasons uses
> it, and it really does seem to be a clearer term than "index" for
> "place where the next commit is being prepared".

Also "magit" uses the label "Staging area:" for the list of files to be
committed -- and the key-binding to add a file to that list is "s"...

-Miles

--
Christian, n. One who follows the teachings of Christ so long as they are not
inconsistent with a life of sin.

Re: Consistent terminology: cached/staged/index

Felipe Contreras
In reply to this post by Jonathan Nieder-2
On Sat, Feb 26, 2011 at 11:51 PM, Jonathan Nieder <[hidden email]> wrote:
> These terms you suggest aren't the established ones (as I'm sure you
> know).  Just as with everyday language, there is some resistance to
> moving to new terms that have not been established for a while.  In
> everyday language, many terms gained popularity by
>
>  - appearing in some document that people read for another reason
>  - describing the notion they are meant to describe clearly (or
>   having some other feature that makes them likeable)

There's always resistance, but 1.8 is supposed to contain stuff as "if
git was written from scratch". I think this makes sense as one of
them.

> This is how "staging area" has been gaining popularity, I think ---
> some (out-of-tree) documentation that is good for other reasons uses
> it, and it really does seem to be a clearer term than "index" for
> "place where the next commit is being prepared".  Unfortunately, I do
> not think it is a clearer term than "index" for "the git index, which
> contains stat() information and pointers to blobs that either belong
> in the next commit or are participating in a merge conflict".  So it
> does not seem to justify rewriting everything to use it.

Why should the users care about the stat() information? Or how the
merge conflicts are being tracked? That's plumbing, not porcelain.

--
Felipe Contreras

Re: Consistent terminology: cached/staged/index

Jonathan Nieder-2
Hi,

Felipe Contreras wrote:
[out of order for convenience]

> Why should the users care about the stat() information? Or how the
> merge conflicts are being tracked?

The second question is very easy to answer (depending on what "how"
means, of course).  Because people integrating changes from multiple
places need to be able to resolve a conflicted merge.

> That's plumbing, not porcelain.

I don't disagree.  The analogy is almost perfect.

And the thing is, in the real world, people know about plumbing.  They
don't care about the details, but they know there are these things
called pipes, and that water tends to flow downward, and that if one
of them freezes, it will burst.  This knowledge is useful.

Likewise, it is useful to know:

 - After you use "cp -a" to copy a repository, the first operation
   you perform is going to be slower.  The cached stat() information
   is stale.

 - Until you run "git add", there is only one copy of your data, in
   the worktree.  After you run "git add", there are two copies.
   Once you run "git commit", that second copy will last at least
   as long as your commit does.

   So there is some chance of recovery from fat-finger mistakes,
   even before a commit.

 - During a merge, you can mark your progress by collapsing index
   entries with 'git add'.  "git diff" will show the state of the
   merge.  You can read the competing versions of a file with
   "git show :2:path/to/file" and "git show :3:path/to/file".

 - Index-only operations tend to be faster, since

    (1) the cached blobs are not changing, so we can save time
        stat(2)-ing and read(2)-ing files
    (2) blobs are compressed: less I/O.  Longstanding blobs are
        in pack files: good caching and I/O patterns.

   So you can speed up your slow "git grep" by using
   "git grep --cached".

 - When scripting, you can use a temporary index file to avoid
   affecting the remembered worktree state.
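
The last two points can be tried directly. The sketch below assumes a POSIX shell and git on PATH and uses made-up file names: "git grep --cached" searches the blobs recorded in the index rather than the files on disk, and setting GIT_INDEX_FILE lets a script build a tree and a commit without disturbing the user's own index.

```shell
#!/bin/sh
# Sketch: index-only searching, and scripting with a temporary index.
# Assumes git is on PATH; a.txt, b.txt and the demo identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'one\n' > a.txt
git add a.txt
git commit -q -m one

# Edit a.txt in the worktree but do not stage it.
printf 'needle\n' > a.txt
wt_hits=$(git grep -c needle || true)               # searches tracked worktree files
cached_hits=$(git grep --cached -c needle || true)  # searches index blobs; exits 1 on no match

# Build a commit from a temporary index; .git/index is never touched.
tmp_index="$(mktemp -d)/index"
printf 'two\n' > b.txt
GIT_INDEX_FILE="$tmp_index" git read-tree HEAD
GIT_INDEX_FILE="$tmp_index" git update-index --add b.txt
tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)
commit=$(printf 'scripted commit\n' | git commit-tree "$tree" -p "$(git rev-parse HEAD)")

real_index=$(git ls-files)                                    # still only a.txt
scripted_tree=$(git ls-tree --name-only "$commit" | tr '\n' ' ')
```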

But so what?  I have nothing against clearer terms.  I am just saying
that (1) we should be explaining these things somewhere and (2) a
global s/index/only one of the things the index does/ is a bad idea,
because it would make the documentation *wrong*.

> There's always resistance, but 1.8 is supposed to contain stuff as "if
> git was written from scratch".

I thought 1.8 was supposed to provide an opportunity to correct some
long-known mistakes that we had been holding back on for backward
compatibility reasons.  That doesn't mean we should forget the cost of
change.

Thanks for your work, and hope that helps.
Jonathan

Re: Consistent terminology: cached/staged/index

Junio C Hamano
In reply to this post by Felipe Contreras
Felipe Contreras <[hidden email]> writes:

> There's always resistance, but 1.8 is supposed to contain stuff as "if
> git was written from scratch".

Yes, the 1.8.0 is indeed an opportunity to rethink, based on the wisdom we
have gained over the years since the current git was written.

If there has already been a clear consensus that we would have done
something differently if we knew better, it is an opportunity to first
discuss if there is a way to correct these earlier mistakes in a way that
does not have to introduce incompatibility, and if it is not feasible,
discuss a plan to ease incompatible changes in without hurting existing
users too much.

A new discussion or proposal is fine, but you should be able to see that
an effort to start building consensus from now is very much outside the
scope of the discussion for the 1.8.0 we have been having.

Besides, taking what other people said already in the thread also into
account, it looks to me that what you are advocating is too premature to
be called a consensus yet.
