[PATCH] Use line buffering for standard output

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Use line buffering for standard output

Anders Melchiorsen
Normally, stdout is fully buffered, unless it refers to a terminal
device. This gives problems when fork() is in play: the buffer is
cloned and output appears twice.

By always setting stdout to line buffering, we make the output work
identically for all output devices.

Signed-off-by: Anders Melchiorsen <[hidden email]>
---

On #git, blix mentioned that running git clone through a pipe made it
output the "Initialized empty" line twice. This seems to be due to
bad interactions between fork() and buffered stdio.

Rather than putting in flushing at all the right places, this
sledgehammer fix simply reverts to line buffering for all output devices.


Anders.


 git.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/git.c b/git.c
index 37b1d76..040b900 100644
--- a/git.c
+++ b/git.c
@@ -421,6 +421,12 @@ int main(int argc, const char **argv)
  int done_alias = 0;
 
  /*
+ * Use line buffering, even if we do not have interactive
+ * output. Full buffering mixes badly with fork().
+ */
+ setvbuf(stdout, NULL, _IOLBF, 0);
+
+ /*
  * Take the basename of argv[0] as the command
  * name, and the dirname as the default exec_path
  * if we don't have anything better.
--
1.5.6.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use line buffering for standard output

Junio C Hamano
Anders Melchiorsen <[hidden email]> writes:

> Normally, stdout is fully buffered, unless it refers to a terminal
> device. This gives problems when fork() is in play: the buffer is
> cloned and output appears twice.

I thought we have been careful to flush() them before we fork but perhaps
there are recent changes that were not careful.  Wouldn't it be better to
fix them first, independent of this change?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use line buffering for standard output

Linus Torvalds-3
In reply to this post by Anders Melchiorsen


On Sun, 3 Aug 2008, Anders Melchiorsen wrote:
>
> Normally, stdout is fully buffered, unless it refers to a terminal
> device. This gives problems when fork() is in play: the buffer is
> cloned and output appears twice.
>
> By always setting stdout to line buffering, we make the output work
> identically for all output devices.

Please don't.

This is a huge peformance issue for things like

        git log -p > file

where we really want it to be fully buffered.

So please just find the place where we do a fork() without flushing
pending output...

(We really shouldn't have all that many "fork()" calls left, I thought -
the Windows stuff means that most of it should be abstracted away. So it's
not like we're talking about hundreds of sites, there should be just a
couple).

                Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use line buffering for standard output

Anders Melchiorsen
Linus Torvalds <[hidden email]> writes:

> On Sun, 3 Aug 2008, Anders Melchiorsen wrote:
>>
>> By always setting stdout to line buffering, we make the output work
>> identically for all output devices.
>
> Please don't.
>
> This is a huge peformance issue for things like
>
> git log -p > file
>
> where we really want it to be fully buffered.
>
> So please just find the place where we do a fork() without flushing
> pending output...

Sure. The sledgehammer approach was partly to get some advice on the
proper solution. I now realize that you have generally been careful
about this, and so a single flush should be enough.

Below are two alternative proposals, one local and one global. Both of
them fix the problem for me, but maybe you were even thinking about a
third place?

For the run-command.c one, I was not sure whether to put it inside or
outside the ifdef, and I also was not sure whether to add it for
start_command(). Not having other testcases, and not knowing Windows,
this is the way it ended up.


Cheers,
Anders.



From: Anders Melchiorsen <[hidden email]>
Date: Mon, 4 Aug 2008 00:21:49 +0200
Subject: [PATCH] Flush stdout in init-db

Before this change, clone outputs "Initialized empty ..." twice
if output is piped.

Signed-off-by: Anders Melchiorsen <[hidden email]>
---
 builtin-init-db.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/builtin-init-db.c b/builtin-init-db.c
index baf0d09..954c7e9 100644
--- a/builtin-init-db.c
+++ b/builtin-init-db.c
@@ -315,11 +315,13 @@ int init_db(const char *template_dir, unsigned int flags)
  git_config_set("receive.denyNonFastforwards", "true");
  }
 
- if (!(flags & INIT_DB_QUIET))
+ if (!(flags & INIT_DB_QUIET)) {
  printf("%s%s Git repository in %s/\n",
        reinit ? "Reinitialized existing" : "Initialized empty",
        shared_repository ? " shared" : "",
        get_git_dir());
+ fflush(stdout);
+ }
 
  return 0;
 }



From: Anders Melchiorsen <[hidden email]>
Date: Mon, 4 Aug 2008 00:35:40 +0200
Subject: [PATCH] Flush standard output in start_async

This prevents double output in case stdout is redirected.

Signed-off-by: Anders Melchiorsen <[hidden email]>
---
 run-command.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/run-command.c b/run-command.c
index a3b28a6..67be079 100644
--- a/run-command.c
+++ b/run-command.c
@@ -304,6 +304,9 @@ int start_async(struct async *async)
  async->out = pipe_out[0];
 
 #ifndef __MINGW32__
+ /* Flush output before fork() to avoid cloning the buffer */
+ fflush(stdout);
+
  async->pid = fork();
  if (async->pid < 0) {
  error("fork (async) failed: %s", strerror(errno));
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use line buffering for standard output

Junio C Hamano
Anders Melchiorsen <[hidden email]> writes:

> From: Anders Melchiorsen <[hidden email]>
> Date: Mon, 4 Aug 2008 00:35:40 +0200
> Subject: [PATCH] Flush standard output in start_async
>
> This prevents double output in case stdout is redirected.
>
> Signed-off-by: Anders Melchiorsen <[hidden email]>
> ---
>  run-command.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index a3b28a6..67be079 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -304,6 +304,9 @@ int start_async(struct async *async)
>   async->out = pipe_out[0];
>  
>  #ifndef __MINGW32__
> + /* Flush output before fork() to avoid cloning the buffer */
> + fflush(stdout);
> +
>   async->pid = fork();
>   if (async->pid < 0) {
>   error("fork (async) failed: %s", strerror(errno));

I think this with s/stdout/NULL/ would be a reasonable thing to do.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH] Flush output in start_async

Anders Melchiorsen
This prevents double output in case stdout is redirected.

Signed-off-by: Anders Melchiorsen <[hidden email]>
---
 run-command.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/run-command.c b/run-command.c
index a3b28a6..6af83c5 100644
--- a/run-command.c
+++ b/run-command.c
@@ -304,6 +304,9 @@ int start_async(struct async *async)
  async->out = pipe_out[0];
 
 #ifndef __MINGW32__
+ /* Flush stdio before fork() to avoid cloning buffers */
+ fflush(NULL);
+
  async->pid = fork();
  if (async->pid < 0) {
  error("fork (async) failed: %s", strerror(errno));
--
1.5.6.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use line buffering for standard output

Linus Torvalds-3
In reply to this post by Junio C Hamano


On Sun, 3 Aug 2008, Junio C Hamano wrote:

> Anders Melchiorsen <[hidden email]> writes:
> >  
> >  #ifndef __MINGW32__
> > + /* Flush output before fork() to avoid cloning the buffer */
> > + fflush(stdout);
> > +
> >   async->pid = fork();
> >   if (async->pid < 0) {
> >   error("fork (async) failed: %s", strerror(errno));
>
> I think this with s/stdout/NULL/ would be a reasonable thing to do.

Agreed, I think that's the right thing to do.

There's another fork there in start_command(), I suspect we should do it
there too: it's a "generic" path, so it should try to be safe.

The other ones look ok from a quick scan. I don't know the imap-send.c
code, but it's from outside people who hopefully know what they were
doing. The other ones don't seem to be using stdio before the fork (except
for things like "die()" ;)

There is a "fork()" in a _comment_ in builtin-ls-tree.c, and that one
definitely should have a fflush(NULL) in front of it. But it _is_ just a
comment, and rather than addign a fflush() there, it would probably be
better to turn it into a "start_command()" or something like that.

                        Linus.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[PATCH] Add output flushing before fork()

Anders Melchiorsen
This adds fflush(NULL) before fork() in start_command(), to keep
the generic interface safe.

A remaining use of fork() with no flushing is in a comment in
show_tree(). Rewrite that comment to use start_command().

Signed-off-by: Anders Melchiorsen <[hidden email]>
---

This fixes up the remaining two spots that Linus suggested.


 builtin-ls-tree.c |   13 ++++++-------
 run-command.c     |    1 +
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin-ls-tree.c b/builtin-ls-tree.c
index d25767a..cb61717 100644
--- a/builtin-ls-tree.c
+++ b/builtin-ls-tree.c
@@ -66,17 +66,16 @@ static int show_tree(const unsigned char *sha1, const char *base, int baselen,
  /*
  * Maybe we want to have some recursive version here?
  *
- * Something like:
+ * Something similar to this incomplete example:
  *
  if (show_subprojects(base, baselen, pathname)) {
- if (fork()) {
- chdir(base);
- exec ls-tree;
- }
- waitpid();
+ struct child_process ls_tree;
+
+ ls_tree.dir = base;
+ ls_tree.argv = ls-tree;
+ start_command(&ls_tree);
  }
  *
- * ..or similar..
  */
  type = commit_type;
  } else if (S_ISDIR(mode)) {
diff --git a/run-command.c b/run-command.c
index 6af83c5..bbb9c77 100644
--- a/run-command.c
+++ b/run-command.c
@@ -68,6 +68,7 @@ int start_command(struct child_process *cmd)
  trace_argv_printf(cmd->argv, "trace: run_command:");
 
 #ifndef __MINGW32__
+ fflush(NULL);
  cmd->pid = fork();
  if (!cmd->pid) {
  if (cmd->no_stdin)
--
1.5.6.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html