git-svn now work with crlf convertion enabled.

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

git-svn now work with crlf convertion enabled.

Alexander Litvinov
Make git-svn works with crlf (or any other) file content convertion enabled.

When we modify file content SVN cant apply its delta to it. To fix this
situation I take full file content from SVN as next revision. This is
dump and slow but it works.
---
 git-svn.perl |   34 +++++++++++++++++++---------------
 1 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index cf6dbbc..606a177 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -28,6 +28,7 @@ sub fatal (@) { print STDERR "@_\n"; exit 1 }
 require SVN::Core; # use()-ing this causes segfaults for me... *shrug*
 require SVN::Ra;
 require SVN::Delta;
+require SVN::Client;
 if ($SVN::Core::VERSION lt '1.1.0') {
  fatal "Need SVN::Core 1.1.0 or better (got $SVN::Core::VERSION)";
 }
@@ -3075,6 +3076,7 @@ sub new {
  my $self = SVN::Delta::Editor->new;
  bless $self, $class;
  $self->{c} = $git_svn->{last_commit} if exists $git_svn->{last_commit};
+ $self->{url} = $git_svn->{url};
  $self->{empty} = {};
  $self->{dir_prop} = {};
  $self->{file_prop} = {};
@@ -3214,30 +3216,32 @@ sub change_file_prop {
 
 sub apply_textdelta {
  my ($self, $fb, $exp) = @_;
- my $fh = IO::File->new_tmpfile;
- $fh->autoflush(1);
- # $fh gets auto-closed() by SVN::TxDelta::apply(),
- # (but $base does not,) so dup() it for reading in close_file
- open my $dup, '<&', $fh or croak $!;
+
  my $base = IO::File->new_tmpfile;
  $base->autoflush(1);
  if ($fb->{blob}) {
  print $base 'link ' if ($fb->{mode_a} == 120000);
  my $size = $::_repository->cat_blob($fb->{blob}, $base);
  die "Failed to read object $fb->{blob}" if ($size < 0);
-
- if (defined $exp) {
- seek $base, 0, 0 or croak $!;
- my $got = ::md5sum($base);
- die "Checksum mismatch: $fb->{path} $fb->{blob}\n",
-    "expected: $exp\n",
-    "     got: $got\n" if ($got ne $exp);
- }
  }
  seek $base, 0, 0 or croak $!;
- $fb->{fh} = $dup;
+
+ my $fh = IO::File->new_tmpfile;
+ $fh->autoflush(1);
+
+ $fb->{fh} = $fh;
  $fb->{base} = $base;
- [ SVN::TxDelta::apply($base, $fh, undef, $fb->{path}, $fb->{pool}) ];
+
+ my $url = $self->{url};
+ $url =~ s/\/$//;
+ $url .= '/';
+ $url .= $fb->{path};
+
+ my $rev = $self->{file_prop}->{$fb->{path}}->{'svn:entry:committed-rev'};
+ die ("Can't find $fb->{path} revision") unless defined $rev;
+
+ my $ctx = SVN::Client->new();
+ $ctx->cat($fh, $url, $rev);
 }
 
 sub close_file {
--
1.5.6.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: git-svn now work with crlf convertion enabled.

Alexander Litvinov
> Make git-svn works with crlf (or any other) file content convertion
> enabled.
>
> When we modify file content SVN cant apply its delta to it. To fix this
> situation I take full file content from SVN as next revision. This is
> dump and slow but it works.

Sorry for the noise.

git-svn fetch files with this patch but I have found that git-svn use
git-hash-object and provide file name to store into stdin. As far as file is
a temp file git-hash-object can't correctly apply crlf convertion for the
file.

As a conclusion: git-svn does not apply crlf convertion on files being stored
into git repo. This make my patch useless.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Dmitry Potapov
On Thu, Jul 31, 2008 at 12:57:48PM +0700, Alexander Litvinov wrote:
>
> git-svn fetch files with this patch but I have found that git-svn use
> git-hash-object and provide file name to store into stdin. As far as file is
> a temp file git-hash-object can't correctly apply crlf convertion for the
> file.

It does not look to be true. I did the following test:

mkdir hash_test
cd hash_test
git init

cat <<\=== > hash_test.pl
#!/usr/bin/env perl

use File::Temp qw/tempfile/;

my ($tmp_fh, $tmp_filename) = File::Temp::tempfile(UNLINK => 1);
print $tmp_fh "Hi\r\n";
$tmp_fh->flush;
system ("echo $tmp_filename | git hash-object --stdin-paths");
===

git config core.autocrlf true
perl hash_test.pl
git config core.autocrlf false
perl hash_test.pl


and the output was
b14df6442ea5a1b382985a6549b85d435376c351
ea6b6afbc2cbed0eb8c0f7561286ab72f349416c

which means that the autocrlf conversion is done for temporary
files created by perl. (I tested it on Linux and Windows/Cygwin).

In any case, I believe the right solution should be adding a
new option to git-hash-object to disable any conversion.

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

[RFC] hash-object --no-filters

Dmitry Potapov
Hi All,

I am tryint to add the --no-filters option. It is useful for git-svn
and other importers that want to add file as-is without being affected
by any filter (in particular, autocrlf). Though, the patch below works,
I am not happy with the hackish way of passing no-filter requirement
to the index_fd() function. So, I wonder what would be preferable:
- to change 'write_object' to be flags (bit 0: write_object,
  bit 1: no-filters )
- to add some global the no_filters flag to environment.c, which can
  be checked inside of convert_to_git(), so it may be used in the
  future in some other cases (though I don't see where else it can
  be useful).

Another question: currently git hash-object --input imply no filters.
I don't know if it was done intentionally (it can be argued in both
ways). I don't think it is reasonable now to change this behavior,
so I want to add just one line to documentation, so there will be
no surprise among users.

Dmitry

-- 8< --
From: Dmitry Potapov <[hidden email]>
Date: Thu, 31 Jul 2008 21:10:26 +0400
Subject: [PATCH] hash-object --no-filters

The --no-filters option makes git hash-object to work as there were no
input filters. This option is useful for importers such as git-svn to
put new version of files as is even if autocrlf is set.
---
 Documentation/git-hash-object.txt |    6 ++++++
 hash-object.c                     |    7 ++++++-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-hash-object.txt b/Documentation/git-hash-object.txt
index ac928e1..69a17c7 100644
--- a/Documentation/git-hash-object.txt
+++ b/Documentation/git-hash-object.txt
@@ -35,6 +35,12 @@ OPTIONS
 --stdin-paths::
  Read file names from stdin instead of from the command-line.
 
+--no-filters::
+ If this option is given then the file is hashed as is ignoring
+ all filters specified in the configuration, including crlf
+ conversion. If the file is read from standard input then no
+ filters is always implied.
+
 Author
 ------
 Written by Junio C Hamano <[hidden email]>
diff --git a/hash-object.c b/hash-object.c
index 46c06a9..1e7fe8a 100644
--- a/hash-object.c
+++ b/hash-object.c
@@ -8,6 +8,8 @@
 #include "blob.h"
 #include "quote.h"
 
+static unsigned no_filters;
+
 static void hash_object(const char *path, enum object_type type, int write_object)
 {
  int fd;
@@ -16,7 +18,8 @@ static void hash_object(const char *path, enum object_type type, int write_objec
  fd = open(path, O_RDONLY);
  if (fd < 0 ||
     fstat(fd, &st) < 0 ||
-    index_fd(sha1, fd, &st, write_object, type, path))
+    ((no_filters ? st.st_mode &= ~S_IFREG : 0),
+     index_fd(sha1, fd, &st, write_object, type, path)))
  die(write_object
     ? "Unable to add %s to database"
     : "Unable to hash %s", path);
@@ -104,6 +107,8 @@ int main(int argc, char **argv)
  die("Multiple --stdin arguments are not supported");
  hashstdin = 1;
  }
+ else if (!strcmp(argv[i], "--no-filters"))
+ no_filters = 1;
  else
  usage(hash_object_usage);
  }
--
1.6.0.rc1.32.gc84cb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Alexander Litvinov
In reply to this post by Dmitry Potapov
> It does not look to be true. I did the following test:
...
> which means that the autocrlf conversion is done for temporary
> files created by perl. (I tested it on Linux and Windows/Cygwin).
>
> In any case, I believe the right solution should be adding a
> new option to git-hash-object to disable any conversion.

My bad, I did not append full thoughts. git-hash-object DOES autocrlf
convertion but  it cant do it correctly. All it can do - is to autodetect
text files. My setup has .git/info/attributes file where all files but .cpp
and .h are binary. While .cpp and .h are text files. In this case
git-hash-object do not know the real file name as far as git-svn use
temporary files.

I dont think that disabling convertion is a good way. I really want to convert
my files. Possible solution is to pass two file names to git-hash-object: the
real file with content and the proposed file name in the working directory.
In this case git-hash-object will be able to make correct convertion.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Junio C Hamano
Alexander Litvinov <[hidden email]> writes:

> I dont think that disabling convertion is a good way. I really want to convert
> my files. Possible solution is to pass two file names to git-hash-object: the
> real file with content and the proposed file name in the working directory.
> In this case git-hash-object will be able to make correct convertion.

I think the optional parameter to say "pretend the content is from this
path" makes sense even for (and especially for) hashing --stdin.




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Dmitry Potapov
On Fri, Aug 1, 2008 at 9:09 AM, Junio C Hamano <[hidden email]> wrote:
> Alexander Litvinov <[hidden email]> writes:
>
>> I dont think that disabling convertion is a good way. I really want to convert
>> my files. Possible solution is to pass two file names to git-hash-object: the
>> real file with content and the proposed file name in the working directory.
>> In this case git-hash-object will be able to make correct convertion.
>
> I think the optional parameter to say "pretend the content is from this
> path" makes sense even for (and especially for) hashing --stdin.

git-svn uses git hash-object --stdin-paths, which means that it reads
filenames from the  standard input, so one optional parameter cannot
help here. Also, I am not sure how it can be useful for --stdin, which
does not convert anything (it uses index_pipe, which does not call
convert_to_git).

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Dmitry Potapov
In reply to this post by Alexander Litvinov
On Fri, Aug 1, 2008 at 7:23 AM, Alexander Litvinov
<[hidden email]> wrote:
>
> I dont think that disabling convertion is a good way. I really want to convert
> my files.

To being able to synchronize efficiently in both ways, you need to store
files exactly as they were received from SVN then there will be no
problem with applying binary delta patch. All CRLF conversion should be
done on checkout and checkin from/to Git repository.

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Junio C Hamano
"Dmitry Potapov" <[hidden email]> writes:

> On Fri, Aug 1, 2008 at 7:23 AM, Alexander Litvinov
> <[hidden email]> wrote:
>>
>> I dont think that disabling convertion is a good way. I really want to convert
>> my files.
>
> To being able to synchronize efficiently in both ways, you need to store
> files exactly as they were received from SVN then there will be no
> problem with applying binary delta patch. All CRLF conversion should be
> done on checkout and checkin from/to Git repository.

Ahh,... if that is the philosophy, perhaps we can teach --stdin-paths to
optionally open the file itself and use index_pipe() like --stdin codepath
does?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Dmitry Potapov
On Fri, Aug 1, 2008 at 12:08 PM, Junio C Hamano <[hidden email]> wrote:

> "Dmitry Potapov" <[hidden email]> writes:
>>
>> To being able to synchronize efficiently in both ways, you need to store
>> files exactly as they were received from SVN then there will be no
>> problem with applying binary delta patch. All CRLF conversion should be
>> done on checkout and checkin from/to Git repository.
>
> Ahh,... if that is the philosophy, perhaps we can teach --stdin-paths to
> optionally open the file itself and use index_pipe() like --stdin codepath
> does?

It is possible to do in this way, but it less efficient, because it uses
index_pipe, which does not know the actual size, so it reallocates the buffer
as it reads data from the descriptor, while index_fd uses xmap() instead.
So I sent another solution yesterday:
http://article.gmane.org/gmane.comp.version-control.git/90968

It is a bit hackish because I unset S_IFREG bit in st_mode to disable
conversion. In fact, my question what would be a better way to tell index_fd
to not do any conversion. If you think that it is better to use index_pipe,
which does not any conversion than I will redo my patch to use it instead.

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Alexander Litvinov
In reply to this post by Junio C Hamano
> To being able to synchronize efficiently in both ways, you need to store
> files exactly as they were received from SVN then there will be no
> problem with applying binary delta patch. All CRLF conversion should be
> done on checkout and checkin from/to Git repository.

Sorry I have lost the mind flow here.

1. We 'fetch' files from svn as is. Yes, we know that svn use delta to rebuild
original file.
2. We commit file to git. Right here we use git-hash-object. As I understand
we _have_ to do convertion CRLF->LF here.
3. In some days we will checkout file from git and wil do LF->CRLF convertion.

I thought this is a right workflow.
- We could store original file too at step 2 somwhow to be able to use delta
at step 1.
- We can't skip convertion at step 2. Overwise git will store files with CRLF.

Am I wrong ?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Alexander Litvinov
In reply to this post by Dmitry Potapov
> git-svn uses git hash-object --stdin-paths, which means that it reads
> filenames from the  standard input, so one optional parameter cannot
> help here.

We could add some parameter ti git-hash-object to tell that we will pass two
lines per each file: real file name and proposed file name in workdir.

In this case git-hash-object will be able to do proper convertion.

The main proble is the tracking original file from svn. Propably we could use
some special dir in worktree to store original file. Or we could make special
branch to track that files and second one to store converted files.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Dmitry Potapov
In reply to this post by Alexander Litvinov
On Fri, Aug 1, 2008 at 3:11 PM, Alexander Litvinov
<[hidden email]> wrote:

>> To being able to synchronize efficiently in both ways, you need to store
>> files exactly as they were received from SVN then there will be no
>> problem with applying binary delta patch. All CRLF conversion should be
>> done on checkout and checkin from/to Git repository.
>
> Sorry I have lost the mind flow here.
>
> 1. We 'fetch' files from svn as is. Yes, we know that svn use delta to rebuild
> original file.
> 2. We commit file to git. Right here we use git-hash-object. As I understand
> we _have_ to do convertion CRLF->LF here.

No, you should do any conversion here. There are two reasons for that:
1. If you do then you will not be able to apply binary patches later.
2. You do not really need it if the SVN repository has correct eol settings,
because all files that have svn:eol-style set to either 'native' or 'LF'
will have LF. Those that do not have svn:eol-style or have it to another
value should not be subject to CRLF conversion at all.

So, I believe all files received from SVN should be stored as is. Import is
not about creating new commits, it is about getting history from another
repository as it is.

> 3. In some days we will checkout file from git and wil do LF->CRLF convertion.

It is done only for files that do not have CRLF already.

>
> I thought this is a right workflow.
> - We could store original file too at step 2 somwhow to be able to use delta
> at step 1.
> - We can't skip convertion at step 2. Overwise git will store files with CRLF.

It is okay for Git to store CRLF, because you want to treat them as
binary files.  If you want them being treated as text, you should change
svn:eol-style to 'native' for those files in SVN and then new versions
of these files will have the right ending. It is how SVN client works.

The only problem is how to synchronize the SVN view which files are binary
and which are text and what Git thinks about them.

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Junio C Hamano
In reply to this post by Dmitry Potapov
"Dmitry Potapov" <[hidden email]> writes:

> On Fri, Aug 1, 2008 at 12:08 PM, Junio C Hamano <[hidden email]> wrote:
>> "Dmitry Potapov" <[hidden email]> writes:
>>>
>>> To being able to synchronize efficiently in both ways, you need to store
>>> files exactly as they were received from SVN then there will be no
>>> problem with applying binary delta patch. All CRLF conversion should be
>>> done on checkout and checkin from/to Git repository.
>>
>> Ahh,... if that is the philosophy, perhaps we can teach --stdin-paths to
>> optionally open the file itself and use index_pipe() like --stdin codepath
>> does?
>
> It is possible to do in this way, but it less efficient, because it uses
> index_pipe, which does not know the actual size, so it reallocates the buffer
> as it reads data from the descriptor, while index_fd uses xmap() instead.
> So I sent another solution yesterday:
> http://article.gmane.org/gmane.comp.version-control.git/90968
>
> It is a bit hackish because...

Ok, earlier I was confused who was proposing what for what purpose, but
that one was not just "a bit hackish" but an unacceptable hack ;-)

Perhaps you would want to do the s/write_object/flags/ conversion, like
this?

--

 cache.h     |    9 ++++++---
 sha1_file.c |   15 +++++++++------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/cache.h b/cache.h
index 2475de9..39975fb 100644
--- a/cache.h
+++ b/cache.h
@@ -390,9 +390,12 @@ extern int ie_match_stat(const struct index_state *, struct cache_entry *, struc
 extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
 
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
-extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
-extern int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object);
-extern int index_path(unsigned char *sha1, const char *path, struct stat *st, int write_object);
+
+#define HASH_OBJECT_DO_CREATE 01
+#define HASH_OBJECT_LITERALLY 02
+extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int flags, enum object_type type, const char *path);
+extern int index_pipe(unsigned char *sha1, int fd, const char *type, int flags);
+extern int index_path(unsigned char *sha1, const char *path, struct stat *st, int flags);
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 
 #define REFRESH_REALLY 0x0001 /* ignore_valid */
diff --git a/sha1_file.c b/sha1_file.c
index e281c14..5def648 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2353,10 +2353,11 @@ int has_sha1_file(const unsigned char *sha1)
  return has_loose_object(sha1);
 }
 
-int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object)
+int index_pipe(unsigned char *sha1, int fd, const char *type, int flags)
 {
  struct strbuf buf;
  int ret;
+ int write_object = flags & HASH_OBJECT_DO_CREATE;
 
  strbuf_init(&buf, 0);
  if (strbuf_read(&buf, fd, 4096) < 0) {
@@ -2375,9 +2376,11 @@ int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object)
  return ret;
 }
 
-int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
+int index_fd(unsigned char *sha1, int fd, struct stat *st, int flags,
      enum object_type type, const char *path)
 {
+ int write_object = flags & HASH_OBJECT_DO_CREATE;
+ int hash_literally = flags & HASH_OBJECT_LITERALLY;
  size_t size = xsize_t(st->st_size);
  void *buf = NULL;
  int ret, re_allocated = 0;
@@ -2392,7 +2395,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
  /*
  * Convert blobs to git internal format
  */
- if ((type == OBJ_BLOB) && S_ISREG(st->st_mode)) {
+ if (!hash_literally && (type == OBJ_BLOB) && S_ISREG(st->st_mode)) {
  struct strbuf nbuf;
  strbuf_init(&nbuf, 0);
  if (convert_to_git(path, buf, size, &nbuf,
@@ -2416,7 +2419,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
  return ret;
 }
 
-int index_path(unsigned char *sha1, const char *path, struct stat *st, int write_object)
+int index_path(unsigned char *sha1, const char *path, struct stat *st, int flags)
 {
  int fd;
  char *target;
@@ -2428,7 +2431,7 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, int write
  if (fd < 0)
  return error("open(\"%s\"): %s", path,
      strerror(errno));
- if (index_fd(sha1, fd, st, write_object, OBJ_BLOB, path) < 0)
+ if (index_fd(sha1, fd, st, flags, OBJ_BLOB, path) < 0)
  return error("%s: failed to insert into database",
      path);
  break;
@@ -2441,7 +2444,7 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, int write
  return error("readlink(\"%s\"): %s", path,
              errstr);
  }
- if (!write_object)
+ if (!(flags & HASH_OBJECT_DO_CREATE))
  hash_sha1_file(target, len, blob_type, sha1);
  else if (write_sha1_file(target, len, blob_type, sha1))
  return error("%s: failed to insert into database",



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Dmitry Potapov
On Fri, Aug 01, 2008 at 12:42:44PM -0700, Junio C Hamano wrote:
>
> Ok, earlier I was confused who was proposing what for what purpose, but
> that one was not just "a bit hackish" but an unacceptable hack ;-)

Thanks for correct my wording ;-)

>
> Perhaps you would want to do the s/write_object/flags/ conversion, like
> this?

Yes, it was my prefered choice to change these index_xx functions.

I have applied your patch and then corrected mine to use flags.
See below.

I wonder if something should be done about other places where index_xx
functions are called. I have looked at them and all they use either 0 or
1 (boolean expression which will be evaluated to 0 or 1), so they should
work as is, but I can correct them to use HASH_OBJECT_DO_CREATE instead
of 1 if it helps with readability.

-- 8< --

From: Dmitry Potapov <[hidden email]>
Date: Thu, 31 Jul 2008 21:10:26 +0400
Subject: [PATCH] hash-object --no-filters

The --no-filters option makes git hash-object to work as there were no
input filters. This option is useful for importers such as git-svn to
put new version of files as is even if autocrlf is set.

Signed-off-by: Dmitry Potapov <[hidden email]>
---
 Documentation/git-hash-object.txt |    6 ++++++
 hash-object.c                     |   28 +++++++++++++++-------------
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-hash-object.txt b/Documentation/git-hash-object.txt
index ac928e1..69a17c7 100644
--- a/Documentation/git-hash-object.txt
+++ b/Documentation/git-hash-object.txt
@@ -35,6 +35,12 @@ OPTIONS
 --stdin-paths::
  Read file names from stdin instead of from the command-line.
 
+--no-filters::
+ If this option is given then the file is hashed as is ignoring
+ all filters specified in the configuration, including crlf
+ conversion. If the file is read from standard input then no
+ filters is always implied.
+
 Author
 ------
 Written by Junio C Hamano <[hidden email]>
diff --git a/hash-object.c b/hash-object.c
index 46c06a9..2dd7283 100644
--- a/hash-object.c
+++ b/hash-object.c
@@ -8,7 +8,7 @@
 #include "blob.h"
 #include "quote.h"
 
-static void hash_object(const char *path, enum object_type type, int write_object)
+static void hash_object(const char *path, enum object_type type, int flags)
 {
  int fd;
  struct stat st;
@@ -16,23 +16,23 @@ static void hash_object(const char *path, enum object_type type, int write_objec
  fd = open(path, O_RDONLY);
  if (fd < 0 ||
     fstat(fd, &st) < 0 ||
-    index_fd(sha1, fd, &st, write_object, type, path))
- die(write_object
+    index_fd(sha1, fd, &st, flags, type, path))
+ die((flags & HASH_OBJECT_DO_CREATE)
     ? "Unable to add %s to database"
     : "Unable to hash %s", path);
  printf("%s\n", sha1_to_hex(sha1));
  maybe_flush_or_die(stdout, "hash to stdout");
 }
 
-static void hash_stdin(const char *type, int write_object)
+static void hash_stdin(const char *type, int flags)
 {
  unsigned char sha1[20];
- if (index_pipe(sha1, 0, type, write_object))
+ if (index_pipe(sha1, 0, type, flags))
  die("Unable to add stdin to database");
  printf("%s\n", sha1_to_hex(sha1));
 }
 
-static void hash_stdin_paths(const char *type, int write_objects)
+static void hash_stdin_paths(const char *type, int flags)
 {
  struct strbuf buf, nbuf;
 
@@ -45,7 +45,7 @@ static void hash_stdin_paths(const char *type, int write_objects)
  die("line is badly quoted");
  strbuf_swap(&buf, &nbuf);
  }
- hash_object(buf.buf, type_from_string(type), write_objects);
+ hash_object(buf.buf, type_from_string(type), flags);
  }
  strbuf_release(&buf);
  strbuf_release(&nbuf);
@@ -58,7 +58,7 @@ int main(int argc, char **argv)
 {
  int i;
  const char *type = blob_type;
- int write_object = 0;
+ int flags = 0;
  const char *prefix = NULL;
  int prefix_length = -1;
  int no_more_flags = 0;
@@ -80,7 +80,7 @@ int main(int argc, char **argv)
  prefix_length =
  prefix ? strlen(prefix) : 0;
  }
- write_object = 1;
+ flags |= HASH_OBJECT_DO_CREATE;
  }
  else if (!strcmp(argv[i], "--")) {
  no_more_flags = 1;
@@ -104,6 +104,8 @@ int main(int argc, char **argv)
  die("Multiple --stdin arguments are not supported");
  hashstdin = 1;
  }
+ else if (!strcmp(argv[i], "--no-filters"))
+ flags |= HASH_OBJECT_LITERALLY;
  else
  usage(hash_object_usage);
  }
@@ -116,21 +118,21 @@ int main(int argc, char **argv)
  }
 
  if (hashstdin) {
- hash_stdin(type, write_object);
+ hash_stdin(type, flags);
  hashstdin = 0;
  }
  if (0 <= prefix_length)
  arg = prefix_filename(prefix, prefix_length,
       arg);
- hash_object(arg, type_from_string(type), write_object);
+ hash_object(arg, type_from_string(type), flags);
  no_more_flags = 1;
  }
  }
 
  if (stdin_paths)
- hash_stdin_paths(type, write_object);
+ hash_stdin_paths(type, flags);
 
  if (hashstdin)
- hash_stdin(type, write_object);
+ hash_stdin(type, flags);
  return 0;
 }
--
1.6.0.rc1.33.gb756f

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Junio C Hamano
Dmitry Potapov <[hidden email]> writes:

> I have applied your patch and then corrected mine to use flags.
> See below.
>
> I wonder if something should be done about other places where index_xx
> functions are called. I have looked at them and all they use either 0 or
> 1 (boolean expression which will be evaluated to 0 or 1), so they should
> work as is, but I can correct them to use HASH_OBJECT_DO_CREATE instead
> of 1 if it helps with readability.

Even though the patch was not compile tested, I did check the existing
call sites are giving only 0 or 1, but I think converting these "please
write -- I give you 1" callers to pass the bitmask would be a sane thing
to do.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] git-svn now work with crlf convertion enabled.

Dmitry Potapov
On Fri, Aug 01, 2008 at 03:14:15PM -0700, Junio C Hamano wrote:
>
> Even though the patch was not compile tested, I did check the existing
> call sites are giving only 0 or 1, but I think converting these "please
> write -- I give you 1" callers to pass the bitmask would be a sane thing
> to do.

Here it goes. It turned out that there are only two places that actually
needs correction, while two others use '0'. I have run 'make test' and
it's passed the tests.

-- 8< --
From: Dmitry Potapov <[hidden email]>
Date: Sat, 2 Aug 2008 02:56:45 +0400
Subject: [PATCH] convert index_path callers to use bitmask instead of 1

Signed-off-by: Dmitry Potapov <[hidden email]>
---
 builtin-update-index.c |    5 +++--
 read-cache.c           |    2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 38eb53c..d3e212c 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -85,7 +85,7 @@ static int process_lstat_error(const char *path, int err)
 
 static int add_one_path(struct cache_entry *old, const char *path, int len, struct stat *st)
 {
- int option, size;
+ int option, flags, size;
  struct cache_entry *ce;
 
  /* Was the old index entry already up-to-date? */
@@ -99,7 +99,8 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
  fill_stat_cache_info(ce, st);
  ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
 
- if (index_path(ce->sha1, path, st, !info_only))
+ flags = info_only ? 0 : HASH_OBJECT_DO_CREATE;
+ if (index_path(ce->sha1, path, st, flags))
  return -1;
  option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
  option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
diff --git a/read-cache.c b/read-cache.c
index 2c03ec3..afd6005 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -550,7 +550,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
  alias->ce_flags |= CE_ADDED;
  return 0;
  }
- if (index_path(ce->sha1, path, st, 1))
+ if (index_path(ce->sha1, path, st, HASH_OBJECT_DO_CREATE))
  return error("unable to index file %s", path);
  if (ignore_case && alias && different_name(ce, alias))
  ce = create_alias_ce(ce, alias);
--
1.6.0.rc1.34.gad373

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] hash-object --no-filters

Junio C Hamano
In reply to this post by Dmitry Potapov
Dmitry Potapov <[hidden email]> writes:

> The --no-filters option makes git hash-object to work as there were no
> input filters. This option is useful for importers such as git-svn to
> put new version of files as is even if autocrlf is set.

I think this is going in the right direction, but I have to wonder a few
things.

First, on hash-object.

 (1) "hash-object --stdin" always hashes literally.  We may want to be
     able to say "The contents is this but pretend it came from this path
     and apply the usual input rules", perhaps with "--path=" option;

 (2) "hash-object temporaryfile" may want to honor the same "--path"
     option;

 (3) "hash-object --stdin-paths" may want to get pair of paths (i.e. two
     lines per entry) to do the same.

If we want to do the above, the existing low-level interface needs to be
adjusted.

index_pipe() and index_fd() can learn to take an additional string
parameter for attribute lookup to implement (1) and (2) above.  Perhaps
the string can be NULL to signal --no-filter behaviour, in which case the
HASH_OBJECT_LITERALLY change may not be necessary for this codepath.

index_path() is a healper for add_to_index() which is used for normal
addition of working tree entities, and I do not see an immediate need to
teach it about this "use this different path for attribute lookup" at
least for now.

By the way, why do we have index_pipe() and index_fd() to begin with?  Is
it because users of index_pipe() do not know what the path it is hashing
and also the fd being a pipe we cannot mmap it?

If these two are the only reasons, then I wonder if we can:

 - accept NULL as path and stat parameters for callers without a filename
   (which automatically implies we are doing a regular blob and we hash
   literally); and

 - first try to mmap(), and if it fails fall back to the "read once into
   strbuf" codepath to solve mmap-vs-pipe issue.

I am not sure if such a unification of these two functions is useful,
though.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] hash-object --no-filters

Dmitry Potapov
On Sat, Aug 02, 2008 at 10:28:13AM -0700, Junio C Hamano wrote:

> Dmitry Potapov <[hidden email]> writes:
>
> > The --no-filters option makes git hash-object to work as there were no
> > input filters. This option is useful for importers such as git-svn to
> > put new version of files as is even if autocrlf is set.
>
> I think this is going in the right direction, but I have to wonder a few
> things.
>
> First, on hash-object.
>
>  (1) "hash-object --stdin" always hashes literally.  We may want to be
>      able to say "The contents is this but pretend it came from this path
>      and apply the usual input rules", perhaps with "--path=" option;

It makes sense.

>
>  (2) "hash-object temporaryfile" may want to honor the same "--path"
>      option;

Agreed.

>
>  (3) "hash-object --stdin-paths" may want to get pair of paths (i.e. two
>      lines per entry) to do the same.

I cannot come up with a good name for this option.

>
> If we want to do the above, the existing low-level interface needs to be
> adjusted.
>
> index_pipe() and index_fd() can learn to take an additional string
> parameter for attribute lookup to implement (1) and (2) above.

index_fd already has the 'path' parameter, which is used as hint for
for blob conversion.

> Perhaps
> the string can be NULL to signal --no-filter behaviour, in which case the
> HASH_OBJECT_LITERALLY change may not be necessary for this codepath.

Sounds like a good idea :)

>
> By the way, why do we have index_pipe() and index_fd() to begin with?  Is
> it because users of index_pipe() do not know what the path it is hashing
> and also the fd being a pipe we cannot mmap it?

index_fd() does not need the path for anything but to choose filters.
So, if index_pipe supported filters, it would have the same parameter.

There is one more parameter that index_fd() has and index_pipe() does
not. It is 'struct stat'. So I decided to look what this parameter is
used for in index_fd(), and it turned out for two things:
- to determine the size that needs to mmap
- to check whether the file is regular and if it is not then skip
  convert_to_git().

That made me wonder whether index_fd() can be ever called for a non-
regular file? I studied the source code and with the exception to git
hash-object, which can pass anything what it can bed opened, in all
other cases, we always call it for what is know as a regular file. In
fact, it could be otherwise. It won't work for non-regular files. It
is quite obvious that git hash-object for a directory will fail, but
I wondered what would happen if I'd give it something different. For
instance, a named pipe (FIFO)

 $mkfifo fifofile
 $git hash-object
 <wait for the other process to start write to it>
 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

i.e. the same SHA-1 as for an empty file, and here is why: index_fd()
tries to mmap the file descriptor and that obviously fails, but xmmap()
has this particular code:

        if (ret == MAP_FAILED) {
                if (!length)
                        return NULL;

apparently, it was workaround for empty files, but because st_size is 0
for pipes, index_fd treats any pipe as empty file!

>
> If these two are the only reasons, then I wonder if we can:
>
>  - accept NULL as path and stat parameters for callers without a filename
>    (which automatically implies we are doing a regular blob and we hash
>    literally); and

I like this idea.

>
>  - first try to mmap(), and if it fails fall back to the "read once into
>    strbuf" codepath to solve mmap-vs-pipe issue.

I have an alternative proposal:

Because we have stat structure given as a parameter, we can always
check whether the file is regular or not. If it is regular, we can use
mmap() and if it is not then use "read once into strbuf" approach.

> I am not sure if such a unification of these two functions is useful,
> though.

I have implemented this unification, and it reduces the code size,
makes git-hash-object to work with named pipes, and makes easier to
add the --path and --no-filters options, because there is no need
to modify the index_fd interface anymore, and there is a single place
where convert_to_git is invoked. So it looks like a good idea.

Here is the patch:

-- >8 --
From: Dmitry Potapov <[hidden email]>
Date: Sun, 3 Aug 2008 08:39:16 +0400
Subject: [PATCH] teach index_fd to work with pipes

index_fd can now work with file descriptors that are not normal files
but any readable file. If the given file descriptor is a regular file
then mmap() is used; for other files, strbuf_read is used.

The path parameter, which has been used as hint for filters, can be
NULL now to indicate that the file should be hashed literally without
any filter.

The index_pipe function is removed as redundant.

Signed-off-by: Dmitry Potapov <[hidden email]>
---
 cache.h       |    1 -
 hash-object.c |   29 +++++++++++--------------
 sha1_file.c   |   64 +++++++++++++++++++++++++++-----------------------------
 3 files changed, 44 insertions(+), 50 deletions(-)

git-hash-object before
   text    data     bss     dec     hex filename
 148751    1332   93164  243247   3b62f git-hash-object

and after patch
   text    data     bss     dec     hex filename
 148687    1332   93164  243183   3b5ef git-hash-object


diff --git a/cache.h b/cache.h
index 2475de9..68ce6e6 100644
--- a/cache.h
+++ b/cache.h
@@ -391,7 +391,6 @@ extern int ie_modified(const struct index_state *, struct cache_entry *, struct
 
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
-extern int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, int write_object);
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 
diff --git a/hash-object.c b/hash-object.c
index 46c06a9..ce027b9 100644
--- a/hash-object.c
+++ b/hash-object.c
@@ -8,28 +8,25 @@
 #include "blob.h"
 #include "quote.h"
 
-static void hash_object(const char *path, enum object_type type, int write_object)
+static void hash_fd(int fd, const char *type, int write_object, const char *path)
 {
- int fd;
  struct stat st;
  unsigned char sha1[20];
- fd = open(path, O_RDONLY);
- if (fd < 0 ||
-    fstat(fd, &st) < 0 ||
-    index_fd(sha1, fd, &st, write_object, type, path))
+ if (fstat(fd, &st) < 0 ||
+    index_fd(sha1, fd, &st, write_object, type_from_string(type), path))
  die(write_object
     ? "Unable to add %s to database"
     : "Unable to hash %s", path);
  printf("%s\n", sha1_to_hex(sha1));
  maybe_flush_or_die(stdout, "hash to stdout");
 }
-
-static void hash_stdin(const char *type, int write_object)
+static void hash_object(const char *path, const char *type, int write_object)
 {
- unsigned char sha1[20];
- if (index_pipe(sha1, 0, type, write_object))
- die("Unable to add stdin to database");
- printf("%s\n", sha1_to_hex(sha1));
+ int fd;
+ fd = open(path, O_RDONLY);
+ if (fd < 0)
+ die("Cannot open %s", path);
+ hash_fd(fd, type, write_object, path);
 }
 
 static void hash_stdin_paths(const char *type, int write_objects)
@@ -45,7 +42,7 @@ static void hash_stdin_paths(const char *type, int write_objects)
  die("line is badly quoted");
  strbuf_swap(&buf, &nbuf);
  }
- hash_object(buf.buf, type_from_string(type), write_objects);
+ hash_object(buf.buf, type, write_objects);
  }
  strbuf_release(&buf);
  strbuf_release(&nbuf);
@@ -116,13 +113,13 @@ int main(int argc, char **argv)
  }
 
  if (hashstdin) {
- hash_stdin(type, write_object);
+ hash_fd(0, type, write_object, NULL);
  hashstdin = 0;
  }
  if (0 <= prefix_length)
  arg = prefix_filename(prefix, prefix_length,
       arg);
- hash_object(arg, type_from_string(type), write_object);
+ hash_object(arg, type, write_object);
  no_more_flags = 1;
  }
  }
@@ -131,6 +128,6 @@ int main(int argc, char **argv)
  hash_stdin_paths(type, write_object);
 
  if (hashstdin)
- hash_stdin(type, write_object);
+ hash_fd(0, type, write_object, NULL);
  return 0;
 }
diff --git a/sha1_file.c b/sha1_file.c
index e281c14..765a7e7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2353,51 +2353,22 @@ int has_sha1_file(const unsigned char *sha1)
  return has_loose_object(sha1);
 }
 
-int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object)
+static int index_mem(unsigned char *sha1, void *buf, size_t size,
+     int write_object, enum object_type type, const char *path)
 {
- struct strbuf buf;
- int ret;
-
- strbuf_init(&buf, 0);
- if (strbuf_read(&buf, fd, 4096) < 0) {
- strbuf_release(&buf);
- return -1;
- }
-
- if (!type)
- type = blob_type;
- if (write_object)
- ret = write_sha1_file(buf.buf, buf.len, type, sha1);
- else
- ret = hash_sha1_file(buf.buf, buf.len, type, sha1);
- strbuf_release(&buf);
-
- return ret;
-}
-
-int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
-     enum object_type type, const char *path)
-{
- size_t size = xsize_t(st->st_size);
- void *buf = NULL;
  int ret, re_allocated = 0;
 
- if (size)
- buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
- close(fd);
-
  if (!type)
  type = OBJ_BLOB;
 
  /*
  * Convert blobs to git internal format
  */
- if ((type == OBJ_BLOB) && S_ISREG(st->st_mode)) {
+ if ((type == OBJ_BLOB) && path) {
  struct strbuf nbuf;
  strbuf_init(&nbuf, 0);
  if (convert_to_git(path, buf, size, &nbuf,
                    write_object ? safe_crlf : 0)) {
- munmap(buf, size);
  buf = strbuf_detach(&nbuf, &size);
  re_allocated = 1;
  }
@@ -2411,8 +2382,35 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
  free(buf);
  return ret;
  }
- if (size)
+ return ret;
+}
+
+int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
+     enum object_type type, const char *path)
+{
+ size_t size = xsize_t(st->st_size);
+ int ret;
+
+ if (!S_ISREG(st->st_mode))
+ {
+ struct strbuf sbuf;
+ strbuf_init(&sbuf, 0);
+ if (strbuf_read(&sbuf, fd, 4096) >= 0)
+ ret = index_mem(sha1, sbuf.buf, sbuf.len, write_object,
+ type, path);
+ else
+ ret = -1;
+ strbuf_release(&sbuf);
+ }
+ else if (size)
+ {
+ void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+ ret = index_mem(sha1, buf, size, write_object, type, path);
  munmap(buf, size);
+ }
+ else
+ ret = index_mem(sha1, NULL, size, write_object, type, path);
+ close(fd);
  return ret;
 }
 
--
1.6.0.rc1.53.gaeaa.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] hash-object --no-filters

Dmitry Potapov
On Sun, Aug 03, 2008 at 09:42:18AM +0400, Dmitry Potapov wrote:
>
> Here is the patch:

I am sorry, I forgot to commit a micro cleanup to my patch:

@@ -2378,10 +2378,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
  ret = write_sha1_file(buf, size, typename(type), sha1);
  else
  ret = hash_sha1_file(buf, size, typename(type), sha1);
- if (re_allocated) {
+ if (re_allocated)
  free(buf);
- return ret;
- }
  return ret;
 }

So, here is the corrected version of my patch:

-- >8 --
From: Dmitry Potapov <[hidden email]>
Date: Sun, 3 Aug 2008 08:39:16 +0400
Subject: [PATCH] teach index_fd to work with pipes

index_fd can now work with file descriptors that are not normal files
but any readable file. If the given file descriptor is a regular file
then mmap() is used; for other files, strbuf_read is used.

The path parameter, which has been used as hint for filters, can be
NULL now to indicate that the file should be hashed literally without
any filter.

The index_pipe function is removed as redundant.

Signed-off-by: Dmitry Potapov <[hidden email]>
---
 cache.h       |    1 -
 hash-object.c |   29 +++++++++++-------------
 sha1_file.c   |   66 ++++++++++++++++++++++++++------------------------------
 3 files changed, 44 insertions(+), 52 deletions(-)

diff --git a/cache.h b/cache.h
index 2475de9..68ce6e6 100644
--- a/cache.h
+++ b/cache.h
@@ -391,7 +391,6 @@ extern int ie_modified(const struct index_state *, struct cache_entry *, struct
 
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
-extern int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, int write_object);
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
 
diff --git a/hash-object.c b/hash-object.c
index 46c06a9..ce027b9 100644
--- a/hash-object.c
+++ b/hash-object.c
@@ -8,28 +8,25 @@
 #include "blob.h"
 #include "quote.h"
 
-static void hash_object(const char *path, enum object_type type, int write_object)
+static void hash_fd(int fd, const char *type, int write_object, const char *path)
 {
- int fd;
  struct stat st;
  unsigned char sha1[20];
- fd = open(path, O_RDONLY);
- if (fd < 0 ||
-    fstat(fd, &st) < 0 ||
-    index_fd(sha1, fd, &st, write_object, type, path))
+ if (fstat(fd, &st) < 0 ||
+    index_fd(sha1, fd, &st, write_object, type_from_string(type), path))
  die(write_object
     ? "Unable to add %s to database"
     : "Unable to hash %s", path);
  printf("%s\n", sha1_to_hex(sha1));
  maybe_flush_or_die(stdout, "hash to stdout");
 }
-
-static void hash_stdin(const char *type, int write_object)
+static void hash_object(const char *path, const char *type, int write_object)
 {
- unsigned char sha1[20];
- if (index_pipe(sha1, 0, type, write_object))
- die("Unable to add stdin to database");
- printf("%s\n", sha1_to_hex(sha1));
+ int fd;
+ fd = open(path, O_RDONLY);
+ if (fd < 0)
+ die("Cannot open %s", path);
+ hash_fd(fd, type, write_object, path);
 }
 
 static void hash_stdin_paths(const char *type, int write_objects)
@@ -45,7 +42,7 @@ static void hash_stdin_paths(const char *type, int write_objects)
  die("line is badly quoted");
  strbuf_swap(&buf, &nbuf);
  }
- hash_object(buf.buf, type_from_string(type), write_objects);
+ hash_object(buf.buf, type, write_objects);
  }
  strbuf_release(&buf);
  strbuf_release(&nbuf);
@@ -116,13 +113,13 @@ int main(int argc, char **argv)
  }
 
  if (hashstdin) {
- hash_stdin(type, write_object);
+ hash_fd(0, type, write_object, NULL);
  hashstdin = 0;
  }
  if (0 <= prefix_length)
  arg = prefix_filename(prefix, prefix_length,
       arg);
- hash_object(arg, type_from_string(type), write_object);
+ hash_object(arg, type, write_object);
  no_more_flags = 1;
  }
  }
@@ -131,6 +128,6 @@ int main(int argc, char **argv)
  hash_stdin_paths(type, write_object);
 
  if (hashstdin)
- hash_stdin(type, write_object);
+ hash_fd(0, type, write_object, NULL);
  return 0;
 }
diff --git a/sha1_file.c b/sha1_file.c
index e281c14..fe863f5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2353,51 +2353,22 @@ int has_sha1_file(const unsigned char *sha1)
  return has_loose_object(sha1);
 }
 
-int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object)
+static int index_mem(unsigned char *sha1, void *buf, size_t size,
+     int write_object, enum object_type type, const char *path)
 {
- struct strbuf buf;
- int ret;
-
- strbuf_init(&buf, 0);
- if (strbuf_read(&buf, fd, 4096) < 0) {
- strbuf_release(&buf);
- return -1;
- }
-
- if (!type)
- type = blob_type;
- if (write_object)
- ret = write_sha1_file(buf.buf, buf.len, type, sha1);
- else
- ret = hash_sha1_file(buf.buf, buf.len, type, sha1);
- strbuf_release(&buf);
-
- return ret;
-}
-
-int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
-     enum object_type type, const char *path)
-{
- size_t size = xsize_t(st->st_size);
- void *buf = NULL;
  int ret, re_allocated = 0;
 
- if (size)
- buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
- close(fd);
-
  if (!type)
  type = OBJ_BLOB;
 
  /*
  * Convert blobs to git internal format
  */
- if ((type == OBJ_BLOB) && S_ISREG(st->st_mode)) {
+ if ((type == OBJ_BLOB) && path) {
  struct strbuf nbuf;
  strbuf_init(&nbuf, 0);
  if (convert_to_git(path, buf, size, &nbuf,
                    write_object ? safe_crlf : 0)) {
- munmap(buf, size);
  buf = strbuf_detach(&nbuf, &size);
  re_allocated = 1;
  }
@@ -2407,12 +2378,37 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
  ret = write_sha1_file(buf, size, typename(type), sha1);
  else
  ret = hash_sha1_file(buf, size, typename(type), sha1);
- if (re_allocated) {
+ if (re_allocated)
  free(buf);
- return ret;
+ return ret;
+}
+
+int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
+     enum object_type type, const char *path)
+{
+ size_t size = xsize_t(st->st_size);
+ int ret;
+
+ if (!S_ISREG(st->st_mode))
+ {
+ struct strbuf sbuf;
+ strbuf_init(&sbuf, 0);
+ if (strbuf_read(&sbuf, fd, 4096) >= 0)
+ ret = index_mem(sha1, sbuf.buf, sbuf.len, write_object,
+ type, path);
+ else
+ ret = -1;
+ strbuf_release(&sbuf);
  }
- if (size)
+ else if (size)
+ {
+ void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+ ret = index_mem(sha1, buf, size, write_object, type, path);
  munmap(buf, size);
+ }
+ else
+ ret = index_mem(sha1, NULL, size, write_object, type, path);
+ close(fd);
  return ret;
 }
 
--
1.6.0.rc1.53.gf8e95

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
12