Transparently encrypt repository contents with GPG

Matthias Nothhaft
Hi,

I'm new to Git but I really already love it. ;-)

I would like to have a repository that transparently encrypts and
decrypts all files using GPG.

What I need is a way to automatically modify each file

a) before it is written in the repository
b) after it is read from the repository

Is there a way to get this to work somehow? Can someone give me some
hints where I need to begin?

regards,
Matthias
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Transparently encrypt repository contents with GPG

Sverre Rabbelier-2
Heya,

On Thu, Mar 12, 2009 at 22:19, Matthias Nothhaft
<[hidden email]> wrote:
> What I need is a way to automatically modify each file
>
> a) before it is written in the repository
> b) after it is read from the repository

Have a look at smudging, you might not need to touch the git source
code at all ;).

--
Cheers,

Sverre Rabbelier

Re: Transparently encrypt repository contents with GPG

MichaelJGruber
Sverre Rabbelier venit, vidit, dixit 12.03.2009 22:34:

> Heya,
>
> On Thu, Mar 12, 2009 at 22:19, Matthias Nothhaft
> <[hidden email]> wrote:
>> What I need is a way to automatically modify each file
>>
>> a) before it is written in the repository
>> b) after it is read from the repository
>
> Have a look at smudging, you might not need to touch the git source
> code at all ;).
>

And people asked me not to be cryptic... even though the OP explicitly
asked for encryption, of course ;)

"git help attributes" may help: look for filter and set attributes and
config (filter.$name.{clean,smudge}) accordingly. smudge should probably
decrypt, clean should encrypt.
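
To make the mechanics concrete, here is a minimal sketch of a clean/smudge pair, assuming a throwaway repository and a toy ROT13 transform (via tr) standing in for real encryption. The filter name "rot13" and the file name are made up for illustration:

```shell
#!/bin/sh
# Toy clean/smudge pair: ROT13 stands in for encrypt/decrypt (assumption).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# The filter definition lives in the config; the name "rot13" is arbitrary.
git config filter.rot13.clean  "tr 'A-Za-z' 'N-ZA-Mn-za-m'"
git config filter.rot13.smudge "tr 'A-Za-z' 'N-ZA-Mn-za-m'"
# The attribute tells git which paths go through the filter.
echo '*.txt filter=rot13' > .gitattributes
echo 'hello' > note.txt
git add note.txt
# What the repository stores is the output of "clean".
stored=$(git cat-file -p :note.txt)
# Checking the file out again runs "smudge" on the stored blob.
rm note.txt
git checkout -q -- note.txt
restored=$(cat note.txt)
echo "stored=$stored restored=$restored"
```

ROT13 is deterministic, so this shows only the plumbing and none of the problems that real encryption brings.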

BTW: Why not use an encrypted file system? That way your work tree would
be encrypted also.

Cheers,
Michael

Re: Transparently encrypt repository contents with GPG

Sverre Rabbelier-2
Heya,

On Fri, Mar 13, 2009 at 11:46, Michael J Gruber
<[hidden email]> wrote:
> And people asked me not to be cryptic... even though the OP explicitly
> asked for encryption, of course ;)

I wasn't being cryptic, I just don't remember the details of smudge,
just that it exists, and that it allows you to perform operations on a
file on checkout and on add.

--
Cheers,

Sverre Rabbelier

Re: Transparently encrypt repository contents with GPG

Thomas Rast
In reply to this post by MichaelJGruber
Michael J Gruber wrote:
> "git help attributes" may help: look for filter and set attributes and
> config (filter.$name.{clean,smudge}) accordingly. smudge should probably
> decrypt, clean should encrypt.

Wouldn't this trip over the randomness included in all encryption [to
avoid generating the same cyphertext for two separate identical
messages, which gives away some information], which would let git
think the file has been changed as soon as its stat info has changed
(or is just racy)?
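
The randomness is easy to observe. A small sketch, assuming openssl as a stand-in for gpg and a made-up password: the same plaintext encrypted twice gives two different ciphertexts, because a fresh random salt is chosen on each run.

```shell
#!/bin/sh
# Encrypt the same plaintext twice with the same (hypothetical) password.
# Without -S, openssl picks a fresh random salt for every run.
encrypt_once() {
    printf 'same content\n' |
        openssl enc -aes-256-cbc -base64 -pass pass:demo-password 2>/dev/null
}
first=$(encrypt_once)
second=$(encrypt_once)
if [ "$first" = "$second" ]; then
    echo "identical ciphertext"
else
    echo "ciphertext differs between runs"
fi
```

A clean filter built on such a command would hand git a different blob every time it is run on unchanged content.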

Not to mention that this makes most source-oriented features such as
diff, blame, merge, etc., rather useless.

--
Thomas Rast
trast@{inf,student}.ethz.ch


Re: Transparently encrypt repository contents with GPG

Sverre Rabbelier-2
Heya,

On Fri, Mar 13, 2009 at 12:15, Thomas Rast <[hidden email]> wrote:
> Not to mention that this makes most source-oriented features such as
> diff, blame, merge, etc., rather useless.

I would assume that smudge takes care of this somehow, it'd seem like
a rather useless feature otherwise :).

--
Cheers,

Sverre Rabbelier

Re: Transparently encrypt repository contents with GPG

Michael J Gruber
Sverre Rabbelier venit, vidit, dixit 13.03.2009 12:17:
> Heya,
>
> On Fri, Mar 13, 2009 at 12:15, Thomas Rast <[hidden email]>
> wrote:
>> Not to mention that this makes most source-oriented features such
>> as diff, blame, merge, etc., rather useless.
>
> I would assume that smudge takes care of this somehow, it'd seem
> like a rather useless feature otherwise :).

Sverre was being prophetic with the somehow. Here's a working setup
(though I still don't know why not to use luks):

In .gitattributes (or .git/info/a..) use

* filter=gpg diff=gpg

In your config:

[filter "gpg"]
        smudge = gpg -d -q --batch --no-tty
        clean = gpg -ea -q --batch --no-tty -r C920A124
[diff "gpg"]
        textconv = decrypt

This gives you textual diffs even in log! You want to use gpg-agent here.

Now for Sverre's prophecy and the helper I haven't shown you yet: It
turns out that blobs are not smudged before they are fed to textconv!
[Also, it seems that the textconv config does allow parameters, but I
haven't checked thoroughly.]

This means that e.g. when diffing work tree with HEAD textconv is called
twice: once is with a smudged file (from the work tree) and once with a
cleaned file (from HEAD). That's why I needed a small helper script
"decrypt" which does nothing but

#!/bin/sh
# Try to decrypt; if gpg fails (e.g. the input is plaintext), emit the file as-is.
gpg -d -q --batch --no-tty "$1" || cat "$1"

Yeah, this assumes gpg errors out because it's fed something unencrypted
(and not encrypted with the wrong key) etc. It's only proof of concept
quality.
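
One way to depend less on gpg's exit status is to look at the input first. This is only a sketch, assuming the clean filter produces ASCII armor (as the -a in "gpg -ea" does), so that encrypted blobs start with the usual armor header line. The function names are made up:

```shell
#!/bin/sh
# Hypothetical variant of the "decrypt" textconv helper: decide by the
# armor header instead of waiting for gpg to fail on plaintext.
is_pgp_armored() {
    [ "$(head -n 1 "$1")" = "-----BEGIN PGP MESSAGE-----" ]
}
decrypt_or_cat() {
    if is_pgp_armored "$1"; then
        gpg -d -q --batch --no-tty "$1"
    else
        cat "$1"
    fi
}
# When installed as a script, textconv passes the file name as $1.
if [ "$#" -gt 0 ]; then
    decrypt_or_cat "$1"
fi
```

This still does not distinguish "encrypted with the wrong key" from other gpg failures, so it remains proof-of-concept quality as well.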

Methinks it's not right that diff fails to call smudge here, is it?

Michael

Re: Transparently encrypt repository contents with GPG

Sverre Rabbelier-2
Heya,

On Fri, Mar 13, 2009 at 14:56, Michael J Gruber
<[hidden email]> wrote:
> Sverre was being prophetic with the somehow. Here's a working setup
> (though I still don't know why not to use luks):

Glad to hear I was right ;). Also awesome that you looked into this
and shared your findings, thanks!

--
Cheers,

Sverre Rabbelier

Re: Transparently encrypt repository contents with GPG

Jeff King
In reply to this post by Michael J Gruber
On Fri, Mar 13, 2009 at 02:56:22PM +0100, Michael J Gruber wrote:

> Sverre was being prophetic with the somehow. Here's a working setup
> (though I still don't know why not to use luks):
>
> In .gitattributes (or .git/info/a..) use
>
> * filter=gpg diff=gpg
>
> In your config:
>
> [filter "gpg"]
>         smudge = gpg -d -q --batch --no-tty
>         clean = gpg -ea -q --batch --no-tty -r C920A124
> [diff "gpg"]
>         textconv = decrypt
>
> This gives you textual diffs even in log! You want to use gpg-agent here.

This is not going to work very well in general.  Smudging and cleaning
is about putting the canonical version of a file in the git repo, and
munging it for the working tree. Trying to go backwards is going to lead
to problems, including:

  1. Git sometimes wants to look at content of special files inside
     trees, like .gitignore. Now it can't.

  2. Git uses timestamps and inodes to decide whether files need to be
     looked at all to determine if they are different. So when you do
     a checkout and "git diff", everything will look OK. But when it
     does actually look at file contents, it compares canonical
     versions. And your canonical versions are going to be _different_
     every time you encrypt, even if the content is the same:

       echo content >file
       git add file
       git diff ;# no output
       touch file
       git diff ;# looks like file is totally rewritten

     So you will probably end up with extra cruft in your commits if you
     ever touch files.
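
Point 2 can be reproduced without gpg. The sketch below assumes a toy filter named "nonce" (made up here) whose clean side prepends a line of random bytes, which mimics the non-determinism of real encryption, and whose smudge side strips that line again:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Toy stand-in for randomized encryption: "clean" prepends 8 random bytes
# (as one hex line), "smudge" removes that line again.
git config filter.nonce.clean  "od -An -N8 -tx1 /dev/urandom; cat"
git config filter.nonce.smudge "sed 1d"
echo 'file filter=nonce' > .gitattributes
echo content > file
git add file
# Make the stat info differ from what the index recorded.
touch -t 200001010000 file
# git now re-runs "clean" to compare contents, gets a new nonce, and
# therefore sees a different canonical version.
changed=$(git diff --name-only)
echo "reported as modified: ${changed:-nothing}"
```

The working tree content never changed; only the timestamp did, yet the file is reported as modified.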

> Now for Sverre's prophecy and the helper I haven't shown you yet: It
> turns out that blobs are not smudged before they are fed to textconv!
> [Also, it seems that the textconv config does allow parameters, but I
> haven't checked thoroughly.]

I don't think they should be smudged. Smudging is about converting for
the working tree, and the diff is operating on canonical formats. If
anything, I think the error is that we feed smudged data from the
working tree to textconv; we should always be handing it clean data (and
this goes for external diff, too, which I suspect behaves the same way).

I haven't looked, but it probably is a result of the optimization to
reuse worktree files.

-Peff

PS If it isn't obvious, I don't think this smudge/filter technique is
the right way to go about this. But one final comment if you did want to
pursue this: you are using asymmetric encryption in your GPG invocation,
which is going to be a lot slower and the result will take up more
space. Try using a symmetric cipher.

Re: Transparently encrypt repository contents with GPG

Junio C Hamano
In reply to this post by Michael J Gruber
Michael J Gruber <[hidden email]> writes:

> In .gitattributes (or .git/info/a..) use
>
> * filter=gpg diff=gpg
>
> In your config:
>
> [filter "gpg"]
>         smudge = gpg -d -q --batch --no-tty
>         clean = gpg -ea -q --batch --no-tty -r C920A124
> [diff "gpg"]
>         textconv = decrypt
>
> This gives you textual diffs even in log! You want to use gpg-agent here.

Don't do this.

Think why the smudge/clean pair exists.

The version controlled data, the contents, may not be suitable for
consumption in the work tree in its verbatim form.  For example, a cross
platform project would want to consistently use LF line termination inside
a repository, but on a platform whose tools expect CRLF line endings, the
contents cannot be used verbatim.  We "smudge" the contents running
unix2dos when checking things out on such platforms, and "clean" the
platform specific CRLF line endings by running dos2unix when checking
things in.  By doing so, you can see what really got changed between
versions without getting distracted, and more importantly, "you" in this
sentence is not limited to the human end users alone.

git internally runs diff and xdelta to see what was changed, so that:

 * it can reduce storage requirement when it runs pack-objects;

 * it can check what path in the preimage was similar to what other path
   in the postimage, to deduce a rename;

 * it can check what blocks of lines in the postimage came from what other
   blocks of lines in the preimage, to pass blames across file boundaries.

If your "clean" encrypts and "smudge" decrypts, it means you are refusing
all the benefit git offers.  You are making a pair of similar "smudged"
contents totally dissimilar in their "clean" counterparts.  That is simply
backwards.
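
A rough way to see the "totally dissimilar" effect, assuming openssl with a fixed salt and a made-up password as a stand-in for the encrypting clean filter: two plaintexts that differ in a single line share almost every line, while their ciphertexts share none.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Two plaintexts that differ only in their first line.
seq 1 200 > "$workdir/old.txt"
{ echo changed; seq 2 200; } > "$workdir/new.txt"
# Hypothetical "clean" step; the salt is fixed so that only the content
# change can account for differences in the output.
encrypt() {
    openssl enc -aes-256-cbc -base64 -pass pass:demo-password \
        -S 0123456789abcdef < "$1" 2>/dev/null
}
encrypt "$workdir/old.txt" > "$workdir/old.enc"
encrypt "$workdir/new.txt" > "$workdir/new.enc"
# Count the lines of the new version that also occur in the old one.
plain_shared=$(grep -Fxc -f "$workdir/old.txt" "$workdir/new.txt")
cipher_shared=$(grep -Fxc -f "$workdir/old.enc" "$workdir/new.enc")
echo "plaintext lines shared:  $plain_shared"
echo "ciphertext lines shared: $cipher_shared"
```

With nothing in common between the two stored versions, delta compression, rename detection and blame have nothing to work with.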

As the sole raison d'etre of diff.textconv is to allow potentially lossy
conversion (e.g. msword-to-text) applied to the preimage and postimage
pair of contents (that are supposed to be "clean") before giving a textual
diff to human consumption, the above config may appear to work, but if you
really want an encrypted repository, you should be using an encrypting
filesystem.  That would give an added benefit that the work tree
associated with your repository would also be encrypted.

Re: Transparently encrypt repository contents with GPG

MichaelJGruber
Junio C Hamano venit, vidit, dixit 13.03.2009 21:23:

> Michael J Gruber <[hidden email]> writes:
>
>> In .gitattributes (or .git/info/a..) use
>>
>> * filter=gpg diff=gpg
>>
>> In your config:
>>
>> [filter "gpg"]
>>         smudge = gpg -d -q --batch --no-tty
>>         clean = gpg -ea -q --batch --no-tty -r C920A124
>> [diff "gpg"]
>>         textconv = decrypt
>>
>> This gives you textual diffs even in log! You want to use gpg-agent here.
>
> Don't do this.
>
> Think why the smudge/clean pair exists.
>
> The version controlled data, the contents, may not be suitable for
> consumption in the work tree in its verbatim form.  For example, a cross
> platform project would want to consistently use LF line termination inside
> a repository, but on a platform whose tools expect CRLF line endings, the
> contents cannot be used verbatim.  We "smudge" the contents running
> unix2dos when checking things out on such platforms, and "clean" the
> platform specific CRLF line endings by running dos2unix when checking
> things in.  By doing so, you can see what really got changed between
> versions without getting distracted, and more importantly, "you" in this
> sentence is not limited to the human end users alone.
>
> git internally runs diff and xdelta to see what was changed, so that:
>
>  * it can reduce storage requirement when it runs pack-objects;
>
>  * it can check what path in the preimage was similar to what other path
>    in the postimage, to deduce a rename;
>
>  * it can check what blocks of lines in the postimage came from what other
>    blocks of lines in the preimage, to pass blames across file boundaries.
>
> If your "clean" encrypts and "smudge" decrypts, it means you are refusing
> all the benefit git offers.  You are making a pair of similar "smudged"
> contents totally dissimilar in their "clean" counterparts.  That is simply
> backwards.
>
> As the sole raison d'etre of diff.textconv is to allow potentially lossy
> conversion (e.g. msword-to-text) applied to the preimage and postimage
> pair of contents (that are supposed to be "clean") before giving a textual
> diff to human consumption, the above config may appear to work, but if you
> really want an encrypted repository, you should be using an encrypting
> filesystem.  That would give an added benefit that the work tree
> associated with your repository would also be encrypted.

Exactly. This is why I suggested using cryptfs/luks in my first response
already.

But I don't know the OP's requirements, which is why I also told him how
to do what he wanted, even though it has the drawbacks you and Jeff (and
maybe I) mentioned. Maybe it's an attempt at hosting a semi-private repo
on a public (free) server?

Besides the non-text nature of encrypted content, the problem here is
that d(e(x))=x for all x but e(d(x)) differs from x most probably, and
hopefully randomly, unless you use the right version of debian's openssl
of course ;)

That being said:
git diff calls textconv filters with smudged as well as cleaned files
(when diffing work tree files to blobs), and this does not seem right. I
hope this is not happening with the internal diff, nor with crlf!

Since both the cleaned and the smudged version are supposed to be
"authoritative" (as opposed to the textconv'ed one) one may argue either
way what's the right approach. For internal use comparing the cleaned
versions may make more sense, for displaying diff's the checked-out
form, i.e. smudged versions make more sense.

But that is another topic which would need to be substantiated with
tests. It's not completely unlikely I may come up with some, but don't
count on it...

Cheers,
Michael

Re: Transparently encrypt repository contents with GPG

Junio C Hamano
Michael J Gruber <[hidden email]> writes:

> Since both the cleaned and the smudged version are supposed to be
> "authoritative" (as opposed to the textconv'ed one) one may argue either
> way what's the right approach.

Smudged one can never be authoritative.  That is the whole point of smudge
filter and in general the whole convert_to_working_tree() infrastructure.
It changes depending on who you are (e.g. on what platform you are on).
So running comparison between two clean versions is the only sane thing to
do.

You could argue textconv should work on smudged contents or on clean
contents before smudging.  As long as it is done consistently, I do not
care either way too deeply, as its output is not supposed to be used for
anything but human consumption.  Two equally sane arrangements would be:

 (1) Start from two clean contents (run convert_to_git() if contents were
     obtained from the work tree), run textconv, run diff, and output the
     result literally; or

 (2) Start from two smudged contents (run convert_to_working_tree() for
     contents taken from the repository), run textconv, run diff, and
     run clean before sending the result to the output.

The former assumes a textconv filter that wants to work on clean
contents, the latter one that expects smudged input.  I probably
would suggest going the former approach, as it is consistent with the
general principle in other parts of the system (the internal processing
happens on clean contents).

Both of the above assume that the output should come in clean form;
it is consistent with the way normal diff is generated for consumption by
git-apply. You can certainly argue that the final output should be in
smudged form when textconv is used, as it is purely for human consumption,
and is not even supposed to be fed to apply.


Re: Transparently encrypt repository contents with GPG

MichaelJGruber
Junio C Hamano venit, vidit, dixit 14.03.2009 19:45:

> Michael J Gruber <[hidden email]> writes:
>
>> Since both the cleaned and the smudged version are supposed to be
>> "authoritative" (as opposed to the textconv'ed one) one may argue either
>> way what's the right approach.
>
> Smudged one can never be authoritative.  That is the whole point of smudge
> filter and in general the whole convert_to_working_tree() infrastructure.
> It changes depending on who you are (e.g. on what platform you are on).
> So running comparison between two clean versions is the only sane thing to
> do.

Yes. I guess I'm being too much of a mathematician here: if clean is a
well-defined function, then clean(x) is well defined by specifying x. In
that sense x is equally authoritative.
Again, if smudge is the inverse of clean, i.e. smudge and clean are
bijective, then x differs from y iff clean(x) differs from clean(y).

> You could argue textconv should work on smudged contents or on clean
> contents before smudging.  As long as it is done consistently, I do not
> care either way too deeply, as its output is not supposed to be used for
> anything but human consumption.  Two equally sane arrangements would be:
>
>  (1) Start from two clean contents (run convert_to_git() if contents were
>      obtained from the work tree), run textconv, run diff, and output the
>      result literally; or
>
>  (2) Start from two smudged contents (run convert_to_working_tree() for
>      contents taken from the repository), run textconv, run diff, and
>      run clean before sending the result to the output.
>
> The former assumes a textconv filter that wants to work on clean
> contents, the latter one that expects smudged input.  I probably
> would suggest going the former approach, as it is consistent with the
> general principle in other parts of the system (the internal processing
> happens on clean contents).
>
> Both of the above assume that the output should come in clean form;
> it is consistent with the way normal diff is generated for consumption by
> git-apply. You can certainly argue that the final output should be in
> smudged form when textconv is used, as it is purely for human consumption,
> and is not even supposed to be fed to apply.

Also, I don't expect clean to be necessarily meaningful when applied to
the result of textconv, and even less so to the output of diff.

Now, a simple test shows that git diff obviously does this when diffing
HEAD to worktree:

diff between HEAD and clean(worktree)

Which is the right thing. It just seems so that textconv is not even
called "in the wrong place of the chain", but messes the diff up in this
way:

diff between textconv(HEAD) and textconv(worktree)

(I expected clean(textconv(worktree)) first, which would be wrong, too).
I.e., the clean filter is ignored completely in the presence of textconv.

OK, I'll stop bugging you, until I have checked the existing tests and the
code...

Michael

Re: Transparently encrypt repository contents with GPG

Jeff King
On Mon, Mar 16, 2009 at 05:01:33PM +0100, Michael J Gruber wrote:

> Now, a simple test shows that git diff obviously does this when diffing
> HEAD to worktree:
>
> diff between HEAD and clean(worktree)
>
> Which is the right thing. It just seems so that textconv is not even
> called "in the wrong place of the chain", but messes the diff up in this
> way:
>
> diff between textconv(HEAD) and textconv(worktree)
>
> (I expected clean(textconv(worktree)) first, which would be wrong, too).
> I.e., the clean filter is ignored completely in the presence of textconv.

Yeah, I think this should probably be textconv(clean(worktree)) to match
the regular HEAD/worktree diff (if it isn't already). Can you put
together a test that shows the breakage?

-Peff

Re: Transparently encrypt repository contents with GPG

Jeff King
In reply to this post by Junio C Hamano
On Fri, Mar 13, 2009 at 01:23:08PM -0700, Junio C Hamano wrote:

> As the sole raison d'etre of diff.textconv is to allow potentially lossy
> conversion (e.g. msword-to-text) applied to the preimage and postimage
> pair of contents (that are supposed to be "clean") before giving a textual
> diff to human consumption, the above config may appear to work, but if you
> really want an encrypted repository, you should be using an encrypting
> filesystem.  That would give an added benefit that the work tree
> associated with your repository would also be encrypted.

I can think of one reason that having git do the encryption might be
beneficial: pushing to an untrusted source.

If you encrypted all blobs but kept trees and commits in plaintext, you
could retain (some of) the benefits of git's incremental push. The
downsides, though, are:

  1. You are revealing the hashes of your blobs' plaintext. Which means
     I can try brute-forcing your blobs by checking against a hash
     function.

  2. The remote can't actually look at the blobs. The most obvious
     problem with this is that you can't send it thin packs, since it
     can't actually resolve deltas.
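
Downside 1 can be sketched in a few lines, assuming the observer sees a blob id in a plaintext tree and suspects the file has a guessable shape. The file content and the candidate list are invented for illustration:

```shell
#!/bin/sh
# The "leaked" id: the hash of the plaintext blob, visible in the tree
# even though the blob content itself is encrypted.
leaked_id=$(printf 'password=hunter2\n' | git hash-object --stdin)
# The observer hashes candidate plaintexts until one matches.
found=""
for guess in letmein hunter2 swordfish; do
    candidate_id=$(printf 'password=%s\n' "$guess" | git hash-object --stdin)
    if [ "$candidate_id" = "$leaked_id" ]; then
        found=$guess
        break
    fi
done
echo "recovered: ${found:-nothing}"
```

This only works against low-entropy files, but small config files and credentials are exactly what people tend to want encrypted.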

And given the ensuing mess that it would make of the code to
conditionally say "Oh, we have this object, but you're not allowed to
read it", it is almost certainly not worth it.

But maybe somebody can prove me wrong and design a system that allows
efficient encrypted pushing to a non-trusted remote and also doesn't
suck.

-Peff

Re: Transparently encrypt repository contents with GPG

bigbear
In reply to this post by Matthias Nothhaft
Matthias Nothhaft wrote:
> Hi,
>
> I'm new to Git but I really already love it. ;-)
>
> I would like to have a repository that transparently encrypts and
> decrypts all files using GPG.
>
> What I need is a way to automatically modify each file
>
> a) before it is written in the repository
> b) after it is read from the repository
>
> Is there a way to get this to work somehow? Can someone give me some
> hints where I need to begin?
>
> regards,
> Matthias

I came across this thread during my own search for an encrypted git repo. Matthias, it looks as if somebody has come up with a "working" system that uses git's smudge and clean filter features.
It seems to me a workable solution for storing a repo with some private content in its files on an untrusted or possibly public git host.

Transparent Git Encryption
https://gist.github.com/873637
and/or possibly
https://github.com/shadowhand/git-encrypt

The way to do this is to use git's "smudge" and "clean" filters, but it's not necessarily recommended for reasons that are explained here by Junio C Hamano, the maintainer of git:

    http://article.gmane.org/gmane.comp.version-control.git/113221





Re: Transparently encrypt repository contents with GPG

lalebarde
Hi,
I am puzzled by the recommendation of Junio C Hamano, the maintainer of git, not to encrypt files before pushing them:
Junio C Hamano wrote:
> If your "clean" encrypts and "smudge" decrypts, it means you are refusing all the benefit git offers.
Junio C Hamano wrote:
> the above config may appear to work
So, does it work or not, or partially? And if partially, what does not work?

Another issue is git-encrypt's use of the ECB cipher mode, which some argue is bad.

So I made some experiments, taking a 15 MB PDF:

$ openssl enc -base64 -aes-256-ecb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf >WileyE1
$ openssl enc -base64 -aes-256-ecb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf >WileyE2
$ md5sum WileyE1
d43058d8443777aea871350245d9865b  WileyE1
$ md5sum WileyE2
d43058d8443777aea871350245d9865b  WileyE2

$ openssl enc -base64 -aes-256-ofb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf >WileyE1
$ openssl enc -base64 -aes-256-ofb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf >WileyE2
503d82849ad53652268d1abdcfbce9de  WileyE1
503d82849ad53652268d1abdcfbce9de  WileyE2

$ openssl enc -base64 -aes-256-cbc -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf >WileyE1
$ openssl enc -base64 -aes-256-cbc -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf >WileyE2
e726431cbd9ff8780946ddfad775600a  WileyE1
e726431cbd9ff8780946ddfad775600a  WileyE2


As the hashes are identical from one run to another in every mode, I don't understand why we should stick to the ECB cipher.

Can someone clarify these two points, please?

Re: Transparently encrypt repository contents with GPG

geek
This post has NOT been accepted by the mailing list yet.
Hi,
    On your first question: So, does it work or not, or partially ?
And if partially, what does not work?

    As Junio C Hamano indicated in his message,

> git internally runs diff and xdelta to see what was changed, so that:
>  * it can reduce storage requirement when it runs pack-objects;
>  * it can check what path in the preimage was similar to what other path
>    in the postimage, to deduce a rename;
>  * it can check what blocks of lines in the postimage came from what other
>    blocks of lines in the preimage, to pass blames across file boundaries.

you will lose the benefits offered through these git features. I have
not tested it, but I believe what Junio said is true. "Git encryption
via smudge/clean filters" is a hack on top of the existing git system,
meaning it is not something git was designed for. The design goal of
"transparent git encryption" is to provide confidentiality of git data
outsourced to an external server (or "the Cloud"). This is achieved by
having you manage your own passwords / keys. The integrity of
git data is partially protected by the git system itself through
chained hashing. If the features Junio mentioned aren't important to
you, then the method works. As also mentioned in Junio's message,
using an encrypted filesystem (with tools such as "truecrypt") is an
alternative way of achieving outsourced data confidentiality.

    On your second question: As the hash are identical from one run to
another, I don't understand why we should stick to the ECB cypher.

    You certainly do not have to stick to the ECB mode as long as your
encryption method is deterministic. In the example you have shown ($
openssl enc -base64 -aes-256-cbc -S 1762851 -k a5G4juy64VVBgfq4
<Wiley.pdf), you are explicitly providing a fixed-valued "salt" (the
-S option) so that it together with the password is used to
*deterministically* derive an IV and an encryption key for AES CBC
encryption. Note that using ECB mode is generally regarded as a bad
crypto practice; so is using a fixed-valued salt for CBC. (The latter
may be slightly better than the former, depending on what you
believe.) If we can manage to find a way of changing the salt value
based on the file name, I think it will be a better way. In fact, I
thought about the same thing some time ago, but have not found time to
look deeper into it. I may update my document in the near future once
I find out more.
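
A rough sketch of the salt-from-file-name idea, assuming the salt is simply the first 16 hex digits of a SHA-256 over the path. The function names, paths and password are made up, and this is not what git-encrypt or the gist actually does:

```shell
#!/bin/sh
# Derive a 16-hex-digit salt from the path: every path gets its own salt,
# but repeated runs on the same path stay deterministic.
salt_for() {
    printf '%s' "$1" | openssl dgst -sha256 -r | cut -c1-16
}
# usage: encrypt_as PATH < plaintext
encrypt_as() {
    openssl enc -aes-256-cbc -base64 -pass pass:demo-password \
        -S "$(salt_for "$1")" 2>/dev/null
}
a1=$(printf 'same content\n' | encrypt_as src/a.txt)
a2=$(printf 'same content\n' | encrypt_as src/a.txt)
b1=$(printf 'same content\n' | encrypt_as src/b.txt)
if [ "$a1" = "$a2" ]; then echo "same path: stable ciphertext"; fi
if [ "$a1" != "$b1" ]; then echo "different path: different ciphertext"; fi
```

Identical files at different paths no longer encrypt to the same blob, but the scheme is still deterministic per path, so an observer can tell when a file returns to an earlier content.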

    If you have high-value, high-impact data to protect on an external
server, do not use this method, and use an encrypted filesystem.

Ning

On Sun, Jun 17, 2012 at 12:33 AM, lalebarde [via git]
<[hidden email]> wrote:

> Hi,
> I am puzzled from the recommandation of Junio C Hamano, the maintainer of
> git, to not encrypt files before pushing them :
>
> Junio C Hamano wrote
> If your "clean" encrypts and "smudge" decrypts, it means you are refusing
> all the benifit git offers.
>
> Junio C Hamano wrote
> the above config may appear to work
>
> So, does it work or not, or partially ? And if partially, what does not work
> ?
>
> Another issue is the use of the cypher ECB by git-encrypt. Some argue it is
> bad (cf also that).
>
> So I made some experiments, tacking a 15Mb pdf :
>
> $ openssl enc -base64 -aes-256-ecb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf
>>WileyE1
> $ openssl enc -base64 -aes-256-ecb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf
>>WileyE2
> $ md5sum WileyE1
> d43058d8443777aea871350245d9865b  WileyE1
> $ md5sum WileyE2
> d43058d8443777aea871350245d9865b  WileyE2
>
> $ openssl enc -base64 -aes-256-ofb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf
>>WileyE1
> $ openssl enc -base64 -aes-256-ofb -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf
>>WileyE2
> 503d82849ad53652268d1abdcfbce9de  WileyE1
> 503d82849ad53652268d1abdcfbce9de  WileyE2
>
> $ openssl enc -base64 -aes-256-cbc -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf
>>WileyE1
> $ openssl enc -base64 -aes-256-cbc -S 1762851 -k a5G4juy64VVBgfq4 <Wiley.pdf
>>WileyE2
> e726431cbd9ff8780946ddfad775600a  WileyE1
> e726431cbd9ff8780946ddfad775600a  WileyE2
>
> As the hash are identical from one run to another, I don't understand why we
> should stick to the ECB cypher.
>
> Can some one clarify the two points please ?

Re: Transparently encrypt repository contents with GPG

lalebarde
Thanks for your clarifications! Bullets 2 and 3 (the starred items in Junio's list) are still not clear to me, probably because I am new to git.

Do you think that a solution that respects both git and strong cryptography, if one is found, would be successful? My analysis is that small enterprises that do not have many servers or premises may need git hosting. Even big companies with their own networks might, if they want more security.

TrueCrypt or an encrypted file system on the host is not feasible off the shelf. One has to set up one's own dedicated server at the host.

For my part, I am afraid to push my projects in the clear to a host. But possibly I am too paranoid. Do you have an idea of the risk?