Quantcast

Linus' sha1 is much faster!

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

giuseppe
Hi,

These are the results I reported (median of 5 plus an additional not
considered first run) on the Steve Reid's SHA1 implementation using the
same flags to the compiler that I used for previous tests.

GCC 4.3.3:  real 0m2.627s
GCC 4.4.1:  real 0m3.742s

In both cases it showed to be slower than other implementations I have
already tried.

Additional note: as for gnulib SHA1, GCC 4.4.1 produced slower code than
GCC 4.3.3.

Cheers,
Giuseppe



Steven Noonan <[hidden email]> writes:

>
> Interesting. I compared Linus' implementation to the public domain one
> by Steve Reid[1], which is used in OpenLDAP and a few other projects.
> Anyone with some experience testing these kinds of things in a
> statistically sound manner want to try it out? In my tests, I got
> this:
>
> (average of 5 runs)
> Linus' sha1: 283MB/s
> Steve Reid's sha1: 305MB/s
>
> - Steven
>
> [1] http://gpl.nas-central.org/SYNOLOGY/x07-series/514_UNTARED/source/openldap-2.3.11/libraries/liblutil/sha1.c
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

George Spelvin
In reply to this post by Nicolas Pitre
> I don't think this trick of making source code textually different from
> another work while still intimately mimicking the same structure entitles
> you to any copyright (or non copyright) claims over that other work.  I
> certainly wouldn't bet any dime for this standing up in court.  

<div type="legal digression">
Actually, I would.  I did a lot more than text search and replace;
I re-implemented it from FIPS 180-2 (work of U.S. government, no copyright)
and then merged in the *ideas* from the mailing list.  

(And from elsewhere; the idea of a five-round macro is from Brian Gladman.)

Remember, to the extent that something is *functional*, it is not
copyrightable; copyright only covers the non-functional expressive bits.
The vast majority of that code is simply required by the standard,
or the desired calling interface.

For a large portion of the rest, remember that standard programming
conventions (e.g.  brace style, macro names IN CAPS, etc.) that's also
non-copyrightable "scene a faire" material.

It's well established that paraphrasing a recipe avoids copyright;
the proportions and treatment of the ingredients is not copyrightable.

For more details, see the extensive coverage of the NEC v. Intel decision
(1989) regarding the firmware for NEC's 8086-clone V20 microprocessor.
It was found non-infringing despite non-clean-room implementation and
substantial similarities.
</div>

> It is probably much safer to simply ask the people involved to agree
> with your relicensing.  And so far I don't see anyone with a stake in
> this fiercely wanting to stick to a particular license.

As for politeness, that's exactly why I did post it and solicit
objections.  The purpose of the rewrite is to avoid having to make
pessimistic assumptions about people who don't respond.

I suppose I should have made that request clearer:
Is there anyone who claims copyright on anything here?
Or would just like credit?
If so, are you willing to donate it to the public domain?

I like the GPL, but some code is generally applicable enough, and unlikely
to be improved on by others enough, that I'd rather just let everyone
use the same version.

> It's not quite 100% finished, as I haven't benchmarked it against Linus's
> code yet, but it's functionally correct.
>
> It's also clean with -W -Wall -Wextra.

> Not if you try with the unaligned put_be32() as the destination pointer
> is marked const.

Oops.  And the "||" in get_be32() doesn't help either.
There was another major bug I found, too.

And I'm still trying to match "linusv" performance.

> As to the actual result on ARM... Well, the assembly _looks_ much worse
> than Linus' version.  It uses a stack frame of 152 bytes instead of 64
> bytes.  The resulting binary is also 6868 bytes large compared to 6180
> bytes.  Surprisingly, the performance is not that bad (the reason for
> the underlined "looks" above) albeit still a bit worse, like 5% slower.  
> I was expecting much worse than that.

Thanks, I'll work on it.

> Also, wrt skipping the last 3 write back to the 16 word array...  For
> all the (limited) attempts I've made so far to do that, it always ended
> up making things worse.  I've yet to investigate why though.

Thanks.  That's certainly surprising!


If anyone cares, my current code (adapted to fit sha1bench) is appended.

/*
 * Secure Hash Algorith SHA-1, as published in FIPS PUB 180-2.
 *
 * This implementation is in the public domain.  Copyright abandoned.
 * You may do anything you like with it, including evil things.
 *
 * This is a rewrite from scratch, based on Linus Torvalds' "block-sha1"
 * from the git mailing list (August, 2009).  Additional optimization
 * ideas cribbed from
 * - Artur Skawina (x86, particularly P4, and much benchmarking)
 * - Nicolas Pitre (ARM)
 *
 * Contributors to this code are requested to donate their changes to
 * the public domain to avoid version skew.  If you don't want to, be
 * sure to change the copyright waiver above.
 */

#include <string.h> /* For memcpy() */
#include <arpa/inet.h> /* For ntohl() */

#include "pdsha1.h"

#define SHA_CTX pd_SHA_CTX
#define SHA_context pd_SHA_context
#define SHA1_Init pd_SHA1_Init
#define SHA1_Update pd_SHA1_Update
#define SHA1_Final pd_SHA1_Final

static void sha1_core(uint32_t iv[5], unsigned char const *p);


void SHA1_Init(struct SHA_context *c)
{
        /* This is a prefix of the SHA_context structure */
        static struct {
                uint64_t len;
                uint32_t iv[5];
        } const iv = { 0,
                { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 }
        };

        memcpy(c, &iv, sizeof iv);
}

void SHA1_Update(struct SHA_context *c, void const *p, size_t n)
{
        size_t pos = c->len & 63; /* Offset into current input block */
        size_t nblocks;

        c->len += n;

        /* Initial partial block (if any) */
        if (pos) {
                size_t space = 63 - pos;
                if (n < space)
                        goto end;
                memcpy(c->buf + pos, p, space);
                sha1_core(c->iv, c->buf);
                n -= space;
                p = (char const *)p + space;
        }

        /* The large middle piece */
        nblocks = n >> 6;
        while (nblocks--) {
                sha1_core(c->iv, p);
                p += 64;
        }
        n &= 63;
        pos = 0;
end:
        /* Final partial block (may be zero size) */
        memcpy(c->buf + pos, p, n);
}

/* Machine specific hacks */
#if defined(__i386__) || defined(__x86_64__) || defined (__ppc__) || \
    defined(__ppc64__) || defined(__powerpc__) || defined (__powerpc64__) || \
    defined(__s390__) || defined(__s390x__)
/* Unaligned access is okay */
#define GET_BE32(p) ntohl(*(uint32_t const *)(p))
#define PUT_BE32(p,v) (*(uint32_t *)(p) = htonl(v))

#else
/*
 * Unaligned access is not okay; do conversion as byte fetches
 * These macros are IN_CAPS, as they are not side-effect safe
 * The argument p must be an "unsigned char *" or "uint8_t *"
 */
#define GET_BE32(p) \
        ((uint32_t)(p)[0]<<24 | (p)[1]<<16 | (p)[2]<<8 | (p)[3])
#define PUT_BE32(p,v) \
        ( (p)[0] = (v)>>24, (p)[1] = (v)>>16, (p)[2] = (v)>>8, (p)[3] = (v) )
#endif

void SHA1_Final(unsigned char hash[20], struct SHA_context *c)
{
        size_t pos = c->len & 63;
        unsigned i;

        /* Append a single 1 bit */
        c->buf[pos++] = 0x80;

        /* Append 0 bits until 64 bits remain in a block */
        if (pos > 56) {
                memset(c->buf + pos, 0, 64 - pos);
                sha1_core(c->iv, c->buf);
                pos = 0;
        }
        memset(c->buf + pos, 0, 56 - pos);

        /* Append total input length in bits */
        ((uint32_t *)c->buf)[14] = htonl((uint32_t)(c->len >> 29));
        ((uint32_t *)c->buf)[15] = htonl((uint32_t)c->len << 3);

        /* Final hash round */
        sha1_core(c->iv, c->buf);

        /* Copy hash result out */
        for (i = 0; i < 5; i++)
                PUT_BE32(hash + 4*i, c->iv[i]);
}


/*
 * Helper macros for sha1_core function.  To avoid clutter, these macros
 * are NOT fully parenthesized if it doesn't matter to the actual use.
 */

#if __GNUC__ && (defined(__i386__) || defined(__x86_64__))
/*
 * GCC by itself only generates left rotates.  Use right rotates if
 * possible to be kinder to dinky implementations with iterative rotate
 * instructions.
 */
#define ROT(op, x, k) \
        ({ uint32_t y; __asm__(op " %1,%0" : "=r" (y) : "I" (k), "0" (x)); y; })
#define ROL(x,k) ROT("roll", x, k)
#define ROR(x,k) ROT("rorl", x, k)

#else
/* Generic C equivalent */
#define ROT(x,l,r) ((x) << (l) | (x) >> (r))
#define ROL(x,k) ROT(x,k,32-(k))
#define ROR(x,k) ROT(x,32-(k),k)
#endif


/*
 * An important temporary array in SHA-1 is the working array W[],
 * which holds a scheduled key word per round.  Only the last 16 words
 * are relevant, so only use 16 words...
 */
#define W(i) w[(i) & 15]

/*
 * If you have a small register set, it helps GCC to force stores to
 * the w[] array to memory.  Given 22 or more registers (e.g. PowerPC),
 * GCC can map the entire w[] array to registers and this becomes
 * counterproductive.
 *
 * The optimal kludge seems to differ between x86 and ARM.  On the latter,
 * forcing a full memory barrier is required to stop GCC from splitting
 * live ranges for each round and generating a separate stack slot for
 * each.  (Which produces a huge stack frame and kills performance.)
 */
#if __GNUC__ && defined(__i386__) || defined(__x86_64__)
#define STORE(i, x) W(i) = x; __asm__ volatile("" : "+m" (W(x)))

#elif defined(__i386__) || defined(__x86_64__)
#define STORE(i, x) *(volatile uint32_t *)&W(i) = x

#elif __GNUC__ && defined(__arm__)
#define STORE(i, x) W(i) = x; __asm__("":::"memory")

#else
#define STORE(i, x) W(i) = x
#endif


/* The three round functions.  F2 is also used as F4 */
#define F1(b,c,d) (((d ^ c) & b) ^ d) /* Bitwise b ? c : d */
#define F2(b,c,d) (d ^ c ^ b) /* Even parity */
#define F3(b,c,d) (d & c) + ((d ^ c) & b) /* Majority function */

/* The four round constants */
#define K1 0x5a827999 /* 2^30 * sqrt(2) */
#define K2 0x6ed9eba1 /* 2^30 * sqrt(3) */
#define K3 0x8f1bbcdc /* 2^30 * sqrt(5) */
#define K4 0xca62c1d6 /* 2^30 * sqrt(10) */

/* Rounds 0..15 fetch a word from the input */
#define FETCH(t,i) t = GET_BE32(p + 4*(i)); STORE(i,t)
//#define FETCH(t,i) t = W(i)
/* Rounds 16..79 mix previous words to get a new one */
#define MIX(t,i) t = W(i) ^ W(i+2) ^ W(i+8) ^ W(i+13); t = ROL(t, 1)
/* Rounds 16..76 have to store back the result */
#define CALC(t,i) MIX(t,i); STORE(i,t)

/* The basic SHA-1 round */
#if 0
#define SHA_ROUND(a,b,c,d,e,f,k,src,t,i) \
        src(t,i); \
        e += t + f(b,c,d) + k + ROL(a,5); \
        b = ROR(b,2)
#elif 0
#define SHA_ROUND(a,b,c,d,e,f,k,src,t,i) do { \
        uint32_t t; \
        src(t,i); \
        e += f(b,c,d); \
        e += t + k; \
        b = ROR(b,2); \
        e += ROL(a,5); } while (0)
#else
#define SHA_ROUND(a,b,c,d,e,f,k,src,t,i) do { \
        uint32_t t = ROL(a,5); \
        e += f(b,c,d); \
        e += k + t; \
        src(t,i); \
        b = ROR(b,2); \
        e += t; } while(0)
#endif

/* An aligned group of 5 rounds */
#define SHA_ROUND5(f,k,src,t,i) \
        SHA_ROUND(a,b,c,d,e, f,k,src,t,i); \
        SHA_ROUND(e,a,b,c,d, f,k,src,t,i+1); \
        SHA_ROUND(d,e,a,b,c, f,k,src,t,i+2); \
        SHA_ROUND(c,d,e,a,b, f,k,src,t,i+3); \
        SHA_ROUND(b,c,d,e,a, f,k,src,t,i+4)

/*
 * The core SHA-1 transform.
 *
 * iv[5] = Current SHA-1 hash state.
 * p = Pointer to 64 bytes of source data.  Not necessarily aligned.
 */
static void
sha1_core(uint32_t iv[5], unsigned char const *p)
{
        //uint32_t e = iv[4], d = iv[3], c = iv[2], b = iv[1], a = iv[0];
        uint32_t a = iv[0], b = iv[1], c = iv[2], d = iv[3], e = iv[4];
        uint32_t w[16];

#if 0
        {
                int i;
                for (i = 0; i < 16; i++) {
                        STORE(i, get_be32(p + 4*i));
                }
        }
#endif
        SHA_ROUND5(F1, K1, FETCH, t,  0);
        SHA_ROUND5(F1, K1, FETCH, t,  5);
        SHA_ROUND5(F1, K1, FETCH, t, 10);
        SHA_ROUND(a,b,c,d,e, F1, K1, FETCH, t, 15);
        SHA_ROUND(e,a,b,c,d, F1, K1, CALC, t, 16);
        SHA_ROUND(d,e,a,b,c, F1, K1, CALC, t, 17);
        SHA_ROUND(c,d,e,a,b, F1, K1, CALC, t, 18);
        SHA_ROUND(b,c,d,e,a, F1, K1, CALC, t, 19);

        SHA_ROUND5(F2, K2, CALC, t, 20);
        SHA_ROUND5(F2, K2, CALC, t, 25);
        SHA_ROUND5(F2, K2, CALC, t, 30);
        SHA_ROUND5(F2, K2, CALC, t, 35);

        SHA_ROUND5(F3, K3, CALC, t, 40);
        SHA_ROUND5(F3, K3, CALC, t, 45);
        SHA_ROUND5(F3, K3, CALC, t, 50);
        SHA_ROUND5(F3, K3, CALC, t, 55);

        SHA_ROUND5(F2, K4, CALC, t, 60);
        SHA_ROUND5(F2, K4, CALC, t, 65);
        SHA_ROUND5(F2, K4, CALC, t, 70);

        SHA_ROUND(a,b,c,d,e, F2, K4, CALC, t, 75);
        SHA_ROUND(e,a,b,c,d, F2, K4, CALC, t, 76);
        /* Last 3 rounds don't need to store mixed value */
        SHA_ROUND(d,e,a,b,c, F2, K4, MIX, t, 77);
        SHA_ROUND(c,d,e,a,b, F2, K4, MIX, t, 78);
        SHA_ROUND(b,c,d,e,a, F2, K4, MIX, t, 79);

        iv[0] += a;
        iv[1] += b;
        iv[2] += c;
        iv[3] += d;
        iv[4] += e;
}


/* Compile with -DUNITTEST for self-tests */
#if UNITTEST

#include <stdio.h>
#include <stdlib.h>

/* Known answer test */
static void katest(char const *text, size_t len, unsigned char const hash[20])
{
        SHA_CTX c;
        unsigned char hash2[20];
        int i;

        SHA1_Init(&c);
        SHA1_Update(&c, text, len);
        SHA1_Final(hash2, &c);

        for (i = 0; i < 20; i++)
                if (hash[i] != hash2[i])
                        goto mismatch;
        printf("%u-byte known answer test PASSED\n", len);
        return;

mismatch:
        printf("%u-byte known answer test FAILED:\n", len);
        if (len < 64)
                printf("Input: \"%.*s\"\n", len, text);
        else
                printf("Input: \"%.64s\"...\n", text);
        printf("Computed:");
        for (i = 0; i < 20; i++)
                printf(" %02x", hash2[i]);
        printf("\nExpected:");
        for (i = 0; i < 20; i++)
                printf(" %02x", hash[i]);
        putchar('\n');
}

int
main(void)
{
        /* FIPS PUB 180.1 example A.1 */
        static char const text1[3] = "abc";
        static unsigned char const hash1[20] = {
                0xa9, 0x99, 0x3e, 0x36, 0x47, 0x06, 0x81, 0x6a, 0xba, 0x3e,
                0x25, 0x71, 0x78, 0x50, 0xc2, 0x6c, 0x9c, 0xd0, 0xd8, 0x9d };

        /* FIPS PUB 180.1 example A.2 */
        static char const text2[56] =
                "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq";
        static unsigned char const hash2[20] = {
                0x84, 0x98, 0x3e, 0x44, 0x1c, 0x3b, 0xd2, 0x6e, 0xba, 0xae,
                0x4a, 0xa1, 0xf9, 0x51, 0x29, 0xe5, 0xe5, 0x46, 0x70, 0xf1 };

        /* FIPS PUB 180.1 example A.3 */
        char *text3;
        static unsigned char const hash3[20] = {
                0x34, 0xaa, 0x97, 0x3c, 0xd4, 0xc4, 0xda, 0xa4, 0xf6, 0x1e,
                0xeb, 0x2b, 0xdb, 0xad, 0x27, 0x31, 0x65, 0x34, 0x01, 0x6f };

        katest(text1, sizeof text1, hash1);
        katest(text2, sizeof text2, hash2);
        text3 = malloc(1000000);
        memset(text3, 'a', 1000000);
        katest(text3, 1000000, hash3);
       
        return 0;
}
#endif
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

Nicolas Pitre
On Mon, 17 Aug 2009, George Spelvin wrote:

> > I don't think this trick of making source code textually different from
> > another work while still intimately mimicking the same structure entitles
> > you to any copyright (or non copyright) claims over that other work.  I
> > certainly wouldn't bet any dime for this standing up in court.  
>
> <div type="legal digression">
> Actually, I would.  I did a lot more than text search and replace;
> I re-implemented it from FIPS 180-2 (work of U.S. government, no copyright)
> and then merged in the *ideas* from the mailing list.  
>
> (And from elsewhere; the idea of a five-round macro is from Brian Gladman.)
>
> Remember, to the extent that something is *functional*, it is not
> copyrightable; copyright only covers the non-functional expressive bits.
> The vast majority of that code is simply required by the standard,
> or the desired calling interface.
>
> For a large portion of the rest, remember that standard programming
> conventions (e.g.  brace style, macro names IN CAPS, etc.) that's also
> non-copyrightable "scene a faire" material.
>
> It's well established that paraphrasing a recipe avoids copyright;
> the proportions and treatment of the ingredients is not copyrightable.
>
> For more details, see the extensive coverage of the NEC v. Intel decision
> (1989) regarding the firmware for NEC's 8086-clone V20 microprocessor.
> It was found non-infringing despite non-clean-room implementation and
> substantial similarities.
> </div>

Whatever.  NEC and Intel were certainly commercial competitors.  They
were far from being friends.  So if you feel like having too many
friends then just go ahead with that stance.

> As for politeness, that's exactly why I did post it and solicit
> objections.

You said:

|It uses Linus's and Artur's performance ideas, and some of Linus' macro
|ideas (in the rotate implementation), but tries to be textually
|different. Is there anything recognizable that anyone cares to clam
|copyright to?

the "try to be textually different" in order to ask for "anything
recognizable that anyone cares to clam copyright to" is what I find
dubious.

> The purpose of the rewrite is to avoid having to make
> pessimistic assumptions about people who don't respond.
>
> I suppose I should have made that request clearer:
> Is there anyone who claims copyright on anything here?
> Or would just like credit?
> If so, are you willing to donate it to the public domain?

I think this is much nicer to everyone involved.

As far as I'm concerned, I'm OK with giving any small copyright I might
have in this SHA1 implementation, if any, to the public domain.  
Credits are always nice.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

Steven Noonan
In reply to this post by Linus Torvalds-3
On Mon, Aug 17, 2009 at 9:22 AM, Linus
Torvalds<[hidden email]> wrote:

>
>
> On Mon, 17 Aug 2009, Steven Noonan wrote:
>>
>> Interesting. I compared Linus' implementation to the public domain one
>> by Steve Reid[1]
>
> You _really_ need to talk about what kind of environment you have.
>
> There are three major issues:
>  - Netburst vs non-netburst
>  - 32-bit vs 64-bit
>  - compiler version

Right. I'm running a Core 2 "Merom" 2.33GHz. The code was compiled for
x86_64 with GCC 4.2.1. I didn't _expect_ it to compile for x86_64, but
apparently the version of GCC that ships with Xcode 3.2 defaults to
compiling 64-bit code on machines that are capable of running it.

>
> Steve Reid's code looks great, but the way it is coded, gcc makes a mess
> of it, which is exactly what my SHA1 tries to avoid.
>
> [ In contrast, gcc does very well on just about _any_ straightforward
>  unrolled SHA1 C code if the target architecture is something like PPC or
>  ia64 that has enough registers to keep it all in registers.
>
>  I haven't really tested other compilers - a less aggressive compiler
>  would actually do _better_ on SHA1, because the problem with gcc is that
>  it turns the whole temporary 16-entry word array into register accesses,
>  and tries to do register allocation on that _array_.
>
>  That is wonderful for the above-mentioned PPC and IA64, but it makes gcc
>  create totally crazy code when there aren't enough registers, and then
>  gcc starts spilling randomly (ie it starts spilling a-e etc). This is
>  why the compiler and version matters so much. ]
>
>> (average of 5 runs)
>> Linus' sha1: 283MB/s
>> Steve Reid's sha1: 305MB/s
>
> So I get very different results:
>
>        #             TIME[s] SPEED[MB/s]
>        Reid            2.742       222.6
>        linus           1.464         417

Added -m32:

Steve Reid: 156MB/s
Linus: 209MB/s

So on x86, your code really kicks butt.

> this is Intel Nehalem, but compiled for 32-bit mode (which is the more
> challenging one because x86-32 only has 7 general-purpose registers), and
> with gcc-4.4.0.
>
>                        Linus
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

George Spelvin
In reply to this post by Nicolas Pitre
>> The purpose of the rewrite is to avoid having to make
>> pessimistic assumptions about people who don't respond.
>>
>> I suppose I should have made that request clearer:
>> Is there anyone who claims copyright on anything here?
>> Or would just like credit?
>> If so, are you willing to donate it to the public domain?

> I think this is much nicer to everyone involved.
>
> As far as I'm concerned, I'm OK with giving any small copyright I might
> have in this SHA1 implementation, if any, to the public domain.  
> Credits are always nice.

My apologies.  I read a lot of people talking about wanting the code
under different licenses, and thought I'd just cut through it by
providing some PD code.

I didn't turn around and look at it from the point of view of the
people who'd put the work into developing it.  I don't mean to deny
anyone credit for their work.  In fact, providing more detail is on
the to-do list, but I haven't waded through the mail archives and
tracked down who contributed what yet.

I'll work on those polish details once I have it producing the same
assembly code as Linus'.

There are a lot of possible highly-permissive licenses if one is wanted
(zlib, MIT, CC-by), but public domain seems simpler.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

Pádraig Brady
In reply to this post by Nicolas Pitre
Nicolas Pitre wrote:

> On Sat, 15 Aug 2009, Linus Torvalds wrote:
>
>> (Heh. Looking at that, I probably should move the 'size' field first,
>> since that would have different alignment rules, and the struct would be
>> more tightly packed that way, and initialize better).
>
> I was about to suggest (i.e. post) a patch for that.  This is indeed a
> good idea.
>
>> Afaik, none of the actual code remains (the mozilla SHA1 thing did the
>> wrong thing for performance even for just the final bytes, and did those a
>> byte at a time etc, so I rewrote even the trivial SHA1_Final parts).
>
> Maybe a patch adding a proper header with the actual license would be a
> good idea too.

So have you decided on a final licence,
and if so update the headers accordingly?

cheers!
Pádraig.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

galt
This post has NOT been accepted by the mailing list yet.
This post was updated on .
I also wanted to include Linus' sha1 in our software at work.
But the GPLv2 license was incompatible.
Too bad it is not just in the public domain.
I grabbed Steve Reid's public domain code from 1999
and ran it. It produced the same output.
I ran it on a 3GB input file, and Linus' code from 2009 takes 37 to 40 seconds.
(Just reading the file in the same 4k buffers only takes 3 seconds
so disk reading does not dominate the time.)
When I ran Steve's old version on the same input it was taking just 36 or 37 seconds.
So it is slightly faster.
Have compilers improved?
I am using gcc 4.4.7-17.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Linus' sha1 is much faster!

galt
This post has NOT been accepted by the mailing list yet.
In reply to this post by Pádraig Brady
A Phádraig, cá bhfuil tú i do chónaí?
Tá mé i gCalafoirne.
12
Loading...