Project Nayuki


Fast SHA-1 hash implementation in x86 assembly

After doing something for the first time, doing it again is much easier. The MD5 and SHA-1 hash algorithms are quite similar in structure, and having written MD5 in x86 assembly recently, applying that knowledge to implement SHA-1 in x86 was a breeze.
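For readers who have not seen the algorithm being optimized, here is a minimal portable C sketch of SHA-1 written from the public specification (FIPS 180-4). It is not the code published on this page. The names `rotl32`, `sha1_compress`, and `sha1_hash` are illustrative assumptions, and the sketch favors clarity over speed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

/* Process one 64-byte block, updating the five-word state in place. */
void sha1_compress(uint32_t state[5], const uint8_t block[64]) {
    uint32_t w[80];
    /* Load the block as sixteen big-endian 32-bit words. */
    for (int i = 0; i < 16; i++) {
        w[i] = (uint32_t)block[i * 4] << 24 | (uint32_t)block[i * 4 + 1] << 16
             | (uint32_t)block[i * 4 + 2] << 8 | (uint32_t)block[i * 4 + 3];
    }
    /* Expand the sixteen words into the eighty-word message schedule. */
    for (int i = 16; i < 80; i++)
        w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    uint32_t a = state[0], b = state[1], c = state[2], d = state[3], e = state[4];
    /* Eighty rounds in four groups of twenty, each with its own function and constant. */
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t t = rotl32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rotl32(b, 30); b = a; a = t;
    }
    state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e;
}

/* Hash a whole message: compress full blocks, then pad and compress the tail. */
void sha1_hash(const uint8_t *msg, size_t len, uint32_t hash[5]) {
    hash[0] = 0x67452301; hash[1] = 0xEFCDAB89; hash[2] = 0x98BADCFE;
    hash[3] = 0x10325476; hash[4] = 0xC3D2E1F0;

    size_t off = 0;
    for (; len - off >= 64; off += 64)
        sha1_compress(hash, msg + off);

    uint8_t block[64] = {0};
    size_t rem = len - off;
    if (rem > 0)
        memcpy(block, msg + off, rem);
    block[rem] = 0x80;                 /* mandatory padding bit */
    if (rem >= 56) {                   /* no room left for the length field */
        sha1_compress(hash, block);
        memset(block, 0, sizeof block);
    }
    uint64_t bits = (uint64_t)len * 8; /* message length in bits, big-endian */
    for (int i = 0; i < 8; i++)
        block[63 - i] = (uint8_t)(bits >> (i * 8));
    sha1_compress(hash, block);
}
```

Almost all of the running time is spent in the eighty-round loop of `sha1_compress`, which is the kind of code that hand-written assembly targets. Calling `sha1_hash((const uint8_t *)"abc", 3, h)` fills `h` with the five words of the digest.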

Source code

The code comes in a number of parts:

Files:

To use this code, compile it on Linux with one of these commands:

Then run the executable with ./sha1test.

Benchmark results

Code | Compilation | Speed on x86 | Speed on x86-64
C (naive) | GCC -O0 | 98 MiB/s | 97 MiB/s
C (naive) | GCC -O1 | 184 MiB/s | 212 MiB/s
C (naive) | GCC -O2 | 178 MiB/s | 198 MiB/s
C (naive) | GCC -O3 | 177 MiB/s | 199 MiB/s
C (naive) | GCC -O1 -fomit-frame-pointer | 206 MiB/s
C (naive) | GCC -O2 -fomit-frame-pointer | 191 MiB/s
C (naive) | GCC -O3 -fomit-frame-pointer | 191 MiB/s
C (fast) | GCC -O0 | 111 MiB/s | 111 MiB/s
C (fast) | GCC -O1 | 253 MiB/s | 292 MiB/s
C (fast) | GCC -O2 | 137 MiB/s | 191 MiB/s
C (fast) | GCC -O3 | 137 MiB/s | 191 MiB/s
C (fast) | GCC -O1 -fomit-frame-pointer | 269 MiB/s
C (fast) | GCC -O2 -fomit-frame-pointer | 182 MiB/s
C (fast) | GCC -O3 -fomit-frame-pointer | 182 MiB/s
Assembly (naive) | GCC -O1 | 270 MiB/s
Assembly (fast) | GCC -O0 | 313 MiB/s
Assembly (fast) | GCC -O1 | 327 MiB/s
Assembly (OpenSSL[0]) | GCC -O0 | 305 MiB/s

On x86, my assembly code is 1.22× as fast as my C code at its fastest GCC setting. On x86-64, my assembly code is 1.07× as fast as my C code at its fastest GCC setting.

All the benchmark results above are based on: CPU = Intel Core 2 Quad Q6600 2.40 GHz (single-threaded), OS = Ubuntu 10.04 (32-bit and 64-bit), compiler = GCC 4.4.3.
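The measurement method behind these numbers is not described here. As one plausible approach, and purely as an assumption, a throughput figure in MiB/s can be obtained by timing repeated calls to a block function with the standard `clock()` routine. The names `mib_per_second`, `block_fn`, `benchmark_block_fn`, and `dummy_compress` below are hypothetical, and `dummy_compress` is only a stand-in workload so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Convert a byte count and an elapsed time to MiB/s (1 MiB = 1048576 bytes). */
double mib_per_second(double bytes, double seconds) {
    return bytes / 1048576.0 / seconds;
}

/* Signature of a function that processes one 64-byte block (hypothetical). */
typedef void (*block_fn)(uint32_t state[5], const uint8_t block[64]);

/* Stand-in workload; replace with a real SHA-1 block function when measuring. */
void dummy_compress(uint32_t state[5], const uint8_t block[64]) {
    for (int i = 0; i < 64; i++)
        state[i % 5] += block[i] + 1u;
}

/* Call fn on the same block `iterations` times and report throughput,
   based on the CPU time reported by clock(). */
double benchmark_block_fn(block_fn fn, long iterations) {
    uint32_t state[5] = {0};
    uint8_t block[64] = {0};
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn(state, block);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0; /* run was too short for clock() to resolve */
    return mib_per_second(64.0 * (double)iterations, seconds);
}
```

A call such as `benchmark_block_fn(dummy_compress, 10000000)` returns the measured rate. The iteration count needs to be large enough that the run lasts well beyond the resolution of `clock()`, otherwise the function reports 0.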

Notes

More info