Project Nayuki


Lowest SHA-512 value by brute force

Lowest SHA-512 program running screenshot

I encountered a competition to find an ASCII string with a SHA-512 hash value as low as possible (or the most number of leading zeros). So I accepted the challenge and built a brute-force search program in C, based on my fast SHA-512 implementation in x86-64 assembly. Additionally, I rewrote SHA-512 in vectorized form to do dual and quad SHA-512 hashing using x86 AVX instructions, and implemented multithreading with pthreads.

Source code

C + x86-64 AVX (dual SHA-512)

Download: lowest-sha512-dual-pthread.c, sha512-64-avx.S

Compile: cc -pthread -O1 lowest-sha512-dual-pthread.c sha512-64-avx.S -o lowest-sha512-dual-pthread

This uses the 128-bit XMM registers to compute two simultaneous instances of the SHA-512 hash function per thread. The program prints abbreviated hashes to standard error and full hashes to standard output (intended for logging).

C + x86-64 AVX2 (quad SHA-512)

Download: lowest-sha512-quad-pthread.c, sha512-64-avx2.S

Compile: cc -pthread -O1 lowest-sha512-quad-pthread.c sha512-64-avx2.S -o lowest-sha512-quad-pthread

This uses the 256-bit YMM registers to compute four simultaneous instances of the SHA-512 hash function per thread. The program prints abbreviated hashes to standard error and full hashes to standard output (intended for logging).

C (portable)

Download: lowest-sha512.c

Compile: cc -O1 lowest-sha512.c -o lowest-sha512

This C code is portable, not depending on the x86 machine architecture and forgoes multithreading. Though it’s not very fast, the code illustrates my core ideas invovled in this brute-force hashing program and contains fewer minute details than the optimized versions.

Python

Download: lowest-sha512.py

This highly simplified program demonstrates the core of what the C programs are doing. The program prints abbreviated hashes to standard output only. Compatible with Python 2 and 3.

Notes

  • The 3-operand arithmetic introduced in AVX is a welcome change from x86’s classic 2-operand form. I had to change my habits and refrain from writing MOV instructions a few times (e.g. because bit shifting is no longer destructive), but overall the clarity of the code is improved.

  • The code is only available for Linux x86-64 because more than 8 registers are needed in the hash computation. x86-32 can be supported, but the code would need numerous register spills, making it inefficient.

  • For those who have CPUs that don’t support AVX or AVX2, you can still compile the code, download the Intel Software Development Emulator, and run the binaries under the SDE to verify that they work – albeit at a slowdown factor of about 1000×. (In fact, I developed the quad/AVX2 version using the SDE, because AVX2 is only supported since Intel Haswell (year 2013) and I didn’t have access to such a new CPU in March 2014.)