Project Nayuki


Bitcoin cryptography library

This project implements the cryptographic primitives used in the Bitcoin system, especially elliptic curve operations and hash functions.

The code is written in two independent versions in C++ and Java. It includes a test suite of over a thousand test vectors that cover every feature provided by the library.

The library is open source, and is written by Nayuki from the ground up. It is designed with portability and clarity in mind, and is rigorously verified for correctness and quality.

Source code

Browse source files on GitHub: https://github.com/nayuki/Bitcoin-Cryptography-Library
Download the ZIP package: https://github.com/nayuki/Bitcoin-Cryptography-Library/archive/master.zip
License: MIT (open source)

Project components:

  • C++ cryptography implementations (all the .hpp files and corresponding .cpp files)

  • C++ test suites (all the *Test.cpp files, each having a main() function)

  • Makefile for C++ code (type “make check” to compile and run all)

  • Java cryptography implementations (in the package io.nayuki.bitcoin.crypto)

  • Java JUnit test suites (all the *Test.java files)

Library features:

  • Uint256: An unsigned 256-bit integer with wrap-around overflow arithmetic.

  • FieldInt: An integer modulo a prime number, in the field used for the elliptic curve.

  • CurvePoint: A point on the secp256k1 elliptic curve for Bitcoin.

  • Ecdsa: The elliptic curve digital signature algorithm, signing and verification.

  • Hash functions: RIPEMD-160 (core Bitcoin use), SHA-256 (core), SHA-512 (only needed for HD wallets), Keccak-256 (only for Ethereum).

  • Base58Check: Convert both ways between binary data and ASCII strings.


Design notes

Implemented in C++11

My Bitcoin cryptography library is implemented in C++ instead of C, to take advantage of features like encapsulation, instance methods (instead of global functions), pass-by-reference, arithmetic operator overloading, and namespaces. As for C++11, the features used are move constructors, deleting default functions, and (strangely) the header cstdint.

Although the code is written in C++, no C++ libraries are actually used (iostream, vector, algorithm, etc.) in the core algorithms. This is good for compiling the code to run on embedded systems because C++ libraries can generate a lot of binary code and use a lot of memory. (But the test suites do use C++ libraries, and are meant to be run on a desktop computing environment.) The only libraries used are essentially the cstdint type definitions, size_t, memset(), memcpy(), and strlen(). Also all memory used by this library is allocated on the stack, not on the heap.

The decision to use C++ means that the library excludes projects/developers who work in a pure C environment, but I feel that C++ makes my code clearer and more maintainable. For example in C++ I would write:

#include <cstdint>
#include <iostream>
using namespace std;

class Uint256 {
public:
    uint32_t value[8];
    Uint256(const char *s) { ... }
    void add(const Uint256 &other) { ... }
    void multiply(const Uint256 &other) { ... }
};

ostream &operator<<(ostream &os, const Uint256 &x) { ... }

int main() {
    Uint256 x("12...34");
    Uint256 y("fe...dc");
    x.multiply(y);
    cout << x << endl;
    return 0;
}

Whereas in C, I would define global functions with a namespace prefix, and frequently work with pointer variables (Type *x) and passing by addresses (&y) – all of which I feel contribute to syntactical clutter:

#include <stdint.h>
#include <stdio.h>

struct Uint256 {
    uint32_t value[8];
};

void Uint256_init(struct Uint256 *x, const char *s) { ... }
void Uint256_add(struct Uint256 *x, const struct Uint256 *y) { ... }
void Uint256_multiply(struct Uint256 *x, const struct Uint256 *y) { ... }
void Uint256_print(const struct Uint256 *x) { ... }

int main(void) {
    struct Uint256 x, y;
    Uint256_init(&x, "12...34");
    Uint256_init(&y, "fe...dc");
    Uint256_multiply(&x, &y);
    Uint256_print(&x);
    printf("\n");
    return 0;
}

Targeted toward 32-bit CPUs

The cryptography code is designed to run efficiently on microprocessors that support 32-bit integer operations and have a hardware multiplier. The code has been tested on desktop (x86, x86-64) and embedded (32-bit ARM) systems alike. High-performance 64-bit CPUs are supported too, but the code would run faster if specifically rewritten for them by using 64-bit addition, multiplication, etc.
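To make the point concrete, here is a small sketch (my own illustration, not code from the library) of the building block that makes a hardware 32-bit multiplier so valuable: multi-word bignum arithmetic is assembled from 32×32 → 64-bit partial products, which C++ expresses by widening to uint64_t.

```cpp
#include <cstdint>

// On a 32-bit CPU with a hardware multiplier, a compiler lowers this
// widening multiply to a single multiply instruction (or a mul/mulhi pair).
uint64_t mul32x32(uint32_t a, uint32_t b) {
    return static_cast<uint64_t>(a) * b;
}

// Multiply-accumulate step used repeatedly in schoolbook bignum
// multiplication: partial product plus incoming word plus carry.
// The 64-bit result cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.
uint64_t mulAdd(uint32_t a, uint32_t b, uint32_t addend, uint32_t carry) {
    return mul32x32(a, b) + addend + carry;
}
```

A 256-bit multiply is then a loop of such steps, with the low 32 bits of each 64-bit result stored and the high 32 bits carried onward.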

The consequence of this design is that 8-bit microcontrollers (Arduino, Atmel megaAVR/ATmega, etc.) are not supported. In theory it is possible to emulate the 32-bit operations with sequences of 8-bit operations, but this is slow, and it would be much preferable to design the algorithms for native 8-bit processing from the start.

Although there is no C++ code that caters specifically to 64-bit CPUs, there is an optional set of functions implemented in x86-64 assembly that replaces key functions in Uint256 and FieldInt, found in AsmX8664.s. When compiling the code with -DNDEBUG and without -fsanitize=undefined, the asm-optimized version runs at about 2.5× the speed of the pure-C++ version for ECDSA signing/verification.

Constant-time arithmetic operations

The entire stack of code for ECDSA signing is implemented using constant-time arithmetic operations. This means that when executing x.func(y), the sequence of instructions executed, the branches taken, and the memory addresses accessed will be the same regardless of the values inside the structures x and y. In particular, the classes Uint256, FieldInt, CurvePoint, and Ecdsa are designed with data-independent, constant-time execution in mind. This technique thwarts timing attacks, cache-based attacks, and other side-channel attacks (such as power analysis via an oscilloscope).

To achieve this, the basic ideas are that relational operators (==, <, etc.) generate an integer result of 0 or 1, and that we can use bit masking to select which value we want. Here is a simple example:

// Naive version
uint32_t x = (...);
if (x > 5)
    x = 2;

// Constant-time version
uint32_t x = (...);
uint32_t mask = -(uint32_t)(x > 5);  // 0x00000000 or 0xFFFFFFFF
x = (x & ~mask) | (2 & mask);

By applying this masking technique, we can take low-level functions that are known to be constant-time and build them into higher-level functions that are still constant-time. This technique is also discussed in the BearSSL library and on a page about crypto coding rules.
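As a sketch of what such low-level building blocks look like (the names and signatures here are mine, not the library's), a branch-free select and a branch-free equality test can be written entirely with bitwise operations:

```cpp
#include <cstdint>

// Returns a if mask is all-ones (0xFFFFFFFF), or b if mask is all-zeros.
// No branch, so the access pattern is independent of the data.
uint32_t ctSelect(uint32_t mask, uint32_t a, uint32_t b) {
    return (a & mask) | (b & ~mask);
}

// Constant-time equality: yields 0xFFFFFFFF if x == y, else 0x00000000.
uint32_t ctEquals(uint32_t x, uint32_t y) {
    uint32_t d = x ^ y;                       // 0 if and only if equal
    // For any nonzero d, at least one of d and -d has its top bit set,
    // so this folds "d != 0" into a single 0-or-1 value.
    uint32_t nonzero = (d | (0u - d)) >> 31;  // 1 iff d != 0
    return nonzero - 1;                       // all-ones iff d == 0
}
```

Higher-level routines then chain these masks together, so the whole computation runs the same instruction sequence for every input.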

Fast elliptic curve arithmetic

The elliptic curve operations in this implementation are moderately optimized – not featuring the cutting edge in mathematics and algorithms, but still about 100× faster than the straightforward textbook version (actually benchmarked). On a 100 MHz 32-bit ARM processor, it takes only 200 ms to create an ECDSA signature (the naive version really took 20 seconds). On a 2012 desktop PC, it takes about 5 ms.

There are two major optimizations implemented: Performing elliptic curve point addition and doubling in projective coordinates, and using Barrett reduction for modular multiplication. Even though both optimizations make the code significantly harder to verify compared to the naive algorithms, the speed gains are well worth it when computing ECDSA signatures on embedded microcontrollers.
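The core idea of Barrett reduction can be shown at toy scale (this is my own illustration with a 16-bit modulus so every intermediate fits in uint64_t; the library's FieldInt applies the same principle to the 256-bit secp256k1 prime): replace division by the modulus with multiplication by a precomputed reciprocal.

```cpp
#include <cstdint>

struct Barrett16 {
    uint32_t n;   // modulus, 2 <= n < 2^16
    uint64_t m;   // precomputed floor(2^32 / n)

    explicit Barrett16(uint32_t modulus)
        : n(modulus), m((UINT64_C(1) << 32) / modulus) {}

    // Reduces x (which must be < n^2) modulo n without a hardware divide.
    uint32_t reduce(uint64_t x) const {
        uint64_t q = (x * m) >> 32;  // estimate of floor(x / n), off by at most 1
        uint64_t r = x - q * n;      // remainder estimate, in [0, 2n)
        if (r >= n)                  // at most one correction step needed
            r -= n;
        return static_cast<uint32_t>(r);
    }
};
```

In the real constant-time code, the final conditional subtraction would itself be performed with the masking technique rather than an `if`, so that the instruction sequence stays data-independent.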

Safe use of integer arithmetic

All the core code is carefully designed to use fixed-width integer types like uint8_t / uint32_t / etc. instead of unsigned char / int / etc. Thus the logic will work correctly and identically on 32-bit and 64-bit platforms. In theory it would even work perfectly on 8- and 16-bit platforms if the compiler generated instructions to emulate 32-bit arithmetic.

The logic intentionally avoids signed integer overflow (a form of undefined behavior); almost all math is done with unsigned integers. Type punning is never used. The code makes no assumptions on the endianness of the machine, and thus is usable on both big-endian and little-endian processors.
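Endianness independence is achieved by assembling bytes with shifts and masks rather than by reinterpreting memory. A minimal sketch in the style the text describes (illustrative, not the library's exact code):

```cpp
#include <cstdint>

// Serialize a uint32_t as 4 big-endian bytes. Because only shifts and
// casts are used (no type punning), the result is identical on
// big-endian and little-endian machines.
void putBigEndianUint32(uint32_t x, uint8_t out[4]) {
    out[0] = static_cast<uint8_t>(x >> 24);
    out[1] = static_cast<uint8_t>(x >> 16);
    out[2] = static_cast<uint8_t>(x >>  8);
    out[3] = static_cast<uint8_t>(x >>  0);
}

// The inverse: reassemble 4 big-endian bytes into a uint32_t.
uint32_t getBigEndianUint32(const uint8_t in[4]) {
    return (static_cast<uint32_t>(in[0]) << 24)
         | (static_cast<uint32_t>(in[1]) << 16)
         | (static_cast<uint32_t>(in[2]) <<  8)
         | (static_cast<uint32_t>(in[3]) <<  0);
}
```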

(When I say that the core code is designed carefully, I’m contrasting with the fact that the runnable test programs are written more informally, as they often use int variables instead of the strict types uint32_t / size_t / etc. Practically speaking this is fine because within the test cases, the int variables are almost always within the range [0, 1000], so there is no need to reason about integer widths or overflow.)

Small, rigorously verified code

Firstly, my Bitcoin cryptography library has low code complexity. The core implementation (excluding tests) is only 2300 lines of code (including blank lines and comments)! This ensures that complete verification of the codebase is tractable for a human reviewer.

Every line of source code has been verified carefully on printed paper. (It’s easier to patiently scrutinize every detail on paper rather than on screen, and the environment is less distracting.) When it came to editing the code, no detail was spared – even something as minor as a const would be implemented, and the wording of documentation comments was edited for maximum clarity and consistency.

Most of my code can be reviewed by knowing basic C++ language features and basic computer-oriented mathematics (such as bitwise arithmetic and overflow). Three pieces of tricky code deserve an extra explanation:

  • Class CurvePoint: Modular reciprocal/division is slow, so the essence of performing elliptic curve operations in projective coordinates is to do one reciprocal at the end of point multiplication rather than doing a reciprocal after every point addition or doubling (which means 512 reciprocals per point multiplication). This is achieved by doing some massively ugly algebra (but technically just high school level difficulty) to keep track of the two numerators and shared denominator of two fractions instead of performing a division step immediately. At the end of the process, the numerators are divided by the denominator to get the ordinary affine point. The algebra is explicitly documented in the comments of the CurvePoint class. A Wikibooks page was used for reference, but the math has been independently re-derived and verified by me.

  • FieldInt.multiply(): I have a full write-up for the Barrett reduction algorithm, where the article carefully justifies the math behind the algorithm and reasons about the bit widths of the intermediate numbers. This knowledge was faithfully translated from math into code, and the code does not attempt to re-explain the math.

  • Uint256.reciprocal(): With some effort, it is possible to take the binary GCD algorithm, remove the branches, and make the outer loop have a constant number of iterations to cover the worst case (namely 2n iterations, where n is the bit width of the big integer). No mathematical justifications are given in the code, but the inline comments are helpful to look at if you are re-deriving your own implementation of this constant-time extended GCD concept.
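The deferred-division idea behind the first bullet (projective coordinates in CurvePoint) can be illustrated with a toy analogy of my own, not taken from the library: to combine many fractions, track a numerator and denominator through the whole computation and divide only once at the end, just as CurvePoint defers the single modular reciprocal to the end of the point multiplication.

```cpp
#include <cstdint>

// A fraction num/den, kept unreduced so that no division happens mid-loop.
struct Fraction {
    int64_t num;
    int64_t den;
};

// (a/c) + (b/d) = (a*d + b*c) / (c*d): only multiplications and additions,
// analogous to a projective point addition/doubling step.
Fraction addFractions(Fraction x, Fraction y) {
    return Fraction{x.num * y.den + y.num * x.den, x.den * y.den};
}

// The single division at the very end, analogous to converting the
// projective result back to an affine point with one modular reciprocal.
int64_t toQuotient(Fraction f) {
    return f.num / f.den;
}
```

In the elliptic curve setting the "division" is a modular reciprocal costing hundreds of multiplications, so replacing 512 of them with one is where the roughly 100× speedup comes from.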

But so far, the code has only been reviewed and tested by Nayuki. It hasn’t been subjected to independent review yet due to a lack of resources. Feedback from readers and programmers would be highly appreciated.

Comparison with libsecp256k1

Libsecp256k1 is the code used in the official Bitcoin project for elliptic curve arithmetic and other cryptography (such as the SHA-256 hash). In early 2015, with version 0.10.0, it replaced the previous use of the OpenSSL library.

Libsecp256k1 is the effort of a handful of smart developers over several years, and is deployed in live Bitcoin clients processing billions of transactions. Its core functionality is implemented in about 8000 lines of code. Nayuki’s library is the effort of one person over a few months, and has only been lightly tested in the real world.

Libsecp256k1 uses more clever mathematics and algorithms, as well as esoteric C language/compiler features. Also, it is implemented in pure C whereas mine is in C++. Due to the complexity of the math and code, however, I don’t feel I can competently audit libsecp256k1 – even though I created from scratch a competing library with comparable features and was able to audit my own work.

Java version

The cryptography code is also implemented in Java, following the same design principles as the C++ code: 32-bit integer arithmetic, constant-time operations to thwart timing attacks, fast algorithms for elliptic curve arithmetic, and stack allocation of memory.

However, the requirement to avoid heap allocation makes the Java code fairly ugly. Objects can only be allocated on the heap, so instead of allocating temporary objects during calculations (which is possible in C++), the code uses a temporary int array allocated by the caller. All numbers are passed in and out of functions as indices to an int array, not as Uint256/FieldInt/CurvePoint objects (which would have better encapsulation and follow the object-oriented programming paradigm).

Because of the usage of raw integer arrays, indices, and manual memory allocation, the Java code is actually at a lower abstraction level than the C++ code. But at least the Java code does achieve its design goals and computes the elliptic curve functions correctly and efficiently.