Project Nayuki

Unsigned int considered harmful for Java


Every now and then, a developer comes along and complains about the lack of unsigned integer types in Java, demanding that they be implemented in the language. Oftentimes it’s a novice coming from C/C++ or C# who has worked with unsigned types before, or one who wants to guarantee no negative numbers in certain situations. Said novice C/C++/C# programmer typically does not fully understand the semantics and ramifications of unsigned integers in C/C++/C# to begin with.

While on the surface this feature request seems reasonable – extra types will be added but one can choose to ignore them if not explicitly needed – it would actually cause deep problems for all Java developers whether they like it or not. Moreover, arithmetic operations on unsigned integers can already be quite easily and efficiently emulated using signed integers, as proven in practice in numerous low-level libraries. Here I will argue why Java supporting unsigned integer types would be not only unnecessary but also harmful.

Emulating unsigned arithmetic

Let’s assume that for whatever reason, you must perform unsigned integer arithmetic in Java. Maybe you’re writing a cryptographic algorithm, decoding a multimedia stream, or implementing a file format / network protocol where data fields are specified as unsigned.

Straightforward emulation

The conceptually simple way to emulate unsigned arithmetic using only Java’s signed integer types is to widen the number while masking off the new high bits, perform the arithmetic, and narrow down the result.

Suppose we have this C/C++ code:

uint16_t a = (...);  // Input
uint16_t b = (...);  // Input
uint16_t c = a + b;  // Output

Then this is one way to make it work in Java:

short a = (...);           // Treat bits as unsigned value
short b = (...);           // Treat bits as unsigned value
int wideA = a & 0xFFFF;    // This value is now correct
int wideB = b & 0xFFFF;    // This value is now correct
int temp = wideA + wideB;  // Wide result
short c = (short)temp;     // Discard top bits; treat bits as unsigned value

Or more compactly (Java):

short a = (...);  // Input
short b = (...);  // Input
short c = (short)((a & 0xFFFF) + (b & 0xFFFF));
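
To make the widening approach concrete, here is a minimal sketch with actual input values. The class name WideningDemo and the method name addUnsigned16 are made up for illustration and do not come from any library.

```java
final class WideningDemo {
    // Adds two 16-bit values whose bits are interpreted as unsigned.
    static short addUnsigned16(short a, short b) {
        int wideA = a & 0xFFFF;   // Zero-extend: value is now in 0..65535
        int wideB = b & 0xFFFF;   // Zero-extend: value is now in 0..65535
        int sum = wideA + wideB;  // At most 131070, so int cannot overflow
        return (short) sum;       // Keep the low 16 bits (wraps modulo 65536)
    }

    public static void main(String[] args) {
        short a = (short) 40000;  // Bit pattern 0x9C40; reads as -25536 if signed
        short b = (short) 30000;
        short c = addUnsigned16(a, b);
        System.out.println(c & 0xFFFF);  // Prints 4464, i.e. 70000 mod 65536
    }
}
```

Note that the mask is needed again whenever the result is read back as a number, because a bare short would print as a signed value.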


Efficient emulation

The Java language specification requires its signed integers to be represented in two’s complement format. Because of this, many basic operations are exactly the same whether the integer type is signed or unsigned. For example, the sum of two 8-bit integers giving an 8-bit result will have the same bit pattern whether all three of these values are treated as signed or unsigned. The same goes for subtraction, multiplication, left shift, equality, and of course bitwise AND/OR/XOR/NOT operations. The only ones that are signedness-sensitive are division, remainder, right shift, ordering comparisons (<, <=, >, >=), widening casts, and string conversion. To summarize:

Same for signed and unsigned: addition, subtraction, multiplication, left shift, equality, bitwise AND/OR/XOR/NOT.

Different for signed and unsigned: division, remainder, right shift, ordering comparisons, widening casts, string conversion.

Java’s advantage of having mostly unified signed/unsigned arithmetic operations (from a bit-level standpoint) has historically not been enjoyed by C/C++ users. Before C23 and C++20, the standards allowed signed integers to be implemented on the machine in two’s complement, ones’ complement, or sign-magnitude, and allowed the low-level detail of the bit format to be exposed to the programmer; only these newer revisions mandate two’s complement. Furthermore, while signed overflow is silent and well-defined in Java, it remains undefined behavior in C/C++ and leads to all sorts of nasty problems. (In fact, if you stretch the argument you could say that if you wanted safe and reliable two’s complement signed overflow in C/C++, you should emulate the signed arithmetic using unsigned arithmetic – the reverse situation of what’s being argued here for Java.)
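
As a small illustration of the well-defined overflow on the Java side, the sketch below adds 1 to the largest int. The class name OverflowDemo and the method name wrappingAdd are hypothetical names chosen for this example.

```java
final class OverflowDemo {
    // Plain int addition in Java: the result silently wraps modulo 2^32.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int r = wrappingAdd(Integer.MAX_VALUE, 1);
        System.out.println(r == Integer.MIN_VALUE);  // Prints true
    }
}
```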

Continuing the previous example, this is the efficient Java translation:

short a = (...);           // Input, treat bits as unsigned value
short b = (...);           // Input, treat bits as unsigned value
short c = (short)(a + b);  // Output, treat bits as unsigned value

We can ignore the mandatory widening from 16 bits to 32 bits (promotion from short to int with sign extension), and we can ignore the arithmetic that occurs in the high 16 bits because it has no effect at all on the low 16 bits.
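
The claim that the high bits never influence the low bits can be checked by brute force. The sketch below does so at 8 bits, where trying every pair of inputs is cheap; the same reasoning carries over to short. The class name LowBitsCheck and the method name lowBitsAgree are invented for this example.

```java
final class LowBitsCheck {
    // Returns true if add, subtract and multiply yield the same 8-bit result
    // whether the operands are sign-extended or zero-extended beforehand.
    static boolean lowBitsAgree() {
        for (int i = 0; i < 256; i++) {
            for (int j = 0; j < 256; j++) {
                byte a = (byte) i, b = (byte) j;   // Sign-extended when used in arithmetic
                int ua = a & 0xFF, ub = b & 0xFF;  // Zero-extended (unsigned view)
                if ((byte) (a + b) != (byte) (ua + ub)) return false;
                if ((byte) (a - b) != (byte) (ua - ub)) return false;
                if ((byte) (a * b) != (byte) (ua * ub)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lowBitsAgree());  // Prints true
    }
}
```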

This technique is even easier for 32- and 64-bit unsigned arithmetic because no narrowing in Java is needed. C/C++ example:

uint32_t a = (...);  // Input
uint32_t b = (...);  // Input
uint32_t c = a * b;  // Output

Java translation (look ma, no extra effort!):

int a = (...);  // Input, treat bits as unsigned value
int b = (...);  // Input, treat bits as unsigned value
int c = a * b;  // Output, treat bits as unsigned value
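
Here is a sketch of the same multiplication with concrete numbers, compared against a widened computation in long. The names Mul32Demo, mulUnsigned32 and mulViaLong are hypothetical; Integer.toUnsignedLong is a standard method available since Java 8.

```java
final class Mul32Demo {
    // Direct translation: plain int multiplication, bits read as unsigned.
    static int mulUnsigned32(int a, int b) {
        return a * b;
    }

    // Reference version: zero-extend to long, multiply, keep the low 32 bits.
    static int mulViaLong(int a, int b) {
        return (int) ((a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL));
    }

    public static void main(String[] args) {
        int a = (int) 3000000000L;  // Too large for a signed int; stored as -1294967296
        int c = mulUnsigned32(a, 3);
        System.out.println(Integer.toUnsignedLong(c));  // Prints 410065408, i.e. 9000000000 mod 2^32
        System.out.println(c == mulViaLong(a, 3));      // Prints true
    }
}
```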

As for the operations that behave differently for signed versus unsigned types, they too can be emulated without resorting to widening.
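
One possible way to handle each signedness-sensitive operation on 32-bit values is sketched below. It is an illustration, not necessarily the set of techniques the article had in mind: the class name UnsignedOps and its method names are made up, while Integer.divideUnsigned, Integer.remainderUnsigned and Integer.toUnsignedString are standard library methods available since Java 8.

```java
final class UnsignedOps {
    // Division and remainder: delegate to the Java 8+ library methods.
    static int divide(int a, int b)    { return Integer.divideUnsigned(a, b); }
    static int remainder(int a, int b) { return Integer.remainderUnsigned(a, b); }

    // Right shift: use the logical shift operator, which fills with zeros.
    static int shiftRight(int a, int n) { return a >>> n; }

    // Ordering: flipping the top bit of both operands maps unsigned order
    // onto signed order. Integer.compareUnsigned(a, b) < 0 is equivalent.
    static boolean lessThan(int a, int b) {
        return (a ^ Integer.MIN_VALUE) < (b ^ Integer.MIN_VALUE);
    }

    // Widening cast: mask after sign extension to get zero extension.
    static long widen(int a) { return a & 0xFFFFFFFFL; }

    // String conversion: again a Java 8+ library method.
    static String format(int a) { return Integer.toUnsignedString(a); }

    public static void main(String[] args) {
        System.out.println(divide(-2, 2));      // Prints 2147483647 (4294967294 / 2)
        System.out.println(remainder(-1, 10));  // Prints 5 (4294967295 % 10)
        System.out.println(shiftRight(-16, 4)); // Prints 268435455 (0x0FFFFFFF)
        System.out.println(lessThan(1, -1));    // Prints true (1 < 4294967295)
        System.out.println(format(-1));         // Prints 4294967295
    }
}
```

The 64-bit case looks the same using the corresponding methods on Long, except that there is no wider primitive type to widen into.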


Mixed-type arithmetic pitfalls

Type conversion rules

When a programming language has both signed and unsigned integer types, it means that they will interact with each other in some way or another. At the very least, there must exist explicit conversions between types and rules that govern these conversions. The rules tend to become complicated when implicit conversions are implemented in the language specification. For example, this is what C has to say about mixed-type integer arithmetic (C11 standard draft, page 71):

  1. If both operands have the same type, then no further conversion is needed.

  2. Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.

  3. Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

  4. Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

  5. Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
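
To get a feel for how these rules play out, the sketch below simulates two C comparisons in Java. It assumes a typical platform where int and unsigned int are 32 bits and long long is 64 bits; the class name MixedTypeDemo and its method names are hypothetical, and the unsigned operand is passed as an int holding the raw bits.

```java
final class MixedTypeDemo {
    // Rule 3: in C, comparing `int s` with `unsigned int u` converts s to
    // unsigned int first, so the comparison is carried out as unsigned.
    static boolean intLessThanUnsigned(int s, int uBits) {
        return Integer.compareUnsigned(s, uBits) < 0;
    }

    // Rule 4: in C, comparing `long long s` with `unsigned int u` converts u
    // to long long, which can represent every unsigned int value exactly.
    static boolean longLessThanUnsigned(long s, int uBits) {
        return s < Integer.toUnsignedLong(uBits);
    }

    public static void main(String[] args) {
        // C: (-1 < 1u) is false, because -1 becomes 4294967295.
        System.out.println(intLessThanUnsigned(-1, 1));   // Prints false
        // C: (-1LL < 1u) is true, because 1u becomes the long long value 1.
        System.out.println(longLessThanUnsigned(-1L, 1)); // Prints true
    }
}
```

The same mathematical comparison thus gives opposite answers depending only on the width of the signed operand.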


(Note: These examples avoid showing the promotion of char and short to int.)

Usage examples

Other considerations


Unsigned integer types do not belong in the Java programming language. They add considerable complexity to the type system, type conversion rules, library APIs, and the language and virtual machine specifications. Yet for developers who truly need unsigned arithmetic, this functionality can already be achieved effectively using signed integer types with only a small amount of knowledge and effort. Hence unsigned ints would add almost no expressive power but considerable pain – a rather unwise trade-off, isn’t it?
