Summary of C/C++ integer rules
This is my own collection of hard-earned knowledge about how integers work in C/C++, and how to use them carefully and correctly. In this article, I try to strike a balance between brevity (easing the reader) and completeness (providing absolute correctness and extensive detail).
Whenever I read or write C/C++ code, I need to recall and apply these rules in many situations, such as: choosing an appropriate type for a local variable / array element / structure field, converting between types, and doing any kind of arithmetic or comparison. Note that floating-point number types will not be discussed at all, because that topic mostly deals with how to analyze and handle approximation errors that stem from rounding. By contrast, integer math is a foundation of programming and computer science, and all calculations are always exact in theory (ignoring implementation issues like overflow).
Data types
Basic integer types
A basic integer type is specified using some legal sequence of keywords drawn from the set {char, short, int, long, signed, unsigned}.
Although the bit width of each basic integer type is implementation-defined (i.e. depends on the compiler and platform), the following properties are guaranteed by the language standards:
- char: At least 8 bits wide.
- short: At least 16 bits, and at least as wide as char.
- int: At least 16 bits, and at least as wide as short.
- long: At least 32 bits, and at least as wide as int.
- long long: At least 64 bits, and at least as wide as long.
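As a rough illustration of the width guarantees, the following C sketch checks them on whatever platform compiles it. The names BIT_WIDTH, widths_meet_guarantees, and check_widths are illustrative choices, not part of any standard header. Note that sizeof(T) * CHAR_BIT measures storage bits, which could include padding bits on unusual platforms, so this is only an approximation of the value width.

```c
#include <assert.h>
#include <limits.h>  /* CHAR_BIT */

/* Illustrative helper (not a standard macro): storage size of a type in bits.
   Padding bits, if any, are counted too. */
#define BIT_WIDTH(T) (sizeof(T) * CHAR_BIT)

/* Returns 1 if the compiling platform satisfies the minimum widths and ordering. */
int widths_meet_guarantees(void) {
    return BIT_WIDTH(char) >= 8
        && BIT_WIDTH(short) >= 16 && BIT_WIDTH(short) >= BIT_WIDTH(char)
        && BIT_WIDTH(int) >= 16 && BIT_WIDTH(int) >= BIT_WIDTH(short)
        && BIT_WIDTH(long) >= 32 && BIT_WIDTH(long) >= BIT_WIDTH(int)
        && BIT_WIDTH(long long) >= 64
        && BIT_WIDTH(long long) >= BIT_WIDTH(long);
}

/* Example self-check; mainstream implementations should pass it. */
void check_widths(void) {
    assert(widths_meet_guarantees());
}
```

Calling check_widths() from main is enough to exercise it; the check is about minimums only, so it says nothing about the exact widths in use.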
Signedness
- Unqualified char may be signed or unsigned, which is implementation-defined.
- Unqualified short, int, long, and long long are signed. Adding the unsigned keyword makes them unsigned.
- Signed numbers may be encoded in binary as two’s complement, ones’ complement, or sign-magnitude; this is implementation-defined. (C23 and C++20 onward require two’s complement.) Note that ones’ complement and sign-magnitude each have distinct bit patterns for negative zero and positive zero, whereas two’s complement has a unique zero.
- Character literals (in single quotes) have the type (signed) int in C, but (signed or unsigned) char in C++.
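A small C sketch of how one might probe these signedness properties at run time. The function names are hypothetical, and the character-literal check only holds when the code is compiled as C rather than C++.

```c
#include <assert.h>
#include <limits.h>  /* CHAR_MIN */

/* Returns 1 if plain char is signed on this implementation, else 0. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* In C (not C++), a character literal has type int, so its size matches int. */
int char_literal_is_int_sized(void) {
    return sizeof('A') == sizeof(int);
}

void check_char_properties(void) {
    /* Converting -1 to char stays negative exactly when char is signed. */
    assert(((char)-1 < 0) == plain_char_is_signed());
    assert(char_literal_is_int_sized());  /* holds when compiled as C */
}
```

Code that cares about the sign of character data can sidestep the question by writing signed char or unsigned char explicitly.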
Additional rules
- sizeof(char) always equals 1, regardless of the bit width of char.
- The bit widths need not be distinct. For example, it’s legal to have char, short, and int all be 32 bits wide.
- The bit widths need not be powers of 2. For example, int could be 36 bits wide.
- There are various ways to write an integer type. For example, each line is a set of synonyms:
  - int, signed, signed int, int signed.
  - short, short int, short signed, short signed int.
  - unsigned long long, long unsigned int long, int long long unsigned.
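To see that these spellings really name the same types, here is a minimal sketch using the C11 _Generic selection. The IS_INT, IS_SHORT, and IS_ULL macros are made up for this illustration; a C++ version would need a different mechanism such as std::is_same.

```c
#include <assert.h>

/* Illustrative macros (C11 _Generic): 1 if the expression has exactly the named type. */
#define IS_INT(x)   _Generic((x), int: 1, default: 0)
#define IS_SHORT(x) _Generic((x), short: 1, default: 0)
#define IS_ULL(x)   _Generic((x), unsigned long long: 1, default: 0)

void check_synonyms(void) {
    signed a = 0;                  /* same type as int */
    int signed b = 0;              /* keyword order does not matter */
    short signed int c = 0;        /* same type as short */
    long unsigned int long d = 0;  /* same type as unsigned long long */
    int long long unsigned e = 0;  /* same type as unsigned long long */

    assert(IS_INT(a) && IS_INT(b));
    assert(IS_SHORT(c));
    assert(IS_ULL(d) && IS_ULL(e));
}
```

The unusual orderings compile, but sticking to one conventional spelling per type keeps code easier to read.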
Standard library types
- size_t (defined in stddef.h) is unsigned and at least 16 bits wide. It is not guaranteed to be at least as wide as int.
- ptrdiff_t (defined in stddef.h) is a signed integer type. Subtracting two pointers will yield this type. Do not assume that subtracting two pointers will yield int.
- Specific type widths are defined in stdint.h: uint8_t, int8_t, and likewise for 16, 32, and 64 bits. Be careful with arithmetic promotions – for example, uint8_t + uint8_t yields int (which is signed and at least 16 bits wide), not uint8_t as one would reasonably expect.
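The promotion of uint8_t operands and the ptrdiff_t result of pointer subtraction can be observed directly. This is a small sketch with hypothetical function names; it assumes the platform provides uint8_t, which is an optional type.

```c
#include <assert.h>
#include <stddef.h>  /* ptrdiff_t */
#include <stdint.h>  /* uint8_t */

/* The sum is computed in int after promotion, so 200 + 100 gives 300. */
int add_u8(uint8_t a, uint8_t b) {
    return a + b;
}

/* Converting the int result back to uint8_t reduces it modulo 256. */
uint8_t add_u8_wrapped(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Pointer subtraction yields ptrdiff_t, measured in elements. */
ptrdiff_t element_distance(const int *first, const int *last) {
    return last - first;
}

void check_library_types(void) {
    uint8_t x = 200, y = 100;
    int array[4] = {0, 0, 0, 0};

    assert(sizeof(x + y) == sizeof(int));  /* the expression has type int */
    assert(add_u8(x, y) == 300);
    assert(add_u8_wrapped(x, y) == 44);    /* 300 mod 256 */
    assert(element_distance(&array[0], &array[3]) == 3);
}
```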
Conversions
Suppose that a value of some source integer type needs to be converted to a value of some destination integer type. This situation might arise from an explicit cast expression, an implicit conversion upon assignment, or an arithmetic promotion. What happens in the conversion?
A primary principle is that if the destination type can hold the value of the source type, then the value will be semantically preserved. To elaborate:
- When a source type is widened to a destination type of the same signedness (e.g. signed char → int; unsigned short → unsigned long), then every source value will be preserved after conversion.
- Even if the source and destination types have different ranges, all values in the overlapping part of their ranges will be preserved. (For example, an int that holds a value in the range [0, 255] will be losslessly converted to unsigned char.)
More specifically, the rules are as follows:
- When converting to an unsigned type, the new value equals the old value modulo 2^(destination bit width). Explanations:
  - If the source type is unsigned and wider than the destination, then effectively the top bits are discarded.
  - If the source type is signed, then the conversion effectively takes the source value and repeatedly adds/subtracts 2^(destination bit width) until the new value fits in the destination type’s range. Moreover if the signed number system is two’s complement, then the conversion effectively discards the top bits, just like the unsigned case.
- When converting to a signed type, these are the cases:
  - If the source value fits in the destination type’s range, then the conversion procedure (e.g. sign extension) produces a destination value that is semantically equal to the source value.
  - Otherwise if it doesn’t fit, then the behavior is implementation-defined, and could raise an exception (e.g. overflow trap).
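The conversion rules can be sketched in C as follows. The helper names to_u16 and fits_in_schar are hypothetical, and uint16_t is assumed to exist on the platform. The range check is one possible way to avoid relying on the implementation-defined case when converting to a signed type.

```c
#include <assert.h>
#include <limits.h>  /* SCHAR_MIN, SCHAR_MAX */
#include <stdint.h>  /* uint16_t */

/* Conversion to unsigned is well-defined: the value is reduced modulo 2^16. */
uint16_t to_u16(long value) {
    return (uint16_t)value;
}

/* Hypothetical helper: returns 1 if value can be converted to signed char
   without hitting the implementation-defined out-of-range case. */
int fits_in_schar(long value) {
    return value >= SCHAR_MIN && value <= SCHAR_MAX;
}

void check_conversions(void) {
    assert(to_u16(70000L) == 4464);  /* 70000 - 65536 */
    assert(to_u16(-1L) == 65535);    /* -1 + 65536 */
    assert(to_u16(255L) == 255);     /* in range: value preserved */
    assert(fits_in_schar(100L));
    assert(!fits_in_schar((long)SCHAR_MAX + 1));
}
```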
Arithmetic
Promotions/conversions
- A unary arithmetic operator has one operand, e.g. -, ~.
- A binary arithmetic operator has two operands, e.g. +, *, &, <<.
- If any operand of an operator has type bool, char, or short (whether signed or unsigned), then it is promoted to (signed) int if int can hold all values of the source type; otherwise it is promoted to unsigned int; the promotion is designed to be lossless. Examples:
  - An implementation has 16-bit short and 24-bit int. Letting variables x and y have the type unsigned short, the expression x & y promotes both operands to signed int.
  - An implementation has 32-bit char and 32-bit int. Letting variables x and y have the type unsigned char, the expression x - y promotes both operands to unsigned int.
- For binary operators (other than the shift operators, where each operand is promoted separately and the result has the promoted type of the left operand), both operands are implicitly converted to the same common type before the arithmetic happens. Consider the list of conversion ranks in ascending order: int, long, long long. The rank of the common type is the highest rank among the two operand types. If both operands have the same signedness, then the common type has that signedness. Else if the operand with the unsigned type has greater or equal rank than the other operand type, then the common type is unsigned. Else if the signed operand type can represent all values of the other operand type, then the common type is signed. Otherwise the common type is unsigned. Examples:
  - (long) + (long) → (long).
  - (unsigned int) * (int) → (unsigned int).
  - (unsigned long) / (int) → (unsigned long).
  - If int is 32-bit and long is 64-bit: (unsigned int) % (long) → (long).
  - If int is 32-bit and long is 32-bit: (unsigned int) % (long) → (unsigned long).
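The promotion and common-type rules can be checked on a given compiler with C11 _Generic. This is only a sketch: TYPE_NAME is a macro invented for the illustration, and the expected answers for the width-dependent cases are computed from limits.h rather than hard-coded.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Illustrative macro (C11 _Generic): names the type of an expression. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int", unsigned int: "unsigned int", \
    long: "long", unsigned long: "unsigned long", \
    long long: "long long", unsigned long long: "unsigned long long", \
    default: "other")

void check_common_types(void) {
    unsigned short us = 1;
    unsigned int ui = 1;
    int i = 1;
    long l = 1;
    unsigned long ul = 1;

    /* Promotion: unsigned short becomes int when int can hold all its values. */
    assert(strcmp(TYPE_NAME(us & us),
                  USHRT_MAX <= INT_MAX ? "int" : "unsigned int") == 0);

    assert(strcmp(TYPE_NAME(l + l), "long") == 0);
    assert(strcmp(TYPE_NAME(ui * i), "unsigned int") == 0);
    assert(strcmp(TYPE_NAME(ul / i), "unsigned long") == 0);

    /* The result depends on whether long can hold every unsigned int value. */
    assert(strcmp(TYPE_NAME(ui % l),
                  UINT_MAX <= LONG_MAX ? "long" : "unsigned long") == 0);

    /* In a mixed operation the int value -1 is converted to UINT_MAX. */
    assert(ui * (i - 2) == UINT_MAX);
}
```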
Undefined behavior
Signed overflow:
- When arithmetic is performed on a signed integer type, overflow is undefined behavior. UB can produce correct, inconsistent, and/or incorrect behavior, now or anytime in the future.
- When arithmetic is performed on an unsigned integer type (after promotions and conversions), any overflow is guaranteed to wrap around. For example, UINT_MAX + 1 == 0.
- Performing arithmetic on unsigned fixed-width integer types can lead to subtle bugs. For example:
  - Let uint16_t = unsigned short and let int be 32-bit. Then uint16_t x=0xFFFF, y=0xFFFF, z=x*y; would have x and y promoted to int, and x * y would overflow int – hence undefined behavior.
  - Let uint32_t = unsigned char and let int be 33-bit. Then uint32_t x=0xFFFFFFFF, y=0xFFFFFFFF, z=x+y; would have x and y promoted to int, and x + y would overflow int – hence undefined behavior.
  - To force safe unsigned arithmetic, either add 0U or multiply by 1U as a no-op. For example: 0U + x + y or 1U * x * y. This ensures that the operands are promoted to at least int rank and still kept as unsigned.
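A minimal sketch of both ideas in C: the 1U multiplication trick for fixed-width unsigned operands, and a pre-check for signed addition. The function names are hypothetical, and add_would_overflow is just one common way to test for overflow before it happens.

```c
#include <assert.h>
#include <limits.h>  /* INT_MIN, INT_MAX, UINT_MAX */
#include <stdint.h>  /* uint16_t */

/* Multiplies in unsigned arithmetic, so the product wraps instead of
   overflowing a signed int after promotion. */
uint16_t mul_u16(uint16_t x, uint16_t y) {
    return (uint16_t)(1U * x * y);
}

/* Hypothetical helper: returns 1 if a + b would overflow int. The check itself
   never overflows because it only moves away from the limits. */
int add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return 0;
}

void check_overflow_handling(void) {
    assert(mul_u16(0xFFFF, 0xFFFF) == 1);  /* (65535 * 65535) mod 65536 */
    assert(UINT_MAX + 1U == 0);            /* unsigned wraparound */
    assert(add_would_overflow(INT_MAX, 1));
    assert(add_would_overflow(INT_MIN, -1));
    assert(!add_would_overflow(INT_MAX, -1));
}
```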
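Divide/remainder:
- Division by zero and remainder with a zero divisor are undefined behavior.
- Unsigned division/remainder has no other special cases.
- Signed division can overflow – e.g. INT_MIN / -1.
- Signed remainder with negative operands is tricky. Since C99 and C++11, division truncates toward zero, so a nonzero remainder takes the sign of the dividend (e.g. -7 % 2 == -1). In the older standards (C89, C++98), the rounding direction for negative operands was implementation-defined.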
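The two undefined cases for signed division can be guarded with a simple pre-check. This is a sketch with a hypothetical helper name; the remainder assertions rely on the truncating division that C99 and later standards specify.

```c
#include <assert.h>
#include <limits.h>  /* INT_MIN */

/* Hypothetical helper: returns 1 if a / b and a % b are well-defined for int. */
int div_is_defined(int a, int b) {
    if (b == 0) return 0;                   /* division by zero */
    if (a == INT_MIN && b == -1) return 0;  /* quotient would overflow */
    return 1;
}

void check_division(void) {
    assert(!div_is_defined(1, 0));
    assert(!div_is_defined(INT_MIN, -1));
    assert(div_is_defined(INT_MIN, 1));

    /* C99 and later: the quotient truncates toward zero, and the remainder
       has the sign of the dividend. */
    assert(-7 / 2 == -3 && -7 % 2 == -1);
    assert(7 / -2 == -3 && 7 % -2 == 1);
}
```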
Bit shifts:
- It is undefined behavior to bit shift (<< and >>) by an amount that is either negative, equal to the bit width, or greater than the bit width.
- Left-shifting an unsigned operand (after promotion/conversion) is well-defined and behaves normally.
- Left-shifting a signed operand type containing a non-negative value, such that a 1 bit would go into the sign bit, is undefined behavior.
- Left-shifting a negative value is undefined behavior.
- Right-shifting a non-negative value (in a signed or unsigned operand type) is well-defined and behaves normally.
- Right-shifting a negative value is implementation-defined.
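One way to stay inside the well-defined cases is to wrap shifts in small helpers. The sketch below uses hypothetical names; shl_u32 assumes uint32_t exists, and asr_int assumes a two's complement representation and a shift amount smaller than the bit width of int.

```c
#include <assert.h>
#include <stdint.h>  /* uint32_t */

/* Left shift that is defined for every shift amount: 32 or more gives 0.
   Adding 0U keeps the operand unsigned even if int were wider than 32 bits. */
uint32_t shl_u32(uint32_t x, unsigned amount) {
    if (amount >= 32) return 0;
    return (uint32_t)((0U + x) << amount);
}

/* Right shift that rounds toward negative infinity. A negative value is
   complemented first, so only non-negative values are ever shifted. */
int asr_int(int x, unsigned amount) {
    return x < 0 ? ~(~x >> amount) : x >> amount;
}

void check_shifts(void) {
    assert(shl_u32(1, 31) == 0x80000000u);
    assert(shl_u32(1, 32) == 0);
    assert(shl_u32(0xFFFFFFFFu, 4) == 0xFFFFFFF0u);
    assert(asr_int(8, 1) == 4);
    assert(asr_int(-8, 1) == -4);
    assert(asr_int(-7, 1) == -4);  /* floor(-3.5) */
}
```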
Loop counting
Type choice
Suppose we have an array where we want to process each element sequentially, and the array length is stored in the variable len of type T0. How should we declare our loop counter variable i of type T1?
- The easiest rule is to use the same type as the length variable. For example:

      uint8_t len = (...);
      for (uint8_t i = 0; i < len; i++) { ... }

- More generally, a counter variable of type T1 will work correctly if the range of T1 is a (non-strict) superset of the range of T0. For example if len has type uint16_t, then counting using signed long (at least 32 bits) works.
- But more specifically, the loop counter merely needs to cover all possible actual lengths. For example if len of type int is guaranteed to have a value in the range [3, 50] (due to some application-specific logic), then it is okay to count the loop using a signed or unsigned char (where the range [0, 127] is guaranteed to be representable).
- It’s a bad idea to use different signedness for the length variable versus the counter variable. The comparison would force an implicit conversion, and signed-unsigned conversions are hard to understand and come with subtle platform-dependent pitfalls. The following example should not be written in real code:

      size_t len = (...);  // Unsigned
      for (int i = 0; i < len; i++) { ... }
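A short sketch of the matching-type rule, plus a helper that spells out what the implicit conversion in a mixed int/size_t comparison does. The function names are hypothetical, and the second helper assumes size_t is at least as wide as int, which holds on mainstream platforms but is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>  /* size_t */

/* Counts occurrences of target; the counter has the same type as the length. */
size_t count_char(const char *text, size_t len, char target) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == target) count++;
    }
    return count;
}

/* Makes the conversion in "int < size_t" explicit: the int is converted to
   size_t first, so a negative value turns into a huge positive one. */
int int_less_than_size(int i, size_t len) {
    return (size_t)i < len;
}

void check_loop_counting(void) {
    assert(count_char("banana", 6, 'a') == 3);
    assert(int_less_than_size(2, 10));
    assert(!int_less_than_size(-1, 10));  /* -1 becomes SIZE_MAX */
}
```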
Counting down
For downward-counting loops, using a signed counter is more natural because we can write:
for (int i = len - 1; i >= 0; i--) { process(array[i]); }
Whereas an unsigned counter would require code like:
for (unsigned int i = len; i > 0; i--) { process(array[i - 1]); }
Note: The comparison i >= 0 makes sense if i is a signed type, but always yields true if i is unsigned. So if this expression ever appears in an unsigned context, then the code writer very likely made a mistake in the logic.
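Another common way to count down with an unsigned counter is to fold the decrement into the loop condition. The sketch below uses a hypothetical function find_last; it is one possible idiom, not the only correct one.

```c
#include <assert.h>
#include <stddef.h>  /* size_t */

/* Returns the index of the last occurrence of target, or (size_t)-1 if absent.
   The test "i-- > 0" checks i before decrementing, so the body sees the indexes
   len-1 down to 0, and the loop stops cleanly even when len is 0. */
size_t find_last(const char *text, size_t len, char target) {
    for (size_t i = len; i-- > 0; ) {
        if (text[i] == target) return i;
    }
    return (size_t)-1;
}

void check_find_last(void) {
    assert(find_last("hello", 5, 'l') == 3);
    assert(find_last("hello", 5, 'h') == 0);
    assert(find_last("hello", 5, 'z') == (size_t)-1);
    assert(find_last("", 0, 'a') == (size_t)-1);
}
```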
Misconceptions
All points in the following list are myths. Do not rely on these false beliefs if you want to write correct and portable code!
- char is always 8 bits wide.
- int is always 32 bits wide.
- sizeof(T) represents the number of 8-bit bytes (octets) needed to store a variable of type T. (This is false because if say char is 32 bits, then sizeof(T) measures in 32-bit words.)
- We can use int everywhere in a program and ignore nuanced types like size_t, uint32_t, etc.
- Signed overflow is guaranteed to wrap around. (e.g. INT_MAX + 1 == INT_MIN.)
- Character literals are equal to their ASCII code values, e.g. 'A' == 65. (This is false due to EBCDIC, etc.)
- Converting a pointer to an int and back to a pointer will be lossless.
- Converting {a pointer to one integer type} to {a pointer to another integer type} is safe. (e.g. int *p = (...); long *q = (long*)p;.) (See type punning and strict aliasing.)
- Whenever all the operand(s) of an arithmetic operator (unary or binary) have unsigned types, the arithmetic operation will be performed in unsigned mode (thus never triggering undefined behavior) and the result type will be unsigned. For example: Suppose uint8_t x; uint8_t y; uint32_t z;, then x + y should have a type like uint8_t or unsigned int or something reasonable, and +z would still be uint32_t. (This is false because promotions favor signed types.)
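For two of these myths there are portable alternatives, sketched below with hypothetical function names. CHAR_BIT from limits.h gives the real number of bits per byte, and uintptr_t from stdint.h (an optional type, though widely available) is the integer type intended for round-tripping a pointer to void.

```c
#include <assert.h>
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */

/* Storage size of int in bits, without assuming 8-bit bytes. */
size_t bits_in_int(void) {
    return sizeof(int) * CHAR_BIT;
}

/* Round-trips a pointer through uintptr_t instead of int. */
int pointer_round_trip_ok(void) {
    int value = 42;
    void *original = &value;
    uintptr_t as_integer = (uintptr_t)original;
    void *restored = (void *)as_integer;
    return restored == original;
}

void check_portable_alternatives(void) {
    assert(bits_in_int() >= 16);
    assert(pointer_round_trip_ok());
}
```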
My critique
In short, it’s a heavy mental burden to know and use all these rules all the time. Failing to adhere to the rules results in a risk of writing unportable or incorrect code, and such errors might surface immediately or lie latent for days or years.
- The difficulties begin with the implementation-defined bit widths for basic integer types. For example, int might be 16 bits wide, 32, 64, or some other number. One always needs to choose a type with enough range for the application. But sometimes, having too much range (e.g. unusual 128-bit int) could cause headaches or introduce vulnerabilities too. To make matters worse, standard library types like size_t bear no relationship with other types like unsigned int or uint32_t; the standard allows them to be wider or narrower.
- The conversion rules are completely insane. Making matters worse, implicit conversions are allowed almost everywhere, which makes human auditing much harder. Unsigned types are straightforward, but signed types have so many possible legal implementations (e.g. ones’ complement, raising exceptions). Types with a lower rank than int are automatically promoted, causing unintuitive behavior with ranges and overflow. When operands have different signedness and ranks, they get converted to a common type in a way that depends on the implementation-defined bit widths. For example, performing arithmetic on two operands, where at least one operand has an unsigned type, will convert both of them to either a signed or unsigned type depending on the implementation.
- Arithmetic operations abound with undefined behaviors: signed overflow in add/sub/mul/div, division by zero, bit shifts. It’s easy to accidentally trigger these UB conditions, not easy to trigger them deliberately or detect them at run time, and hard to identify their root causes. It takes conscious awareness and effort to design and implement arithmetic code that avoids overflow/UB. It’s difficult to audit and fix existing code that wasn’t written with overflow/UB safety in mind.
- Having signed and unsigned variants of every integer type essentially doubles the number of options to choose from. This adds to the mental burden, yet has little payoff because signed types can do almost everything that unsigned ones can.
- No other mainstream language has as many rules and pitfalls regarding integer types as C and C++. For example:
  - Java integers behave the same in every environment. The language defines exactly 5 integer types (unlike C/C++ having at least 10), the types have fixed bit widths, almost all of these types are signed (except for char), signed numbers must be in two’s complement, implicit conversions only allow lossless ones, and all arithmetic and conversions are well-defined (no UB). Java’s fixed-width integer types support fast computations and efficient array packing, compared to languages like Python which only have (variable-width) bigint. Java is built heavily around the 32-bit int type, especially for counting arrays. This means Java can’t run efficiently on underpowered 16-bit CPUs (common in embedded microcontrollers), and Java can’t directly address large arrays on 64-bit systems. By comparison, C/C++ makes it possible to write code that runs efficiently on 16-bit, 32-bit, and/or 64-bit CPUs, but this requires a great deal of care from the coder.
  - Python only has one integer type, which is a signed bigint. Compared to C/C++, this renders moot all discussions about bit widths, signedness, and conversions – one type rules all the code. But the price to pay includes slow execution and inconsistent memory usage.
  - JavaScript doesn’t even have an integer type, and instead expresses everything in terms of float64 math (corresponding to C/C++ double). Because of this, the bit width and numerical range are fixed, numbers are always signed, there are no conversions, and overflow is well-defined.
  - The assembly language for any particular machine architecture (x86, MIPS, etc.) defines a set of fixed-width integer types, arithmetic operations, and conversions – with few or no undefined behaviors.