Project Nayuki


Near-duplicate features of C++

A large collection of features within the C++ programming language have very similar functionality. Much of this is due to features being inherited straight from C, then new native C++ features being added as alternatives.

Knowing which feature to use in what situation can be a continual learning struggle. Each team might use a different subset of features, which can cause needless confusion and stylistic variation. As new features get added to the C++ language or as awareness of existing features increases, it’s sometimes necessary to revisit old codebases and revise them to comply with modern practices.

Finally, although this state of affairs is an unchangeable fact about C++, other major programming languages have far less duplication. Don’t automatically assume that C++’s situation is desirable, necessary, or inevitable.

Abstract feature Features in C Features in C++
File extension
  • Implementation: c
  • Header: h
  • Implementation: cpp, cc, cxx, c++, C
  • Header: hpp, hh, hxx, h++, H
Single include
  • #ifndef guard
  • #pragma once
  • (No new feature in C++)
Null pointer value
  • 0
  • NULL (stddef.h)
  • nullptr
Minimum integer width
  • int
  • int_least16_t (stdint.h)
  • (No new feature in C++)
Operator token
  • x &= y ^ ~z;
  • x and_eq y xor compl z;
  • (No new feature in C++)
Reference type
  • Pointer (*)
  • Reference (&)
Reference passing
  • Pass by address (foo(&x))
  • Pass by reference (foo(x))
Type aliasing
  • typedef const Foo (*Bar)[4];
  • using Bar = const Foo(*)[4];
Type conversion
  • (Type)val
  • Type(val)
  • Type{val}
  • xxx_cast<Type>(val)
  • Implicit constructor
Variable initializer
  • Foo x = y;
  • Foo x(y);
  • Foo x{y};
Nullary function
  • int foo(void) { ... }
  • int foo() { ... }
Nullary constructor call
  • (Unavailable in C)
  • Foo *x = new Foo;
  • Foo *x = new Foo();
  • Foo *x = new Foo{};
Optional argument
  • Varargs (stdarg.h)
  • Default function argument
  • Function overloading
Return type
  • int main() { ... }
  • auto main() -> int { ... }
Compound data
  • struct
  • class
Field initialization
  • Struct initializer
  • Field initializer
  • Constructor field initializer
  • Constructor assignment statement
Member hiding
  • static
  • private
  • Anonymous namespace
Namespacing
  • (Unavailable in C)
  • namespace
  • class
Generic code
  • Preprocessor macro
  • Template function/class
Type parameter
  • (Unavailable in C)
  • template <class T>
  • template <typename T>
Variable function
  • Function pointer
  • Virtual method
  • Lambda expression
Heap allocation
  • malloc()
  • calloc()
  • new
  • new[]
Heap deallocation
  • free()
  • delete
  • delete[]
Character string
  • Array of char
  • std::string
Sequence of values
  • T[] (array)
  • T* (pointer)
  • std::vector<T>
  • std::array<T, N>
Access into sequence
  • Integer index
  • Iterator
Array filling
  • memset()
  • std::fill()
Array copying
  • memcpy()
  • memmove()
  • std::copy()
  • std::copy_backward()
C standard library header
  • #include <foobar.h>
  • #include <cfoobar>
I/O library
  • #include <stdio.h>
  • printf("%d", n);
  • scanf("%d", &n);
  • #include <iostream>
  • std::cout << n;
  • std::cin >> n;
Random number generation
  • #include <stdlib.h>
  • srand()
  • rand()
  • RAND_MAX
  • #include <random>
  • Various engines
  • Various distributions
Exception handling
  • setjmp()
  • longjmp()
  • try { ... }
  • catch (T v) { ... }
  • throw val;

File extension

Many pieces of C source code (but not all) can be compiled in C++ mode without modifications, so .c is a valid file extension for C++ files. Many pure-C++ header files still use a .h extension (same extension as C header files) instead of a C++-specific extension.

As for the C++-specific file extensions, generally .cpp and .cc can be found in the wild. The other alternatives are rare. Unlike other major languages, there is no standardization on file extensions. Surprisingly, C doesn’t suffer from this because C files are universally named as .c or .h.

Single include

There are at least two ways to ensure that any given header file is included at most once. The standards-compliant, but cumbersome and brittle way is to use an #ifndef + #define + body + #endif construct. It requires 3 lines of code, and the defined macro name needs to be manually synchronized with the file name. The convenient and widely supported (but technically non-standard) way is to simply write #pragma once at the top. If the chosen C/C++ compiler doesn’t support this, it’s not hard to write a script that replaces every header file’s #pragma once with an auto-generated #ifndef guard.
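
As a minimal sketch of the guard construct, the block below pastes the contents of a hypothetical header (call it widget.hpp) twice, which is what two #include lines would do. The macro name HYPOTHETICAL_WIDGET_HPP, the Widget struct, and widgetId() are illustrative names only.

```cpp
// --- First inclusion of the hypothetical widget.hpp ---
#ifndef HYPOTHETICAL_WIDGET_HPP
#define HYPOTHETICAL_WIDGET_HPP
struct Widget { int id; };
inline int widgetId(const Widget &w) { return w.id; }
#endif

// --- Second inclusion: the macro is already defined, so the body is skipped ---
#ifndef HYPOTHETICAL_WIDGET_HPP
#define HYPOTHETICAL_WIDGET_HPP
struct Widget { int id; };  // Would be a redefinition error without the guard
inline int widgetId(const Widget &w) { return w.id; }
#endif
```

With #pragma once, the three guard lines would collapse into a single line at the top of the header, and there would be no macro name to keep in sync.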

Null pointer value

In C, a null pointer is written as the constant 0, and NULL is simply a macro that expands to 0 (or, in C, possibly (void*)0). The nullptr keyword introduced in C++11 is much more type-safe and less ambiguous in overloads. Always use nullptr instead of the old NULL or 0.
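
The overload ambiguity can be illustrated with a small sketch. The two describe() overloads and the wrapper functions are hypothetical names chosen for this example.

```cpp
#include <string>

// Two hypothetical overloads: one takes an integer, one takes a pointer.
std::string describe(int) { return "int"; }
std::string describe(const char *) { return "pointer"; }

// The literal 0 is an int first and a null pointer only by conversion.
std::string viaZero() { return describe(0); }

// nullptr can only convert to pointer types, so the intent is unambiguous.
std::string viaNullptr() { return describe(nullptr); }
```

Calling describe(NULL) is deliberately left out: depending on how the implementation defines NULL, it may pick the int overload or fail to compile.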

Minimum integer width

The C and C++ standards make a number of guarantees on the bit widths of basic integer types, such as: short and int are at least 16 bits, long is at least 32 bits, width(char) ≤ width(short) ≤ width(int), et cetera. C99 and C++11 introduce the stdint.h header (cstdint in C++), which defines types with explicit width guarantees like int_least16_t. Because the simple int type is already guaranteed to have at least 16 bits, we might as well use it instead of the fancier type name.
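
These guarantees can be written down as compile-time checks. The sketch below should compile on any conforming implementation; the variable names are illustrative.

```cpp
#include <climits>
#include <cstdint>

// Storage size in bits is never smaller than the guaranteed width.
static_assert(sizeof(short) * CHAR_BIT >= 16, "short has at least 16 bits");
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(sizeof(std::int_least16_t) * CHAR_BIT >= 16,
              "same guarantee as int, longer name");

// Both types can portably hold any value in the range [-32767, 32767].
int plainCounter = 32767;
std::int_least16_t fancyCounter = 32767;
```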

Operator token

The C language uses characters such as | and ~, and can support non-ASCII character sets. Some characters used in the language are absent from certain character sets, so alternate spellings of certain operators and tokens were added to the language. In C, these synonyms are activated by including iso646.h, whereas in C++ these synonyms are a mandatory part of the language. One consequence is that you cannot name a variable or function as and, or, not, etc. Other consequences are that the feature can lead to style disagreements or can be abused for code obfuscation.
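
A short sketch of the two spellings side by side, using the statement from the table above. The function names symbolic() and spelledOut() are made up for this example.

```cpp
// Both functions compute the same value; only the operator spelling differs.
unsigned symbolic(unsigned x, unsigned y, unsigned z) {
  x &= y ^ ~z;
  return x;
}

unsigned spelledOut(unsigned x, unsigned y, unsigned z) {
  x and_eq y xor compl z;  // and_eq is &=, xor is ^, compl is ~
  return x;
}
```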

(More info: cppreference.com: Alternative operator representations)

Reference type

Example using a pointer:

int x;
int *ptr = &x;
*ptr = 2;

Example using a reference:

int x;
int &ref = x;
ref = 2;

Both pieces of example code above behave identically. Internally, the reference is implemented as a pointer. Some key differences are that a reference is never nullptr, a reference cannot be reseated to refer to a different object, and a reference cannot be indexed/subscripted. A pointer can do all three of these things, but the functionality is often unneeded. References are essentially restricted pointers, and don’t really add new features (except possibly for checking nullptr at the time of assignment instead of at the time of reading/writing the value). References do reduce the syntactic burden of continually writing * to dereference a pointer. References tend to be more useful and idiomatic in C++ than pointers, but pointers are still indispensable for some tasks.

Reference passing

Passing a raw value by reference requires no symbol at the call site, whereas passing by pointer does. While this is convenient, it can easily hide the fact that another function can change the value of a variable even though no function has a pointer to the variable.
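
The difference at the call site can be sketched as follows. The function names are hypothetical.

```cpp
// Pass by address: the caller must write & at the call site.
void incrementByAddress(int *p) { ++*p; }

// Pass by reference: the call looks identical to pass-by-value.
void incrementByReference(int &r) { ++r; }

int demo() {
  int x = 0;
  incrementByAddress(&x);   // The & hints that x may be modified
  incrementByReference(x);  // Nothing at the call site reveals the mutation
  return x;
}
```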

Type aliasing

C++11 introduces a new way to create type aliases. The new way uses a different keyword, and the ordering of the tokens is arguably more natural, especially for complex types such as arrays and functions.
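
A small sketch comparing the two spellings for an array pointer and a function pointer. The alias names BarOld, BarNew, BinaryOpOld, BinaryOpNew, and the helper add() are illustrative.

```cpp
#include <type_traits>

struct Foo {};

// Pointer to an array of 4 const Foo, spelled both ways.
typedef const Foo (*BarOld)[4];
using BarNew = const Foo (*)[4];
static_assert(std::is_same<BarOld, BarNew>::value, "same type");

// Pointer to a function taking two ints and returning int.
typedef int (*BinaryOpOld)(int, int);
using BinaryOpNew = int (*)(int, int);
static_assert(std::is_same<BinaryOpOld, BinaryOpNew>::value, "same type");

int add(int a, int b) { return a + b; }
BinaryOpNew op = add;
```

In the using form the alias name always sits on the left of the equals sign, whereas in the typedef form it is buried in the middle of the declarator.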

Type conversion

C++ exploded the number of ways to convert between types. For primitive types, if the old C cast of (Type)val is valid, then the constructor notation (officially called a function-style cast) of Type(val) is valid too in all C++ versions. The brace form Type{val} (since C++11) is also available, but only when the conversion is not narrowing. For example:

bool a = (...);
int b = (...);
long c = (...);
float d = (...);

char e = char(a);  // From bool
short f = short(b);  // From int
typedef long long LL;  // Need this for multi-token types
long long g = LL(c);  // Can't just write: long long(c)
double h = double{d};  // Introduced in C++11

The various language-level cast operators cover conversions on integers, constness, primitive pointers, object pointers, etc.: static_cast, const_cast, reinterpret_cast, dynamic_cast.
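
As a rough sketch, each of the four operators has a typical job. The types Animal, Dog, Cat and the helper functions below are hypothetical, and std::uintptr_t is an optional type that is assumed to exist on the target platform.

```cpp
#include <cstdint>

struct Animal { virtual ~Animal() {} };
struct Dog : Animal {};
struct Cat : Animal {};

// static_cast: ordinary value conversions, checked at compile time.
int truncate(double d) { return static_cast<int>(d); }

// const_cast: removes const; writing through the result is only legal
// if the underlying object was not declared const.
void setToZero(const int &r) { const_cast<int &>(r) = 0; }

// dynamic_cast: run-time checked downcast; yields nullptr on mismatch.
bool isDog(Animal *a) { return dynamic_cast<Dog *>(a) != nullptr; }

// reinterpret_cast: reinterprets the bits, here a pointer as an integer.
std::uintptr_t addressOf(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p);
}
```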

For structs and classes, a unary constructor without the explicit designation can be used as an implicit cast:

class Foo {
public:
  Foo(int x) {}
  explicit Foo(char *y) {}
};

int a = (...);
char *b = (...);
Foo c = a;  // OK
Foo d = b;  // Compile-time error

Variable initializer

A variable can be initialized in 3 possible ways, with different semantics with respect to which constructor is called, the assignment operator, and variable-length lists:

Foo x = w;  // C style
Foo y(w);  // C++ style
Foo z{w};  // C++11 and above
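
Two of the semantic differences can be sketched with std::vector and a narrowing conversion. The function names are made up for this example.

```cpp
#include <vector>

// Parentheses select the (count, value) constructor: three elements, each 7.
std::vector<int> makeWithParens() {
  std::vector<int> v(3, 7);
  return v;
}

// Braces prefer the initializer-list constructor: two elements, 3 and 7.
std::vector<int> makeWithBraces() {
  std::vector<int> v{3, 7};
  return v;
}

// Braces also reject narrowing conversions that the other forms accept.
int narrowWithEquals(double d) {
  int n = d;    // Compiles; the fractional part is discarded
  // int m{d};  // Would not compile: narrowing conversion
  return n;
}
```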

Nullary function

In C++, these two constructs are synonyms, and the simpler form with () is preferred over (void). In C, the form with (void) means that the function must take no arguments, whereas the form with () has complicated semantics that can lead to subtle errors; hence the form with (void) is strongly recommended in C.

(More info: Stack Overflow: Is there a difference between foo(void) and foo() in C++ or C?, Stack Overflow: Is it better to use C void arguments “void foo(void)” or not “void foo()”?)

Nullary constructor call

When creating an object on the heap with new and calling a zero-argument constructor, there are 3 possible notations, with the last two being semantically equivalent:

Foo *u = new Foo;
Foo *v = new Foo();
Foo *w = new Foo{};

When creating an object on the stack and calling a zero-argument constructor, the parentheses option is not available because that would declare a function prototype instead:

Foo x;  // OK
Foo y();  // Different meaning
Foo z{};  // OK

Now consider these class definitions:

// POD (plain old data) type
class A { public: int i; };

// Non-POD type, and compiler provides default constructor
class B { public: int i;  ~B() {} };

// Explicit constructor without initialization
class C { public: int i;  C() {} };

// Explicit constructor with initialization
class D { public: int i;  D() { i=1; } };

If we create an object of each type without parentheses/braces (e.g. A *p = new A;), then:

  • an object of type A will have i uninitialized.

  • an object of type B will have i uninitialized.

  • an object of type C will have the C() constructor called and i uninitialized.

  • an object of type D will have the D() constructor called and i initialized to 1.

Whereas if we create an object of each type with parentheses/braces (e.g. A *p = new A();):

  • an object of type A will have i value-initialized to 0.

  • an object of type B will have i value-initialized to 0.

  • an object of type C will have the C() constructor called and i uninitialized.

  • an object of type D will have the D() constructor called and i initialized to 1.

As we can see, the parentheses/braces make no difference when the target type has a default constructor explicitly defined. Otherwise, the parentheses/braces force value initialization, which zeroes the fields.

(More info: Stack Overflow: Do the parentheses after the type name make a difference with new?)
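
The well-defined cases can be checked with a small sketch. It reuses the shapes of A and D from above (as structs, for brevity). The case of new A; without parentheses is intentionally not read back, because reading an uninitialized int is undefined behavior.

```cpp
struct A { int i; };                 // No constructors at all
struct D { int i; D() { i = 1; } };  // User-provided constructor

int parenthesizedA() {
  A *p = new A();  // Parentheses force i to be zeroed
  int result = p->i;
  delete p;
  return result;
}

int bracedA() {
  A *p = new A{};  // Braces zero i as well
  int result = p->i;
  delete p;
  return result;
}

int plainD() {
  D *p = new D;  // The constructor runs with or without parentheses
  int result = p->i;
  delete p;
  return result;
}
```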

Optional argument

A function can be declared with default argument values for optional parameters:

int foo(int bar=0) { ... }
print(foo());  // Equivalent to print(foo(0))

However, the above construct is a special case of the more general and powerful mechanism of function overloading:

int foo() { return foo(0); }
int foo(int bar) { ... }
print(foo());  // Calls the top definition, which leads to foo(0)

By comparison, Python only has default arguments, and Java only has method overloading.

Return type

The classic C syntax (also adopted in C++, C#, D, Java, etc.) places the return type in front of the function name:

int main(...) { ... }

C++11 allows the keyword auto as a dummy return type, with the actual return type declared after the parameter list and an arrow:

auto main(...) -> int { ... }

The functional benefit of this style is that the trailing return syntax allows the return type to depend on the arguments.
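
A minimal sketch of a return type that depends on the arguments. The function name add() is illustrative.

```cpp
#include <type_traits>

// The parameter names a and b are only in scope after the parameter list,
// so decltype(a + b) can only be written in the trailing position.
template <typename A, typename B>
auto add(A a, B b) -> decltype(a + b) {
  return a + b;
}

static_assert(std::is_same<decltype(add(1, 2)), int>::value,
              "int + int is int");
static_assert(std::is_same<decltype(add(1, 2.5)), double>::value,
              "int + double is double");
```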

The trailing style could aid readability. Perhaps because of this, many new languages like Scala, Go, Rust, Swift, etc. declare functions in this way.

Compound data

structs and classes can both contain the same things (fields, constructors, methods, nested classes, etc.) and can have parent classes, but they differ with respect to the default visibility level of members and base classes. The cleanest approach is to use a struct if it contains only fields and no other members, and a class when constructors and methods are needed.
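
The default visibility difference can be sketched as follows. All type and function names here are hypothetical.

```cpp
struct PointS {
  int x;  // Public by default in a struct
};

class PointC {
  int x = 0;  // Private by default in a class
public:
  int getX() const { return x; }
};

struct Base { int value = 7; };
struct DerivedS : Base {};  // Public inheritance by default
class DerivedC : Base {};   // Private inheritance by default

int readStruct() { PointS p{5}; return p.x; }
int readDerived() { DerivedS d; return d.value; }
// DerivedC d; d.value;  // Would not compile: Base is a private base here
```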

Field initialization

The fields of a struct or class can be initialized in a few possible places:

class Foo {
  int x = 0;
  int y;
  int z;
  Foo () : y(1) {
    z = 2;
  }
};

The constructor’s initializer list (between the colon and opening brace) is mandatory for variables with a reference type or a type without a default constructor.
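
A sketch of the two mandatory cases, using hypothetical types Limit and Gauge:

```cpp
struct Limit {
  int max;
  explicit Limit(int m) : max(m) {}  // No default constructor
};

class Gauge {
  int &target;  // Reference: must be bound in the initializer list
  Limit limit;  // No default constructor: must be built in the initializer list
public:
  Gauge(int &t, int max) : target(t), limit(max) {}
  // Gauge(int &t, int max) { target = t; limit = Limit(max); }  // Would not compile
  void bump() { if (target < limit.max) ++target; }
};

int demoGauge() {
  int reading = 0;
  Gauge g(reading, 2);
  g.bump();
  g.bump();
  g.bump();  // The third bump is capped by the limit
  return reading;
}
```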

Note that Java suffers from three choices too, with two of them being syntactically identical to C++:

class Bar {
  int x = 0;
  int y;
  int z;
  {  // Instance initializer block (rarely used)
    y = 1;
  }
  public Bar() {
    z = 2;
  }
}

Member hiding

Members outside of classes can be confined to the compilation unit by adding static to the declaration:

static int counter = 0;
static void func() { ... }

Members outside of classes can also be confined to the compilation unit by putting them inside an anonymous namespace:

namespace {
  int counter = 0;
  void func() { ... }
}

Members inside classes/structs are hidden with the private access modifier:

class Test {
  private: static int counter;  // Defined once outside the class: int Test::counter = 0;
  private: static void func() { ... }
};

Namespacing

Global-ish variables and functions can be placed inside a namespace or as static members inside a class:

namespace Alpha {
  int gamma;
  void delta();
}

class Beta {
public:  // Class members are private by default
  static int gamma;
  static void delta();
};

// Same usage syntax
print(Alpha::gamma);
Alpha::delta();
print(Beta::gamma);
Beta::delta();

Generic code

Some forms of generic code are expressible using C preprocessor macros:

#define MAX(x, y)  ((x) >= (y) ? (x) : (y))

But C++ templates are far more type-safe and powerful:

template <typename T>
T max(T x, T y) {
  return x >= y ? x : y;
}
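
One concrete hazard of the macro is double evaluation of its arguments. In the sketch below the template is renamed to maxOf (a hypothetical name) to avoid clashing with std::max, and the wrapper functions are illustrative.

```cpp
#define MAX(x, y)  ((x) >= (y) ? (x) : (y))

template <typename T>
T maxOf(T x, T y) {
  return x >= y ? x : y;
}

// The macro pastes its argument text twice:
// MAX(i++, j) expands to ((i++) >= (j) ? (i++) : (j)).
int macroMax(int &i, int j) { return MAX(i++, j); }

// The template evaluates each argument exactly once.
int templateMax(int &i, int j) { return maxOf(i++, j); }
```

Starting from i = 5 and j = 3, the macro version returns 6 and leaves i at 7, while the template version returns 5 and leaves i at 6.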

Type parameter

A template with type parameters can be specified with class (old style, discouraged) or typename (modern style).

Variable function

Function pointers are one way to convey a variable function (this comes from C):

int foo() { ... }
int bar() { ... }
int (*chosen)() = choice ? foo : bar;
print(chosen());

Objects with virtual methods are another way to convey a variable function (and this is the only way in Java):

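class Base {
public:
  virtual int doIt() = 0;
  virtual ~Base() {}
};
class Foo : public Base {
public:
  int doIt() override { ... }
};
class Bar : public Base {
public:
  int doIt() override { ... }
};

// Both branches of ?: must have the same type, hence the cast
Base *chosen = choice ? static_cast<Base*>(new Foo()) : new Bar();
print(chosen->doIt());
delete chosen;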

Lambda expressions (introduced in C++11) provide a new way to convey a variable function:

auto foo = []() { return 0; };
auto bar = []() { return 1; };
int (*chosen)() = choice ? foo : bar;  // Captureless lambdas convert to function pointers
print(chosen());

Heap allocation

The idiomatic C++ way to allocate an object on the heap is to use the new operator:

class Foo { ... };
Foo *x = new Foo;
Foo *y = new Foo[10];

An alternative way that allows lower level control is to use malloc() (from C) and manually call placement-new:

Foo *x = static_cast<Foo*>(malloc(sizeof(Foo)));  // C++ requires a cast from void*
new (x) Foo;
Foo *y = static_cast<Foo*>(malloc(10 * sizeof(Foo)));
new (&y[0]) Foo;
new (&y[1]) Foo;
(... et cetera ...)

Heap deallocation

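When a heap object is allocated with malloc() (both scalars and arrays), simply call free() on the pointer. If objects were constructed in that memory with placement-new, call each object’s destructor explicitly before calling free().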

When a single heap object is allocated with new, it must be released with delete. But an array of heap objects like ptr = new Type[n] must be released with delete[] ptr. The distinction between delete and delete[] must be carefully respected, or else undefined behavior occurs.
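
The pairing rules can be sketched with a hypothetical Tracked type that counts how many of its objects are currently alive:

```cpp
#include <cstdlib>
#include <new>

struct Tracked {
  static int alive;  // Number of currently constructed objects
  Tracked() { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the peak number of live objects, or -1 if malloc() fails.
int pairedDeallocation() {
  void *raw = std::malloc(sizeof(Tracked));
  if (raw == nullptr) return -1;        // Nothing to clean up yet
  Tracked *placed = new (raw) Tracked;  // Construct inside the malloc() block
  Tracked *one = new Tracked;
  Tracked *many = new Tracked[3];
  int peak = Tracked::alive;            // 1 + 1 + 3 objects

  delete one;          // Matches new
  delete[] many;       // Matches new[]
  placed->~Tracked();  // Placement-new objects need an explicit destructor call
  std::free(raw);      // Matches malloc()
  return peak;
}
```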

Character string

Raw C strings are popular in C++ but cumbersome when it comes to memory allocation:

#include <string.h>
char buffer[100] = "Hello";  // Need to set size
strcat(buffer, " world");  // Need to avoid overrun

C++ provides a string library that handles memory allocation under the hood:

#include <string>
std::string str("Hello");
str += " world";
const char *cstr = str.c_str();  // Easy conversion

Sequence of values

C has arrays (supported natively in the language) and linked lists (supported manually through structs). C++ adds safer, more powerful, and more convenient implementations of the sequence ADT, primarily std::vector, std::array, and std::list (linked list).

Access into sequence

An array is accessed by an integer index:

int *a = (...);
int index = 5;
print(a[index]);

A vector can be accessed by index or iterator:

std::vector<int> b = (...);
print(b[index]);  // No bounds checking
print(b.at(index));  // Bounds-checked
std::vector<int>::iterator it = b.begin();
++it;
print(*it);  // Same as b[1]

Array filling

C only has the memset() function to fill a block of memory with a repeated char-sized value. It is mainly useful for setting to zero, or occasionally to 0xFF. It cannot fill a multi-byte value or work with specific struct fields. However, this simplicity and narrow scope make it relatively easy to have an assembly-optimized implementation in the standard library.

The std::fill() function in C++ is essentially a loop that performs a value assignment on each element within a range. This means it works on types of any size, and also calls the element type’s assignment operator (with possible computations and side effects).
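
A brief sketch contrasting the two on an int array. The helper names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// memset() writes the same byte everywhere, so it is reliable for zeroing.
void zeroWithMemset(int *arr, std::size_t n) {
  std::memset(arr, 0, n * sizeof(int));
}

// std::fill() assigns a whole element at a time, so any value works.
void fillWithValue(int *arr, std::size_t n, int value) {
  std::fill(arr, arr + n, value);
}
```

Note that calling memset() with a byte value of 1 would not produce ints equal to 1; every byte of each int would be set to 1 instead.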

Array copying

The C way to copy an array of values is to call the memcpy() or memmove() function. This is also appropriate in C++ for arrays of numbers and simple structs.

The C++ way to copy a sequence of values is to call the std::copy() or std::copy_backward() function. Choosing which function to use is only relevant if the input and output ranges overlap; otherwise std::copy() is fine. Compared to memcpy(), the function std::copy() also works on std::vector and other container types with iterators, and will properly call the (possibly overloaded) assignment operator of the element type to set the destination values.
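
For overlapping ranges, the choice depends on the direction of the shift. The sketch below assumes a non-empty vector, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Shift left by one: the destination starts before the source,
// so copying front to back never overwrites an unread element.
void shiftLeft(std::vector<int> &v) {
  std::copy(v.begin() + 1, v.end(), v.begin());
}

// Shift right by one: the destination ends after the source,
// so the copy must run back to front.
void shiftRight(std::vector<int> &v) {
  std::copy_backward(v.begin(), v.end() - 1, v.end());
}
```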

C standard library header

Almost all C++ code depends on features of the C standard library (which is a part of C++). Including a C standard library header file can be done in one of two ways:

#include <stdfoo.h>  // Old (compatible with C)
#include <cstdfoo>  // New (pure C++)

These ways are almost equivalent except for the subtle matter of namespacing. The first way guarantees that members will be available in the global namespace, e.g. size_t and printf(). The second way guarantees that members will be available in the std namespace, e.g. std::size_t and std::printf(). (Preprocessor macros have no namespace and are always global.) This means it is technically a mistake to #include <cstdint> and use the type uint32_t, because the type needs the std:: prefix. However, most compilers make both the global name and the std-namespaced name available, which masks this subtle error.

I/O library

The C way of doing I/O is through FILE* handles, fread() and fwrite(), and printf() and scanf() functions with format strings and variable-length arguments. Note that the stdio library covers I/O for the console, files, and strings.

The C++ way of doing I/O is through objects derived from the istream and ostream classes, calling instance methods, using the overloaded << and >> operators, and passing option objects into the overloaded operators. The functionality of C’s stdio is covered by multiple C++ headers such as iostream, fstream, sstream.
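
To compare the two styles on the same task, the sketch below formats a non-negative integer with zero padding into a string. The function names formatC() and formatCpp() are made up for this example.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// C style: a format string plus variable-length arguments.
std::string formatC(int n) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "n=%05d", n);
  return buf;
}

// C++ style: overloaded << with option objects (manipulators).
std::string formatCpp(int n) {
  std::ostringstream out;
  out << "n=" << std::setw(5) << std::setfill('0') << n;
  return out.str();
}
```

The two versions agree for non-negative inputs; negative numbers are padded differently by default, which is one more detail to keep track of when mixing the libraries.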

Random number generation

The RNG library of C is small, making it easy but weak at the same time. There is only one global generator state. srand() has a rather small range for a seed. RAND_MAX is often defined as 2^15−1 or 2^31−1, which makes it painful to generate large numbers (such as uint64_t) or double-precision floating-point numbers.

The RNG library of C++ is simultaneously fancy and intimidating. Each RNG is a separate object, and can be chosen from multiple implementations – linear congruential, Mersenne Twister, hardware RNG, etc. To generate a random number, you have to first define a distribution – such as integers in the range [a, b] or a Boolean with probability p – then call the distribution with the generator, i.e. double val = dist(gen);.
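
A minimal sketch of the generator-plus-distribution pattern, using the Mersenne Twister engine. The function name rollDice() is hypothetical.

```cpp
#include <random>

// Roll a six-sided die the given number of times and return the sum.
int rollDice(std::mt19937 &gen, int rolls) {
  std::uniform_int_distribution<int> dist(1, 6);  // Integers in [1, 6]
  int sum = 0;
  for (int i = 0; i < rolls; ++i)
    sum += dist(gen);  // Call the distribution with the generator
  return sum;
}
```

A caller would construct the generator once, for example std::mt19937 gen(12345); for a reproducible sequence, and pass it to every call.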

Exception handling

In C, the closest mechanism to modern exception handling is the pair of functions setjmp() and longjmp(). Otherwise, exceptional situations are conveyed through function return values, global status code variables/functions, or by signals.

The C++ exception mechanism with try, catch, and throw is used in many other languages. try blocks can be nested, and different catch blocks are used to catch different types of values that are thrown.
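
Nesting and type-based dispatch can be sketched as follows. The function classify() and its integer codes are invented for this example.

```cpp
#include <stdexcept>
#include <string>

std::string classify(int code) {
  try {
    try {
      if (code == 1) throw std::out_of_range("index");
      if (code == 2) throw std::string("text");
      if (code == 3) throw 42;
      return "no exception";
    } catch (const std::out_of_range &) {
      return "inner: out_of_range";  // Handled by the inner block
    }
  } catch (const std::string &) {
    return "outer: string";   // Not matched inside, so it propagates outward
  } catch (...) {
    return "outer: unknown";  // Catches any other thrown type
  }
}
```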
