Project Nayuki


Knuth’s -yllion number notation

Introduction

Donald Knuth proposed an alternative way to write large numbers as English phrases. His system uses fewer distinct words than the conventionally accepted notation. Take this number for example: 12,345,678,900,011.

In conventional English naming we write this number as: twelve trillion three hundred forty-five billion six hundred seventy-eight million nine hundred thousand eleven.

Digits:  12         345                        678                         900            011
Words:   twelve     three hundred forty-five   six hundred seventy-eight   nine hundred   eleven
Name:    trillion   billion                    million                     thousand

In the bottom row of the table above, we see that there is a new number name every three digits. That is to say, there is a name for 10^3 (thousand), 10^6 (million), 10^9 (billion), 10^12 (trillion), etc.
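The conventional scheme can be sketched in a few lines of Python. This is a minimal illustration rather than the code published with this page: the function names are made up for the example, and it only covers 0 ≤ n < 10^15 (up to trillions).

```python
# Minimal sketch of conventional short-scale naming (hypothetical helper names).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion"]  # one name per 3 digits


def below_hundred(n):
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def below_thousand(n):
    """Spell out 1..999."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        parts.append(below_hundred(rest))
    return " ".join(parts)


def conventional_words(n):
    """Spell out 0 <= n < 10**15 in conventional English."""
    if not 0 <= n < 10**15:
        raise ValueError("this sketch only handles 0 <= n < 10**15")
    if n == 0:
        return "zero"
    groups = []  # three-digit groups, least significant first
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)
    parts = []
    for i in reversed(range(len(groups))):
        if groups[i]:
            suffix = " " + SCALES[i] if SCALES[i] else ""
            parts.append(below_thousand(groups[i]) + suffix)
    return " ".join(parts)


print(conventional_words(12345678900011))
# twelve trillion three hundred forty-five billion six hundred seventy-eight million nine hundred thousand eleven
```

Note how the loop is flat: each three-digit group is named independently of the others.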

In Knuth’s -yllion number notation, we group the digits as 12,3456;7890,0011, and write in words: twelve myriad thirty-four hundred fifty-six myllion seventy-eight hundred ninety myriad eleven.

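Digits:      12       34            56          78              90       00       11
Words:       twelve   thirty-four   fifty-six   seventy-eight   ninety   (zero)   eleven
10^2 level:           [------ hundred ------]   [----- hundred ------]   [- (hundred) -]
10^4 level:  [----------- myriad -----------]   [--------------- myriad ---------------]
10^8 level:  [-------------------------------- myllion --------------------------------]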

In this case, there is a new number name every time the number of digits doubles. For example, 10^4 is myriad, 10^8 is myllion, 10^16 is byllion, 10^32 is tryllion, etc.
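The doubling rule lends itself to a short recursive sketch. The Python below is an illustration only, not the code published with this page: the function names are invented for the example, and it stops at tryllion, so it only accepts 0 ≤ n < 10^64.

```python
# Minimal sketch of Knuth's -yllion naming (hypothetical helper names).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Each name covers twice as many digits as the previous one, largest first.
YLLIONS = [(10**32, "tryllion"), (10**16, "byllion"), (10**8, "myllion"),
           (10**4, "myriad"), (10**2, "hundred")]


def below_hundred(n):
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def yllion_words(n):
    """Spell out 0 <= n < 10**64 in -yllion notation."""
    if not 0 <= n < 10**64:
        raise ValueError("this sketch only handles 0 <= n < 10**64")
    if n < 100:
        return below_hundred(n)
    for value, name in YLLIONS:
        if n >= value:
            # Split at the largest name that fits, then recurse on both halves.
            high, low = divmod(n, value)
            words = yllion_words(high) + " " + name
            if low:
                words += " " + yllion_words(low)
            return words


print(yllion_words(12345678900011))
# twelve myriad thirty-four hundred fifty-six myllion seventy-eight hundred ninety myriad eleven
```

The recursion mirrors the nesting in the table: everything to the left of a name is itself a complete, smaller phrase.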

Source code

Here is a program to demonstrate these two systems for naming numbers. Both the Java and Python implementations behave identically.

Supported features:

  • Printing a bunch of random large numbers to standard output as a demonstration. (Sample output)
  • Converting an integer −10^66 < n < 10^66 to conventional English short scale notation (thousand, million, billion, etc.).
  • Converting an integer −10^8192 < n < 10^8192 to Knuth’s -yllion notation in English (myriad, myllion, byllion, etc.).
  • Converting an integer −10^8192 < n < 10^8192 to Knuth’s -yllion notation in Chinese.
  • Adding separators for an integer in conventional English notation, e.g.: 12345678 → 12,345,678.
  • Adding separators for an integer in Knuth’s -yllion notation, e.g.: 12345678901234567890 → 1234:5678,9012;3456,7890.
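The two separator styles in the list above can be sketched as follows. This is a minimal Python illustration with made-up function names, not the published implementation. The -yllion version assumes only the three separators seen in the example (comma every 4 digits, semicolon every 8, colon every 16), so it is limited to 0 ≤ n < 10^32.

```python
def conventional_separators(n):
    """Insert a comma every three digits, e.g. 12345678 -> 12,345,678."""
    return format(n, ",")


def yllion_separators(n):
    """Insert -yllion separators for 0 <= n < 10**32 (assumed limit of this sketch)."""
    if not 0 <= n < 10**32:
        raise ValueError("this sketch only handles 0 <= n < 10**32")
    digits = str(n)
    groups = []  # four-digit groups, least significant first
    while digits:
        groups.append(digits[-4:])
        digits = digits[:-4]
    result = groups[0]
    for k in range(1, len(groups)):
        # k counts the four-digit groups to the right of this separator.
        if k % 4 == 0:
            separator = ":"  # byllion boundary (16 digits)
        elif k % 2 == 0:
            separator = ";"  # myllion boundary (8 digits)
        else:
            separator = ","  # myriad boundary (4 digits)
        result = groups[k] + separator + result
    return result


print(conventional_separators(12345678))        # 12,345,678
print(yllion_separators(12345678901234567890))  # 1234:5678,9012;3456,7890
print(yllion_separators(12345678900011))        # 12,3456;7890,0011
```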

Notes

  • Parsing conventional number notation is fairly easy for humans. You read a spelled number whose value is between 1 and 999, followed by a -illion word. Then you repeat this reading procedure on the next small block of words. At the large scale, this encoding scheme is quite “flat” in structure.
  • Parsing Knuth’s -yllion notation gets increasingly difficult with large numbers because you have to keep track of which words have been recently used, in order to determine how much of the previous phrase the -yllion word applies to. Overall, the phrases form a nested block structure that quickly gets confusing for average humans.
  • The spellings of the -illion names are badly non-uniform. For example, 10^((4+1)×3) is quadrillion but 10^((14+1)×3) is quattuordecillion; 10^((2+1)×3) is billion but 10^((20+1)×3) is vigintillion. (Knuth’s -yllion notation inherits this spelling inconsistency too, but the problem only appears in practice for extremely large numbers.)
  • When framed in the language of asymptotic analysis, we can say that for an arbitrary n-digit integer, conventional notation uses Θ(n) different words, whereas the -yllion notation uses only Θ(log n) different words.
  • All of this talk of spelling out large numbers in words is moot in practice. Both the conventional system and Knuth’s system are unwieldy and verbose. Realistically, large numbers are written out as plain numeric digits, possibly with some side annotations to keep track of how many digits were written. It is true that large numbers have practical applications and are routinely processed on computers for cryptography (e.g. 2048-bit RSA encryption). But the numbers are handled internally and never need to be seen by humans. Even when the numbers are displayed for debugging purposes, writing out the digits is sufficient, and using words to spell out the numbers is counterproductive.
  • The algorithms I implemented for converting numbers to phrases in conventional and Knuth’s notations are not the most efficient; I emphasized simple, clear code over absolute performance. For example, YllionEnglishNotation.numberToWords() repeatedly converts between String and BigInteger representations of a number, even though it is possible to convert to String just once. Also, the string concatenation logic can be made more efficient by passing a StringBuilder into the recursive function calls instead of producing immutable Strings as return values.
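The asymptotic claim in the notes above can be checked with a rough count of how many scale names an n-digit number may need. The Python below is a sketch with hypothetical function names. It counts only the scale words (thousand, million, ... versus myriad, myllion, ...) and ignores the fixed vocabulary for small numbers, which is a constant in both systems.

```python
def conventional_scale_names(num_digits):
    """Number of -illion style names needed: one new name every 3 digits."""
    return max(0, (num_digits - 1) // 3)


def yllion_scale_names(num_digits):
    """Number of -yllion style names needed: one new name per doubling from 10**4."""
    count = 0
    exponent = 4  # 10**4 is myriad, 10**8 is myllion, 10**16 is byllion, ...
    while exponent < num_digits:  # 10**exponent has exponent + 1 digits
        count += 1
        exponent *= 2
    return count


for num_digits in (14, 66, 1000):
    print(num_digits,
          conventional_scale_names(num_digits),
          yllion_scale_names(num_digits))
# 14 4 2
# 66 21 5
# 1000 333 8
```

For the 14-digit example from the introduction, this gives four conventional names (thousand through trillion) against two -yllion names (myriad and myllion).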

More info