Project Nayuki

Optimal text segmentation for QR Codes


The QR Code standard offers multiple ways to encode text. The text string can be split into segments, and each segment can have a different mode. The primary modes are numeric, alphanumeric, byte, and kanji. Each mode supports a different set of representable characters and has a different bit cost per character.

The demo here shows that we can find the optimal way to split any given string into QR Code segments with the best modes. The technique of dynamic programming is used to minimize the total number of bits to encode.

Live application (JavaScript)

User input

Show example text:
Text string:
Error correction level:
Force minimum version: (between 1 and 40)

Program output

Optimal text split

Segment details

Idx Mode Head bits Data bits Seg bits Chars Text string

Table column explanations:

Various statistics

Property Optimal Byte-only Savings
Text length (code points)
Number of segments N/A
QR Code version
Total segment bits

Table explanations:


The algorithm

Basic info on how QR Code segments work:

Let s be a text string we want to segment optimally, and let n be the length of s in code points. Let v be the QR Code version we are using. First, define two mutually recursive functions:

Second, we reason about the base case. For each mode m, let f(0, m) = (bit length of empty segment of mode m) = 4 + (number of character count bits for a segment of mode m at version v).

Third, we reason about the inductive step:

Now we have computed a table of bit costs, which is f(i, m) for 0 ≤ in and for m in all 4 possible modes. Initialize a blank array c of length n to store the best mode needed to encode each code point.

Fourth, we work backward in the cost table to figure out the best mode for each code point index:

  1. Find the mode m that minimizes f(n, m). This tells us the best mode be in after segmenting the whole text string.

  2. Update m = h(n, m) so that m represents the best mode to encode the code point at index n−1, and store c[n−1] = m.

  3. Update m = h(n−1, m), and store c[n−2] = m.

  4. Repeat until we finish storing to c[0].

Finally, with the array c representing the best mode for each code point in the input string, we scan the array forward, starting a new segment whenever the mode changes, and putting each current code point into the current open segment.