Project Nayuki


Fast QR Code generator library

Introduction

This is an alternate, speed-optimized implementation of my QR library in Java. It contrasts with my standard/reference/slow library which optimizes for clarity and conciseness. The functionality of this fast library and its API are nearly identical to the other one, but it runs anywhere from 1.5× to 10× as fast. However, even the slow library takes less than 10 ms in the worst case, so the benefit of increasing speed is dubious at best.

The slow library is available in Java plus 5 other programming languages, but the fast one is only available in Java. To achieve speed-ups, I rewrote parts of the logic to implement caching and optimize low-level arithmetic, at the cost of making the architecture more complicated and the code longer. The fast library can be useful for applications that need to generate large batches of QR Code symbols. Two demo applications are provided to illustrate the usage.

Source code

Browse the project’s source code at GitHub: https://github.com/nayuki/QR-Code-generator/tree/master/java-fast

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import io.nayuki.fastqrcodegen.*;

// Simple operation
QrCode qr0 = QrCode.encodeText("Hello, world!", QrCode.Ecc.MEDIUM);
BufferedImage img = toImage(qr0, 4, 10);  // See QrCodeGeneratorDemo
ImageIO.write(img, "png", new File("qr-code.png"));

// Manual operation
List<QrSegment> segs = QrSegment.makeSegments("3141592653589793238462643383");
QrCode qr1 = QrCode.encodeSegments(segs, QrCode.Ecc.HIGH, 5, 5, 2, false);
for (int y = 0; y < qr1.size; y++) {
    for (int x = 0; x < qr1.size; x++) {
        (... paint qr1.getModule(x, y) ...)
    }
}

Design notes

Here is a high-level explanation of how I took my reference/slow library and changed the architecture and calculations to optimize for speed.

BitBuffer class
  • The slow bit buffer class is based on java.util.BitSet. The fast bit buffer is based on low-level manipulation of an array of 32-bit int values.

  • The slow buffer only reads or writes one bit at a time, whereas the fast buffer works with many consecutive bits that don’t cross a word boundary.

  • The speed of the bit buffer impacts the creation and concatenation of QR segments.

QrSegment class
  • The slow makeNumeric() uses Integer.parseInt(), but the fast version calculates directly on the character codes.

  • The slow makeAlphanumeric() uses String.indexOf(), whereas the fast version queries a look-up table once per character.

  • The slow isNumeric() and isAlphanumeric() use regular expressions to test the incoming string before starting to encode. The fast variants of the functions perform the character set checking while encoding output bits; if the string is invalid then the partially encoded result is discarded and an exception is thrown.

  • The constructor (which generally shouldn’t be used because it’s quite low-level) takes a raw array and length instead of a BitBuffer object, in order to reduce a layer of pointer indirection.

  • Overall, the speed improvement within this class shouldn’t be great. The main speed-up comes from QrSegment utilizing the new BitBuffer class.

ReedSolomonGenerator class
  • The QR Code standard uses between 7 to 30 error correction bytes per block. We can create Reed–Solomon encoders for given ECC sizes, and cache them for later reuse. These cached encoders include pre-computed multiplication tables.

  • Note that QR Code symbols of many versions and error correction levels may share the same RS ECC generator, due to having the same number of error correction bytes per block.

QrTemplate class
  • Some aspects of a QR Code symbol only depend on the version (size) and not on the user data being conveyed. These pieces of information can be pre-computed and cached, if you generate many QR Codes that have the same version number.

  • Most of the function patterns stay the same: finders, alignment, timing, version info block, and a single dark module. The format info (mask and error correction level) does vary. Hence, the invariant parts of a QR Code of a particular version are rendered onto an initially blank template.

  • There is also a temporary template that marks whether each module is for a function pattern versus a bit from a user data/ECC codeword.

  • The template computes and saves the zigzag scan. Hence the template knows, for each user data/ECC codeword bit, what location the bit is rendered at in the final QR Code symbol.

  • The template also saves all 8 mask patterns, with respect to the version number and avoidance of function pattern modules.

Memoizer class
  • This is a thread-safe function execution cache that is used for ReedSolomonGenerator and QrTemplate.

  • When a worker is given f and x and wants to compute f(x), it first checks if the dictionary (cache) already contains the answer. If the key-value mapping exists and the soft reference’s target is not null, then it immediately returns that value. Note that the soft reference could be cleared by the garbage collector at any time to reclaim memory. Otherwise, the worker checks if another worker is already trying to compute the same value; if so then it waits for the other worker to finish and then it rechecks the dictionary; otherwise it marks the entry as pending, runs the function, stores the result in the dictionary, notifies all waiting workers, and returns the answer.

QrCode class
  • The early parts of the QR Code encoding process are the same in the slow and fast versions: making segments, finding a version number and error correction level, and concatenating segments.

  • The computation of error correction bytes is performed using the fast, cacheable Reed–Solomon generator.

  • With the version number known and all the data plus ECC bytes, an appropriate QR Code template is cloned as the basis of this barcode.

  • The format info is computed and drawn. Then the codewords are drawn using the pre-computed zigzag scan.

  • If automatic masking is requested, then each of the 8 masks is applied, the penalty score is computed, and the mask is removed.

  • The getPenaltyScore() method fuses all the horizontal-scanning loops into one, and likewise for the vertical-scanning loops. This reduces the amount of overhead spent on extracting bits from specific locations in the QR Code symbol.

Automatic mask selection is an expensive feature that seems to resist effective optimization. Surprisingly in the fast library, the evaluation of one penalty score takes more time than all the other steps needed to render a QR Code symbol. When comparing the fast and slow libraries with the same input data and options, the speed-up is greater if manual masking is used, and less if automatic masking is used.


Benchmark timings

I measured the time to generate a single QR barcode, for all versions from 1 to 40, for all 4 error correction levels, for both the slow and fast Java libraries. The data for each case is a random lowercase string (which must be encoded in byte mode), such that the length is the longest supported by the particular version and error correction level. The platform is OpenJDK Java 12.0.1 (64-bit), Intel Core i5-4690 (Haswell) 3.50 GHz, Windows 8.1 Pro (64-bit).

Manual mask selection

The computations performed are rendering the main QR Code content, applying one mask, and not calculating the penalty score. All times are in microseconds (μs). The speed-up factor is the slow time divided by the corresponding fast time.

Version Slow L Slow M Slow Q Slow H Fast L Fast M Fast Q Fast H Speedup L Speedup M Speedup Q Speedup H
11310101043223.40×3.88×4.38×5.06×
21616171834325.04×4.37×5.99×10.33×
32427202054434.52×7.05×5.33×7.59×
43629312264445.71×7.47×8.74×6.23×
55144313056449.57×7.88×7.66×7.17×
65344434066558.61×8.03×8.16×8.14×
76452434376558.80×8.75×8.17×7.86×
88469565298779.68×8.89×8.32×7.89×
91108063591198810.18×9.00×7.71×7.56×
1010099797311111098.79×9.20×8.15×7.78×
111221249578131311109.20×9.50×8.38×7.41×
1215212110392161413129.66×8.79×8.07×7.59×
1317613811794181615139.91×8.84×8.01×7.13×
142071571161062017151510.11×9.02×7.64×7.31×
15201177154119212018169.42×9.03×8.39×7.28×
16235204154144242219199.71×9.18×8.00×7.58×
172722261801532724222010.01×9.34×8.35×7.58×
183142391951683126242210.07×9.07×8.08×7.51×
193302642071753329262410.03×9.10×8.08×7.31×
20357282243202363130279.88×9.01×8.18×7.47×
213863012492213833312910.02×9.07×8.13×7.56×
224153412832214237343110.00×9.29×8.29×7.19×
234683733042534640373410.08×9.25×8.30×7.49×
245023973282765043403710.02×9.18×8.29×7.53×
25499431352291514743399.85×9.22×8.22×7.41×
26558458358317565044439.93×9.19×8.09×7.43×
276204863963356253494510.00×9.09×8.16×7.43×
28649514425354655652489.99×9.14×8.24×7.44×
29690546446374696054519.95×9.11×8.18×7.40×
30733587479397746458549.93×9.15×8.22×7.37×
31777621503421786862579.93×9.14×8.11×7.36×
32823657539447837266619.94×9.12×8.19×7.34×
33871694565473887670649.92×9.11×8.13×7.35×
34920732594502938073689.93×9.10×8.09×7.37×
35967769621518978476709.99×9.18×8.14×7.37×
3610198106535511028880759.99×9.18×8.12×7.36×
37107284468657410793857810.00×9.12×8.12×7.32×
38112889072159911398898210.01×9.10×8.12×7.29×
39117493675763611810393879.98×9.10×8.10×7.33×
401233983796663123108989110.01×9.11×8.11×7.30×
Version Slow L Slow M Slow Q Slow H Fast L Fast M Fast Q Fast H Speedup L Speedup M Speedup Q Speedup H

Automatic mask selection

The computations performed are rendering the main QR Code content, iterating 8 times of {apply mask, calculate penalty score, remove mask}, and applying one final mask. All times are in microseconds (μs). The speed-up factor is the slow time divided by the corresponding fast time.

Version Slow L Slow M Slow Q Slow H Fast L Fast M Fast Q Fast H Speedup L Speedup M Speedup Q Speedup H
1102949392635654531.61×1.68×1.71×1.73×
2142137137137868078771.64×1.71×1.75×1.79×
31951911831831171081081061.66×1.76×1.70×1.73×
42552422422321511401391371.69×1.73×1.74×1.69×
53243102972961851761731741.75×1.76×1.72×1.70×
63853713703692242142132141.72×1.74×1.74×1.72×
74624434354332682572562551.72×1.72×1.70×1.70×
85535345205153163073043041.75×1.74×1.71×1.69×
96576256096053693573573561.78×1.75×1.71×1.70×
107307267107034224104134121.73×1.77×1.72×1.71×
118388438168014794724704731.75×1.79×1.74×1.69×
129649359189095435335355361.77×1.75×1.72×1.70×
1310861048103410116115986036021.78×1.75×1.72×1.68×
1412261180114311356826716746721.80×1.76×1.70×1.69×
1513361312129612637577477517501.77×1.76×1.73×1.68×
1614871462141514048368258288291.78×1.77×1.71×1.69×
1716491604156715429209089129111.79×1.77×1.72×1.69×
181818174717161694100599310029991.81×1.76×1.71×1.69×
19196519071863183310941085109110931.80×1.76×1.71×1.68×
20213520672041200411841179118411871.80×1.75×1.72×1.69×
21228222242187216512671273128012851.80×1.75×1.71×1.68×
22245524132377232213621372138213891.80×1.76×1.72×1.67×
23266525992556251814701477148714971.81×1.76×1.72×1.68×
24286927902749270515831591160516071.81×1.75×1.71×1.68×
25302629892942289916911699171717271.79×1.76×1.71×1.68×
26325431963126310118041820183018421.80×1.76×1.71×1.68×
27349833993349330419281943195619671.81×1.75×1.71×1.68×
28375936543580352420772086208920971.81×1.75×1.71×1.68×
29400038853803374122092218222422331.81×1.75×1.71×1.68×
30423741284037397223342349236423681.82×1.76×1.71×1.68×
31448643624265419824702481249225071.82×1.76×1.71×1.67×
32474346084521444226152630264926511.81×1.75×1.71×1.68×
33500148614769468627552768279127941.82×1.76×1.71×1.68×
34527851275022493729012917294429421.82×1.76×1.71×1.68×
35553553755275518630453062309230931.82×1.76×1.71×1.68×
36581856505540545032033220324932531.82×1.75×1.71×1.68×
37610859275821572133653379341034151.81×1.75×1.71×1.68×
38640762166105600035183546357535871.82×1.75×1.71×1.67×
39669965106393628736963710374337421.81×1.75×1.71×1.68×
40702268346705658638553886392039321.82×1.76×1.71×1.68×
Version Slow L Slow M Slow Q Slow H Fast L Fast M Fast Q Fast H Speedup L Speedup M Speedup Q Speedup H

Observations

  • For manual masking, the fast library is about 8× the speed of the slow library. It represents a good payoff for the amount of new code written.

  • For automatic masking, the speed-up is only about 1.7×. This figure does not adequately justify the effort and complexity of the new code.

  • For the fast library, manual masking is about 40 times faster than automatic masking. Using manual masking is a good choice if the input already has good randomness properties, such as the output of a hash function.

  • For a given version, using less error correction (and storing more useful data) takes somewhat more computation time. This is entirely due to how much work the Reed–Solomon encoding performs; it is unaffected by the templates and rendering logic.