Fast discrete cosine transform algorithms

The discrete cosine transform is the most popularly used signal processing tool for compressing images and sounds, found in standards such as JPEG and MP3. (Less often used methods include wavelet transforms, polyphase filters, Hadamard transforms, etc.) However, algorithms for computing the DCT quickly are not well-known. The formulas for the naive Θ(n²) algorithms are often cited, but desirably fast Θ(n log n) algorithms are rarely described in detail. Figuring out how to compute the DCT quickly and efficiently is not obvious.

By contrast, the discrete Fourier transform (DFT) is popular for frequency analysis and visualization (e.g. spectrograms), and many kinds of image/audio processing, but is rarely used for compression. The Cooley–Tukey radix-2 fast Fourier transform algorithm is well-known, and the code is readily available from too many independent sources. Moreover, the math that underlies the algorithm can be derived or verified with a math education at a late high school or early university undergraduate level.

Here I provide reference-quality fast DCT algorithms, which I learned from reading papers. After I found some descriptions of fast DCT algorithms with reasonable clarity, I converted the formulas and diagrams into real executable code, and resolved mistakes/ambiguities/omissions along the way. Then I implemented the naive algorithms for reference, added a test suite to compare the naive algorithms against the fast algorithms, and ensured the tests passed. Although I don’t personally understand these fast DCT algorithms nor have I verified the mathematical formulas, the unit tests against the naive algorithm provide sufficient evidence that they are designed and implemented correctly.

Note that it’s not hard to use an FFT to compute a fast forward/inverse DCT in the desired Θ(n log n) asymptotic run time – but that approach incurs a small constant factor penalty in time and space for storing and operating on redundant numbers. Thus the focus here is on pure approaches to computing the DCT, which don’t use complex numbers, padding, etc.

Source code

Java (SE 7+)

Python

C (C99 and above)

C++ (C++11 and above)

TypeScript / JavaScript

C# (.NET 4.0+)

Rust

Note: The fast 8-point DCT algorithms implemented here use different scaling conventions than the other 3 DCT algorithms (naive, Lee, FFT). The test suite code shows how to make the values match.

More info

Stanford University: EE398A - Image and Video Compression: Transform Coding.
(Fast scaled length-8 DCT-II and DCT-III algorithms, by Arai, Agui, Nakajima (新井幸宏、安居院猛、中嶋正之), 1988.)
Design and Implementation of an MPEG-1 Layer III Audio Decoder.
(Mentions Lee’s fast unscaled DCT-II algorithm for power-of-2 lengths. This is the forward transform.)
A New Algorithm to Compute the Discrete Cosine Transform, by Byeong Gi Lee (이병기), 1984.
(A fast unscaled DCT-III algorithm for power-of-2 lengths. This is the inverse transform.)
Signal Processing Stack Exchange: Fast Cosine Transform via FFT.
(Summarizes efficient packing methods from a paper by John I. Makhoul.)