Fast discrete cosine transform algorithms
The discrete cosine transform (DCT) is the most popularly used signal processing tool for compressing images and sounds, found in standards such as JPEG and MP3. (Less often used methods include wavelet transforms, polyphase filters, Hadamard transforms, etc.) However, algorithms for computing the DCT quickly are not wellknown. The formulas for the naive Θ(n^{2}) algorithms are often cited, but desirably fast Θ(n log n) algorithms are rarely described in detail. Figuring out how to compute the DCT quickly and efficiently is not obvious.
By contrast, the discrete Fourier transform (DFT) is popular for frequency analysis and visualization (e.g. spectrograms), and many kinds of image/audio processing, but is rarely used for compression. The CooleyTukey radix2 fast Fourier transform (FFT) algorithm is wellknown, and the code is readily available from too many independent sources. Moreover, the math that underlies the algorithm can be derived or verified with a math education at a late high school or early university undergraduate level.
Here I provide referencequality fast DCT algorithms, which I learned from reading papers. After I found some descriptions of fast DCT algorithms with reasonable clarity, I converted the formulas and diagrams into real executable code, and resolved mistakes/ambiguities/omissions along the way. Then I implemented the naive algorithms for reference, added a test suite to compare the naive algorithms against the fast algorithms, and ensured the tests passed. Although I don’t personally understand these fast DCT algorithms nor have I verified the mathematical formulas, the unit tests against the naive algorithm provide sufficient evidence that they are designed and implemented correctly.
Note that it’s not hard to use an FFT to compute a fast forward/inverse DCT in the desired Θ(n log n) asymptotic run time – but that approach incurs a small constant factor penalty in time and space for storing and operating on redundant numbers. Thus the focus here is on pure approaches to computing the DCT, which don’t use complex numbers, padding, etc.
Source code
 Java (SE 7+)
 Python (compatible with 2 and 3)
 C (C99 and above)
 C++ (C++11 and above)
 JavaScript
 TypeScript

 FastDct.ts
The public API is identical to the handwritten JavaScript version.  fft.ts
 Reuse the test infrastructure from the JavaScript version
 FastDct.ts
 C# (.NET 4.0+)
 Rust
Note: The fast 8point DCT algorithms implemented here use different scaling conventions than the other 3 DCT algorithms (naive, Lee, FFT). The test suite code shows how to make the values match.
More info
 Stanford University: EE398A  Image and Video Compression: Transform Coding.
(Fast scaled length8 DCTII and DCTIII algorithms, by Arai, Agui, Nakajima (新井 幸宏、 安居院 猛、 中嶋 正之), 1988.)  Design and Implementation of an MPEG1 Layer III Audio Decoder.
(Mentions Lee’s fast unscaled DCTII algorithm for powerof2 lengths. This is the forward transform.)  A New Algorithm to Compute the Discrete Cosine Transform, by Byeong Gi Lee (이병기), 1984.
(A fast unscaled DCTIII algorithm for powerof2 lengths. This is the inverse transform.)  Signal Processing Stack Exchange: Fast Cosine Transform via FFT.
(Summarizes efficient packing methods from a paper by John I. Makhoul.)