Benchmark of Nayuki’s FLAC encoder
Introduction
I built my own FLAC encoder starting from basic principles, in order to gain a deep understanding of how the FLAC file format works. My codebase is still a work in progress with inadequate documentation, but runnable code can be found in its GitHub repository. The code is considered experimental, and any output should be verified with the Xiph.Org official FLAC tool’s test option (flac -t YourFile.flac
) before any serious usage/
The main feature I wanted to explore in my encoder was the use of variable block sizes. Xiph.Org’s documentation states in multiple places[0][1] that the reference encoder only implements logic to generate fixed/
In this benchmark I tested the output file size and execution time of FLAC encoders from these vendors:
- Xiph.Org’s official FLAC encoder (version 1.3.2, )
- Justin Ruggles’s Flake encoder (version 0.11, )
(one of the very few alternative implementations available) - Nayuki’s FLAC encoder ()
Summary of the best compression results for each vendor, under the subset and lax modes:
Vendor | Subset/Lax | Song A | Song B | Song C | Song D |
---|---|---|---|---|---|
Xiph.Org | Subset | 11 196 245 | 11 539 714 | 38 167 311 | 31 371 358 |
Flake | Subset | 11 227 995 | 11 563 030 | 38 235 155 | 31 402 095 |
Nayuki | Subset | 11 198 081 | 11 333 596 | 38 227 336 | 31 366 408 |
Xiph.Org | Lax | 11 105 532 | 11 532 241 | 38 093 818 | 31 232 660 |
Flake | Lax | 11 092 458 | 11 433 924 | 38 132 915 | 31 115 846 |
Nayuki | Lax | 11 032 067 | 11 288 787 | 38 081 574 | 31 032 849 |
We can see that in the subset format, Xiph.Org and Nayuki each take the lead half of the time, and Flake trails behind noticeably. When there are no format constraints, Flake is often better than Xiph.Org, and Nayuki’s encoder consistently produces the smallest output – though at the cost of immense CPU time.
The FLAC subset format is produced by the official FLAC tool by default, and is recommended for best compatibility toward embedded music players with limited CPU/memory. Because the official tool forces you add the --lax
option to escape the subset format constraints, it suggests that they don’t recommend you to do it. Therefore in this benchmark, I clearly label the results achieved under the subset format versus the results achieved under no constraints (“lax”). Having access to the full/lax FLAC format is important because it dramatically raises the maximum LPC order, Rice partition order, and block size. In practice this saves around 100 KB for a typical 5-minute song (or roughly 0.3 absolute percentage points).
Proof of smallest output files: nayuki-flac-encoder-benchmark-songs.torrent (total file size 91.4 MB)
Test clips
- Song A (2 min 19 sec) (YouTube sample)
-
Title: Carmen: Prelude to Act 1
Composer: Georges Bizet
Album: Philips Classics – Music for the Millions – Disc 3Comment: Orchestral non-vocal music, featuring many tonal instruments and some percussion.
- Song B (3 min 38 sec) (YouTube sample)
-
Title: Sonata in A, K. 331: 3. Rondo “alla turca”
Composer: Wolfgang Amadeus Mozart
Album: Philips Classics – Music for the Millions – Disc 5Comment: Pure piano music, with simple tones that FLAC can compress easily.
- Song C (4 min 32 sec) (YouTube sample)
-
Title: CLICK
Vocals: ClariS
Anime series: Nisekoi (ニセコイ)Comment: Loud pop music with layers of vocals, tonal instruments, and percussion – a relatively high-entropy situation that FLAC compresses poorly.
- Song D (4 min 53 sec) (YouTube sample)
-
Title: Hello Alone -Yui Ballade-
Vocals: Touyama Nao (東山奈央)
Anime series: Yahari Ore no Seishun Love Comedy wa Machigatteiru. (やはり俺の青春ラブコメはまちがっている。)Comment: Soft pop music with one voice and a small set of instruments – it compresses more readily than loud pop music.
Note: The links to YouTube copies of these songs are for listening only, not for encoder testing. Lossy and lossless copies of clips #2 and #3 can be found online (as they are pop music). But the exact versions of clips #0 and #1 produced by Philips Classics are hard or impossible to find – if you search online you will almost certainly encounter a different performance/
File size benchmark
The following table shows the output .flac file size (in bytes) when encoded with various parameters/
Vendor | Profile | Song A | Song B | Song C | Song D |
---|---|---|---|---|---|
Raw samples | 24 550 176 | 38 542 224 | 47 987 856 | 51 767 520 | |
Xiph.Org | 00 (Subset) | 12 012 649 | 12 765 650 | 40 260 345 | 34 644 421 |
Xiph.Org | 01 (Subset) | 12 041 679 | 12 605 764 | 39 654 273 | 33 656 746 |
Xiph.Org | 02 (Subset) | 11 989 312 | 12 584 084 | 39 642 052 | 33 514 701 |
Xiph.Org | 03 (Subset) | 11 331 657 | 12 124 438 | 39 134 639 | 32 687 770 |
Xiph.Org | 04 (Subset) | 11 335 394 | 11 992 890 | 38 318 774 | 31 703 722 |
Xiph.Org | 05 (Subset) | 11 292 888 | 11 916 555 | 38 314 870 | 31 615 686 |
Xiph.Org | 06 (Subset) | 11 260 189 | 11 735 452 | 38 301 526 | 31 532 011 |
Xiph.Org | 07 (Subset) | 11 228 542 | 11 759 302 | 38 215 718 | 31 449 049 |
Xiph.Org | 08 (Subset) | 11 217 563 | 11 688 041 | 38 210 760 | 31 427 443 |
Xiph.Org | 09 (Subset) | 11 217 561 | 11 688 041 | 38 210 760 | 31 427 443 |
Xiph.Org | 10 (Subset) | 11 219 012 | 11 694 989 | 38 202 896 | 31 417 135 |
Xiph.Org | 11 (Subset) | 11 202 102 | 11 567 040 | 38 198 970 | 31 382 581 |
Xiph.Org | 12 (Subset) | 11 196 245 | 11 539 714 | 38 167 311 | 31 371 358 |
Flake | 00 (Subset) | 11 366 474 | 11 918 232 | 38 465 695 | 31 836 311 |
Flake | 01 (Subset) | 11 305 452 | 11 704 346 | 38 354 061 | 31 596 711 |
Flake | 02 (Subset) | 11 276 927 | 11 703 583 | 38 278 805 | 31 478 365 |
Flake | 03 (Subset) | 11 251 067 | 11 619 430 | 38 241 267 | 31 406 576 |
Flake | 04 (Subset) | 11 248 041 | 11 593 401 | 38 241 150 | 31 404 713 |
Flake | 05 (Subset) | 11 231 746 | 11 573 035 | 38 245 383 | 31 411 903 |
Flake | 06 (Subset) | 11 227 995 | 11 563 030 | 38 235 155 | 31 402 095 |
Nayuki | 00 (Subset) | 11 948 303 | 12 501 207 | 39 562 748 | 33 409 321 |
Nayuki | 01 (Subset) | 11 264 792 | 11 402 799 | 38 329 603 | 31 502 982 |
Nayuki | 02 (Subset) | 11 223 624 | 11 367 675 | 38 246 673 | 31 390 886 |
Nayuki | 03 (Subset) | 11 217 348 | 11 356 585 | 38 244 567 | 31 382 724 |
Nayuki | 04 (Subset) | 11 200 349 | 11 341 761 | 38 229 105 | 31 371 984 |
Nayuki | 05 (Subset) | 11 199 483 | 11 338 107 | 38 228 320 | 31 369 881 |
Nayuki | 06 (Subset) | 11 198 810 | 11 335 321 | 38 227 899 | 31 368 308 |
Nayuki | 07 (Subset) | 11 198 081 | 11 333 596 | 38 227 336 | 31 366 408 |
Xiph.Org | 13 (Lax) | 11 157 753 | 11 714 779 | 38 157 743 | 31 334 704 |
Xiph.Org | 14 (Lax) | 11 157 751 | 11 714 779 | 38 157 743 | 31 334 704 |
Xiph.Org | 15 (Lax) | 11 139 539 | 11 560 395 | 38 197 497 | 31 315 586 |
Xiph.Org | 16 (Lax) | 11 113 210 | 11 559 055 | 38 145 088 | 31 258 435 |
Xiph.Org | 17 (Lax) | 11 114 838 | 11 578 463 | 38 108 506 | 31 232 660 |
Xiph.Org | 18 (Lax) | 11 128 191 | 11 610 087 | 38 097 994 | 31 235 965 |
Xiph.Org | 19 (Lax) | 11 105 532 | 11 532 241 | 38 093 818 | 31 241 266 |
Flake | 07 (Lax) | 11 134 291 | 11 592 342 | 38 183 267 | 31 218 520 |
Flake | 08 (Lax) | 11 130 256 | 11 566 606 | 38 180 273 | 31 210 587 |
Flake | 09 (Lax) | 11 120 167 | 11 548 126 | 38 188 437 | 31 227 148 |
Flake | 10 (Lax) | 11 120 457 | 11 541 471 | 38 178 257 | 31 215 731 |
Flake | 11 (Lax) | 11 097 885 | 11 471 292 | 38 145 971 | 31 139 616 |
Flake | 12 (Lax) | 11 097 541 | 11 459 784 | 38 141 017 | 31 130 928 |
Flake | 13 (Lax) | 11 102 137 | 11 453 551 | 38 137 336 | 31 122 361 |
Flake | 14 (Lax) | 11 092 458 | 11 433 924 | 38 132 915 | 31 115 846 |
Nayuki | 08 (Lax) | 11 157 900 | 11 370 585 | 38 222 635 | 31 294 038 |
Nayuki | 09 (Lax) | 11 115 962 | 11 377 733 | 38 261 969 | 31 248 579 |
Nayuki | 10 (Lax) | 11 184 068 | 11 317 177 | 38 182 939 | 31 319 823 |
Nayuki | 11 (Lax) | 11 050 146 | 11 309 086 | 38 126 946 | 31 078 568 |
Nayuki | 12 (Lax) | 11 172 036 | 11 305 836 | 38 158 824 | 31 301 848 |
Nayuki | 13 (Lax) | 11 033 035 | 11 291 614 | 38 081 695 | 31 034 557 |
Nayuki | 14 (Lax) | 11 032 067 | 11 288 787 | 38 081 574 | 31 032 849 |
Notes: The size of the raw samples includes only the audio data, but not even critical metadata like sample rate, bit depth, and number of channels. Dividing the .flac file size by the raw samples size gives the compression ratio, where smaller is better (e.g. 38081695÷47987856 ≈ 0.794 for Song C). A minimal WAV file is always 44 bytes larger than the raw samples alone, and does contain critical metadata necessary for playback.
Xiph.Org: The preset compression levels from -0
to -8
(represented in my profiles #00 to #08) are all so fast that you might as well use -8
to get the best compression. Looking at profiles #15 to #18, increasing the block size from 3072 to 8192 samples sometimes helps and sometimes hurts. By contrast, Nayuki’s encoder manages to use plenty of large 12288-sample blocks in the most efficient encoding of all the test clips. Exhaustive LPC order search (-e
) is one of the better things you can do (e.g. #14 vs. #16). The -p
option helps somewhat (#11 vs. #12, and #16 vs. #19). Increasing the maximum Rice partition order basically does nothing (#08 vs. #09, and #13 vs. #14).
Flake: Within the subset format, the best setting for Flake (even with variable block sizes enabled) cannot match for Xiph.Org’s smaller output size. However Flake pulls ahead under the lax format, and the best setting usually beats Xiph.Org under lax mode. We can see in profiles #08 to #10 that -v 0
(constant block size) is sometimes better than -v 1
(variable block size heuristic 1), but -v 2
(variable block size heuristic 2) is always better than both (but takes 5× more CPU time).
Nayuki, subset: When using the same constraints as Xiph.Org, such as the subset format with constant block size, Nayuki’s encoder basically cannot compete with Xiph.Org on output size (see Xiph.Org #12 vs. Nayuki #03). Because both encoders are exhaustively searching all the available LPC orders (up to 12), Rice partition orders (up to 8), and stereo coding modes, it implies that the only place where Nayuki’s encoder lags behind Xiph.Org is the calculation/
Nayuki, lax: Under lax mode with the full power of the FLAC file format, Nayuki’s variable block size encoding can finally beat Xiph.Org’s best output consistently (Xiph.Org #19 vs. Nayuki #14), but again at a huge cost of CPU time. Presumably Nayuki’s encoder is still disadvantaged compared to Xiph.Org in terms of LPC coefficients; if an encoder can combine the qualities of near-optimal variable block sizing with superior LPC coefficient selection, it would be interesting to see how much smaller the output could be.
There are tools to examine what block sizes are used in a FLAC file, which is helpful to know when a file uses variable block size encoding. You can use the official FLAC tool’s -a
option to analyze an existing .flac file, which generates a megabyte-long text file with information about every subframe of encoded audio. Also you can use Nayuki’s ShowFlacFileStats
tool, which prints histograms/
Encoding time benchmark
The table below shows the number of seconds to encode the uncompressed WAV file to a FLAC file. You can hover over cells in the “Profile” column to see the detailed list of encoding options. Cells are color-coded from red for slowest/
Vendor | Profile | Song A | Song B | Song C | Song D |
---|---|---|---|---|---|
Audio length | 139.2 | 218.5 | 272.0 | 293.5 | |
Xiph.Org | 00 (Subset) | 0.2 | 0.3 | 0.4 | 0.4 |
Xiph.Org | 01 (Subset) | 0.2 | 0.3 | 0.4 | 0.4 |
Xiph.Org | 02 (Subset) | 0.2 | 0.3 | 0.5 | 0.5 |
Xiph.Org | 03 (Subset) | 0.2 | 0.3 | 0.4 | 0.4 |
Xiph.Org | 04 (Subset) | 0.2 | 0.3 | 0.5 | 0.5 |
Xiph.Org | 05 (Subset) | 0.3 | 0.4 | 0.6 | 0.6 |
Xiph.Org | 06 (Subset) | 0.3 | 0.5 | 0.8 | 0.8 |
Xiph.Org | 07 (Subset) | 0.4 | 0.6 | 0.8 | 0.8 |
Xiph.Org | 08 (Subset) | 0.5 | 0.8 | 1.2 | 1.2 |
Xiph.Org | 09 (Subset) | 0.6 | 1.0 | 1.4 | 1.4 |
Xiph.Org | 10 (Subset) | 0.6 | 1.0 | 1.4 | 1.4 |
Xiph.Org | 11 (Subset) | 2.7 | 4.1 | 5.7 | 6.0 |
Xiph.Org | 12 (Subset) | 20.1 | 31.6 | 42.6 | 44.7 |
Xiph.Org | 13 (Lax) | 2.1 | 2.7 | 4.3 | 4.8 |
Xiph.Org | 14 (Lax) | 2.3 | 3.0 | 4.9 | 5.2 |
Xiph.Org | 15 (Lax) | 30.1 | 44.4 | 63.4 | 66.6 |
Xiph.Org | 16 (Lax) | 30.4 | 44.5 | 64.0 | 67.1 |
Xiph.Org | 17 (Lax) | 29.9 | 43.6 | 62.8 | 65.7 |
Xiph.Org | 18 (Lax) | 30.2 | 43.7 | 63.7 | 66.4 |
Xiph.Org | 19 (Lax) | 225.3 | 330.2 | 474.1 | 497.7 |
Flake | 00 (Subset) | 0.6 | 0.9 | 1.3 | 1.4 |
Flake | 01 (Subset) | 0.8 | 1.2 | 1.7 | 1.8 |
Flake | 02 (Subset) | 0.9 | 1.4 | 1.9 | 2.0 |
Flake | 03 (Subset) | 1.1 | 1.7 | 2.2 | 2.4 |
Flake | 04 (Subset) | 1.6 | 2.6 | 3.4 | 3.6 |
Flake | 05 (Subset) | 1.7 | 2.6 | 3.4 | 3.7 |
Flake | 06 (Subset) | 8.6 | 14.0 | 17.6 | 19.2 |
Flake | 07 (Lax) | 1.9 | 3.0 | 4.0 | 4.4 |
Flake | 08 (Lax) | 5.0 | 7.9 | 10.1 | 11.0 |
Flake | 09 (Lax) | 5.0 | 7.9 | 10.2 | 11.1 |
Flake | 10 (Lax) | 25.5 | 40.9 | 51.9 | 56.5 |
Flake | 11 (Lax) | 4.9 | 7.8 | 10.0 | 10.9 |
Flake | 12 (Lax) | 25.2 | 40.4 | 51.2 | 55.8 |
Flake | 13 (Lax) | 4.8 | 7.7 | 9.7 | 10.6 |
Flake | 14 (Lax) | 24.3 | 39.0 | 49.5 | 53.9 |
Nayuki | 00 (Subset) | 7.7 | 10.3 | 17.9 | 17.3 |
Nayuki | 01 (Subset) | 9.9 | 14.5 | 23.5 | 23.1 |
Nayuki | 02 (Subset) | 13.6 | 20.3 | 31.2 | 31.3 |
Nayuki | 03 (Subset) | 121.4 | 193.4 | 268.4 | 283.0 |
Nayuki | 04 (Subset) | 357.8 | 561.3 | 764.9 | 795.2 |
Nayuki | 05 (Subset) | 642.9 | 1003.1 | 1344.1 | 1426.9 |
Nayuki | 06 (Subset) | 1180.3 | 1862.0 | 2456.9 | 2583.3 |
Nayuki | 07 (Subset) | 2160.2 | 3465.1 | 4357.3 | 4678.9 |
Nayuki | 08 (Lax) | 26.4 | 40.8 | 58.1 | 60.2 |
Nayuki | 09 (Lax) | 40.5 | 63.0 | 87.0 | 91.1 |
Nayuki | 10 (Lax) | 110.2 | 172.0 | 234.4 | 245.1 |
Nayuki | 11 (Lax) | 369.3 | 583.9 | 773.1 | 821.0 |
Nayuki | 12 (Lax) | 831.1 | 1305.7 | 1758.2 | 1844.7 |
Nayuki | 13 (Lax) | 2775.7 | 4398.4 | 5816.8 | 6187.1 |
Nayuki | 14 (Lax) | 4430.0 | 7017.4 | 9189.0 | 9754.3 |
Notes: The audio length can be used to gauge how fast an encoder is relative to real time. In terms of raw data rate, all the test clips are 44100 Hz, 16-bit, stereo (CD quality). Benchmarks were performed on Windows 8.1 Pro 64-bit, Intel Core i5-4690, 3.50 GHz, single-core operation. For Xiph.Org FLAC 1.3.2, the 64-bit EXE was used. Flake 0.11 only has a 32-bit EXE. Nayuki’s FLAC encoder was run on the Oracle Java 1.8.0_45 64-bit JVM. Files are read from and written to an SSD, so I/O time is considered negligible.
Xiph.Org’s encoder is fast for all but the ludicrous exhaustive searches. The strongest standard (non-custom) option of -8
runs well over 100× real-time speed. Even in lax mode with the maximum LPC order of 32 (-l 32
) and exhaustive LPC order search (-e
), it still runs several times faster than real time. Only when exhaustive LPC coefficient optimization is enabled (-p
) does the amount of time start to become unreasonable (e.g. profile #19 in my trials). Beyond the settings I explored, not many options remain to increase this encoder’s search space and run time – the only ones left are the block size and list of apodization functions. All in all, the reference encoder implementation is a solid, efficient piece of engineering.
Flake is also quite fast, in more or less the same order of magnitude as Xiph.Org’s encoder. Indeed the author stated fast encoding as a feature on their project page. I would say that for the same computation time it produces roughly the same compression ratio as Xiph.Org. It appears that the most intensive setting is still faster than some of Xiph.Org’s intensive settings, and I have exhausted the ways to make Flake search more thoroughly. The variable block size option -v 2
is about 5 times slower than -v 1
(see profile #05 vs. #06, #13 vs. #14, etc.), but almost always produces smaller files. The option value -v 1
takes the same amount of time as -v 0
(constant block size), but sometimes produces bigger files and sometimes smaller ones (see #04 vs. #05, and #08 vs. #09)
Nayuki’s encoder is slow for a myriad of reasons. It is written in Java instead of C plus assembly/
Encoding profiles used
This table lists the details for the profiles used in benchmarking. For Xiph.Org and Flake, the command line options/--totally-silent
or -q
to suppress needless printing for batch processing). For Nayuki’s encoder, values passed into the Java API are shown (it doesn’t support command line options); their meanings won’t be explicitly explained until a proper article for the encoder/
Choosing which set of options to test/
Vendor | Profile # | Subset/Lax | Details |
---|---|---|---|
Xiph.Org | 00 | Subset | --no-padding --no-seektable -0 |
Xiph.Org | 01 | Subset | --no-padding --no-seektable -1 |
Xiph.Org | 02 | Subset | --no-padding --no-seektable -2 |
Xiph.Org | 03 | Subset | --no-padding --no-seektable -3 |
Xiph.Org | 04 | Subset | --no-padding --no-seektable -4 |
Xiph.Org | 05 | Subset | --no-padding --no-seektable -5 |
Xiph.Org | 06 | Subset | --no-padding --no-seektable -6 |
Xiph.Org | 07 | Subset | --no-padding --no-seektable -7 |
Xiph.Org | 08 | Subset | --no-padding --no-seektable -8 |
Xiph.Org | 09 | Subset | --no-padding --no-seektable -l 12 -b 4096 -m -r 8 -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 10 | Subset | --no-padding --no-seektable -l 12 -b 4608 -m -r 8 -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 11 | Subset | --no-padding --no-seektable -l 12 -b 4608 -m -r 8 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 12 | Subset | --no-padding --no-seektable -l 12 -b 4608 -m -r 8 -e -p -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 13 | Lax | --no-padding --no-seektable --lax -l 32 -b 4096 -m -r 6 -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 14 | Lax | --no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 15 | Lax | --no-padding --no-seektable --lax -l 32 -b 3072 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 16 | Lax | --no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 17 | Lax | --no-padding --no-seektable --lax -l 32 -b 6144 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 18 | Lax | --no-padding --no-seektable --lax -l 32 -b 8192 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Xiph.Org | 19 | Lax | --no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -e -p -A tukey(0.5);partial_tukey(2);punchout_tukey(3) |
Flake | 00 | Subset | -p 0 -6 |
Flake | 01 | Subset | -p 0 -7 |
Flake | 02 | Subset | -p 0 -8 |
Flake | 03 | Subset | -p 0 -9 |
Flake | 04 | Subset | -p 0 -10 |
Flake | 05 | Subset | -p 0 -10 -v 1 |
Flake | 06 | Subset | -p 0 -10 -v 2 |
Flake | 07 | Lax | -p 0 -11 |
Flake | 08 | Lax | -p 0 -12 |
Flake | 09 | Lax | -p 0 -12 -v 1 |
Flake | 10 | Lax | -p 0 -12 -v 2 |
Flake | 11 | Lax | -p 0 -12 -b 8192 -v 1 |
Flake | 12 | Lax | -p 0 -12 -b 8192 -v 2 |
Flake | 13 | Lax | -p 0 -12 -b 12288 -v 1 |
Flake | 14 | Lax | -p 0 -12 -b 12288 -v 2 |
Nayuki | 00 | Subset | baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=4 minLpcOrder=-1 maxLpcOrder=-1 lpcRoundVars=0 maxRiceOrder=8 |
Nayuki | 01 | Subset | baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=8 lpcRoundVars=0 maxRiceOrder=5 |
Nayuki | 02 | Subset | baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8 |
Nayuki | 03 | Subset | baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=4 minLpcOrder=1 maxLpcOrder=12 lpcRoundVars=4 maxRiceOrder=8 |
Nayuki | 04 | Subset | baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8 |
Nayuki | 05 | Subset | baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=1 maxRiceOrder=8 |
Nayuki | 06 | Subset | baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=2 maxRiceOrder=8 |
Nayuki | 07 | Subset | baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=3 maxRiceOrder=8 |
Nayuki | 08 | Lax | baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=22 lpcRoundVars=0 maxRiceOrder=15 |
Nayuki | 09 | Lax | baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15 |
Nayuki | 10 | Lax | baseSize=2048 sizeMultiples=1,2,3,4 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8 |
Nayuki | 11 | Lax | baseSize=2048 sizeMultiples=1,2,3,4 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15 |
Nayuki | 12 | Lax | baseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8 |
Nayuki | 13 | Lax | baseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15 |
Nayuki | 14 | Lax | baseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=1 maxRiceOrder=15 |
Note that I left a good chunk of options for Nayuki’s encoder unexplored due to immense increases in computation time. The running time is more or less independent of baseSize, but it is proportional to the sum of the sizeMultiples list, together which determine the set of usable block sizes. Also, the run time is more or less proportional to maxLpcOrder minus minLpcOrder, and proportional to 2 to the power of lpcRoundVars (exponential time!).
Conclusion
Xiph.Org’s FLAC encoder is a fine piece of software in many ways – feature-rich, very fast, and good ratio of bytes-saved-to-CPU-time-spent. I showed that it is possible in practice to beat its compressed output size by about 100 KB per song, by applying variable block sizes intelligently. However Nayuki’s FLAC encoder is slow, and the tiny savings in file size are probably not worth the extra CPU time. It is questionable whether Nayuki’s encoder serves any practical purpose other than to show what magnitude of benefit is achieved by switching block sizes in a near-optimal way.
This work nearly exhausts all the possibilities that the current FLAC file format offers for shrinking audio data. Certainly the stereo coding modes, LPC order, Rice partition order, and Rice partition parameters are all searched exhaustively. The only places with room for improvement (however small) are the choice of LPC coefficients (where Xiph.Org currently beats Nayuki), the precision of LPC coefficients (probably not important), and trying a more fine-grained set of variable block sizes (say {100,200,300,...,19900} versus {5000,10000,15000}). There already exists other lossless audio codecs with slightly better compression than FLAC, albeit with tradeoffs in CPU time, licensing, etc. It would be interesting to see an exploration in adding new coding methods to FLAC, such as Huffman coding (instead of Rice coding), better modeling of non-stationary distributions, and who knows what.
Footnotes
- [0]: “
--blocksize=#
: [...] The reference encoder uses the same block size for the entire stream.” (Source) - [1]: “Blocking: [...] Though FLAC allows the block size to vary within a stream, the reference encoder uses a fixed block size.” (Source)
- [2]: Except that the final block may be shorter than the constant block size. This desirable behavior ensures that the FLAC file has the exact same number of samples as the input, unlike some other codecs which may silently introduce padding.