Benchmark of Nayuki’s FLAC encoder

Introduction

I built my own FLAC encoder starting from basic principles, in order to gain a deep understanding of how the FLAC file format works. My codebase is still a work in progress with inadequate documentation, but runnable code can be found in its GitHub repository. The code is considered experimental, and any output should be verified with the Xiph.Org official FLAC tool’s test option (flac -t YourFile.flac) before any serious usage/distribution/archival.

The main feature I wanted to explore in my encoder was the use of variable block sizes. Xiph.Org’s documentation states in multiple places^[0]^[1] that the reference encoder only implements logic to generate fixed/constant block sizes^[2]. What I knew was that when given a set of possible block sizes to choose from (e.g. {2048, 3072, 4096, 6144} samples), it is possible to apply dynamic programming to find the optimal sequence of block sizes to use to encode the whole audio file.

In this benchmark I tested the output file size and execution time of FLAC encoders from these vendors:

Xiph.Org’s official FLAC encoder (version 1.3.2, 2017-01-01)
Justin Ruggles’s Flake encoder (version 0.11, 2007-07-05)
(one of the very few alternative implementations available)
Nayuki’s FLAC encoder (2017-02-14)

Summary of the best compression results for each vendor, under the subset and lax modes:

Vendor	Subset/Lax	Song A	Song B	Song C	Song D
Xiph.Org	Subset	11 196 245	11 539 714	38 167 311	31 371 358
Flake	Subset	11 227 995	11 563 030	38 235 155	31 402 095
Nayuki	Subset	11 198 081	11 333 596	38 227 336	31 366 408
Xiph.Org	Lax	11 105 532	11 532 241	38 093 818	31 232 660
Flake	Lax	11 092 458	11 433 924	38 132 915	31 115 846
Nayuki	Lax	11 032 067	11 288 787	38 081 574	31 032 849

We can see that in the subset format, Xiph.Org and Nayuki each take the lead half of the time, and Flake trails behind noticeably. When there are no format constraints, Flake is often better than Xiph.Org, and Nayuki’s encoder consistently produces the smallest output – though at the cost of immense CPU time.

The FLAC subset format is produced by the official FLAC tool by default, and is recommended for best compatibility toward embedded music players with limited CPU/memory. Because the official tool forces you add the --lax option to escape the subset format constraints, it suggests that they don’t recommend you to do it. Therefore in this benchmark, I clearly label the results achieved under the subset format versus the results achieved under no constraints (“lax”). Having access to the full/lax FLAC format is important because it dramatically raises the maximum LPC order, Rice partition order, and block size. In practice this saves around 100 KB for a typical 5-minute song (or roughly 0.3 absolute percentage points).

Proof of smallest output files: nayuki-flac-encoder-benchmark-songs.torrent (total file size 91.4 MB)

Test clips

Song A (2 min 19 sec) (YouTube sample)

Title: Carmen: Prelude to Act 1
Composer: Georges Bizet
Album: Philips Classics – Music for the Millions – Disc 3

Comment: Orchestral non-vocal music, featuring many tonal instruments and some percussion.

Song B (3 min 38 sec) (YouTube sample)

Title: Sonata in A, K. 331: 3. Rondo “alla turca”
Composer: Wolfgang Amadeus Mozart
Album: Philips Classics – Music for the Millions – Disc 5

Comment: Pure piano music, with simple tones that FLAC can compress easily.

Song C (4 min 32 sec) (YouTube sample)

Title: CLICK
Vocals: ClariS
Anime series: Nisekoi (ニセコイ)

Comment: Loud pop music with layers of vocals, tonal instruments, and percussion – a relatively high-entropy situation that FLAC compresses poorly.

Song D (4 min 53 sec) (YouTube sample)

Title: Hello Alone -Yui Ballade-
Vocals: Touyama Nao (東山奈央)
Anime series: Yahari Ore no Seishun Love Comedy wa Machigatteiru. (やはり俺の青春ラブコメはまちがっている。)

Comment: Soft pop music with one voice and a small set of instruments – it compresses more readily than loud pop music.

Note: The links to YouTube copies of these songs are for listening only, not for encoder testing. Lossy and lossless copies of clips #2 and #3 can be found online (as they are pop music). But the exact versions of clips #0 and #1 produced by Philips Classics are hard or impossible to find – if you search online you will almost certainly encounter a different performance/recording of the same piece of classical music.

File size benchmark

The following table shows the output .flac file size (in bytes) when encoded with various parameters/switches. Note that the default padding is disabled (which saves about 8 KB). You can hover over cells in the “Profile” column to see the detailed list of encoding options. Cells are color-coded from red for largest/worst to blue for smallest/best.

Vendor	Profile	Song A	Song B	Song C	Song D
Raw samples		24 550 176	38 542 224	47 987 856	51 767 520
Xiph.Org	00 (Subset)	12 012 649	12 765 650	40 260 345	34 644 421
Xiph.Org	01 (Subset)	12 041 679	12 605 764	39 654 273	33 656 746
Xiph.Org	02 (Subset)	11 989 312	12 584 084	39 642 052	33 514 701
Xiph.Org	03 (Subset)	11 331 657	12 124 438	39 134 639	32 687 770
Xiph.Org	04 (Subset)	11 335 394	11 992 890	38 318 774	31 703 722
Xiph.Org	05 (Subset)	11 292 888	11 916 555	38 314 870	31 615 686
Xiph.Org	06 (Subset)	11 260 189	11 735 452	38 301 526	31 532 011
Xiph.Org	07 (Subset)	11 228 542	11 759 302	38 215 718	31 449 049
Xiph.Org	08 (Subset)	11 217 563	11 688 041	38 210 760	31 427 443
Xiph.Org	09 (Subset)	11 217 561	11 688 041	38 210 760	31 427 443
Xiph.Org	10 (Subset)	11 219 012	11 694 989	38 202 896	31 417 135
Xiph.Org	11 (Subset)	11 202 102	11 567 040	38 198 970	31 382 581
Xiph.Org	12 (Subset)	11 196 245	11 539 714	38 167 311	31 371 358
Flake	00 (Subset)	11 366 474	11 918 232	38 465 695	31 836 311
Flake	01 (Subset)	11 305 452	11 704 346	38 354 061	31 596 711
Flake	02 (Subset)	11 276 927	11 703 583	38 278 805	31 478 365
Flake	03 (Subset)	11 251 067	11 619 430	38 241 267	31 406 576
Flake	04 (Subset)	11 248 041	11 593 401	38 241 150	31 404 713
Flake	05 (Subset)	11 231 746	11 573 035	38 245 383	31 411 903
Flake	06 (Subset)	11 227 995	11 563 030	38 235 155	31 402 095
Nayuki	00 (Subset)	11 948 303	12 501 207	39 562 748	33 409 321
Nayuki	01 (Subset)	11 264 792	11 402 799	38 329 603	31 502 982
Nayuki	02 (Subset)	11 223 624	11 367 675	38 246 673	31 390 886
Nayuki	03 (Subset)	11 217 348	11 356 585	38 244 567	31 382 724
Nayuki	04 (Subset)	11 200 349	11 341 761	38 229 105	31 371 984
Nayuki	05 (Subset)	11 199 483	11 338 107	38 228 320	31 369 881
Nayuki	06 (Subset)	11 198 810	11 335 321	38 227 899	31 368 308
Nayuki	07 (Subset)	11 198 081	11 333 596	38 227 336	31 366 408
Xiph.Org	13 (Lax)	11 157 753	11 714 779	38 157 743	31 334 704
Xiph.Org	14 (Lax)	11 157 751	11 714 779	38 157 743	31 334 704
Xiph.Org	15 (Lax)	11 139 539	11 560 395	38 197 497	31 315 586
Xiph.Org	16 (Lax)	11 113 210	11 559 055	38 145 088	31 258 435
Xiph.Org	17 (Lax)	11 114 838	11 578 463	38 108 506	31 232 660
Xiph.Org	18 (Lax)	11 128 191	11 610 087	38 097 994	31 235 965
Xiph.Org	19 (Lax)	11 105 532	11 532 241	38 093 818	31 241 266
Flake	07 (Lax)	11 134 291	11 592 342	38 183 267	31 218 520
Flake	08 (Lax)	11 130 256	11 566 606	38 180 273	31 210 587
Flake	09 (Lax)	11 120 167	11 548 126	38 188 437	31 227 148
Flake	10 (Lax)	11 120 457	11 541 471	38 178 257	31 215 731
Flake	11 (Lax)	11 097 885	11 471 292	38 145 971	31 139 616
Flake	12 (Lax)	11 097 541	11 459 784	38 141 017	31 130 928
Flake	13 (Lax)	11 102 137	11 453 551	38 137 336	31 122 361
Flake	14 (Lax)	11 092 458	11 433 924	38 132 915	31 115 846
Nayuki	08 (Lax)	11 157 900	11 370 585	38 222 635	31 294 038
Nayuki	09 (Lax)	11 115 962	11 377 733	38 261 969	31 248 579
Nayuki	10 (Lax)	11 184 068	11 317 177	38 182 939	31 319 823
Nayuki	11 (Lax)	11 050 146	11 309 086	38 126 946	31 078 568
Nayuki	12 (Lax)	11 172 036	11 305 836	38 158 824	31 301 848
Nayuki	13 (Lax)	11 033 035	11 291 614	38 081 695	31 034 557
Nayuki	14 (Lax)	11 032 067	11 288 787	38 081 574	31 032 849

Notes: The size of the raw samples includes only the audio data, but not even critical metadata like sample rate, bit depth, and number of channels. Dividing the .flac file size by the raw samples size gives the compression ratio, where smaller is better (e.g. 38081695÷47987856 ≈ 0.794 for Song C). A minimal WAV file is always 44 bytes larger than the raw samples alone, and does contain critical metadata necessary for playback.

Xiph.Org: The preset compression levels from -0 to -8 (represented in my profiles #00 to #08) are all so fast that you might as well use -8 to get the best compression. Looking at profiles #15 to #18, increasing the block size from 3072 to 8192 samples sometimes helps and sometimes hurts. By contrast, Nayuki’s encoder manages to use plenty of large 12288-sample blocks in the most efficient encoding of all the test clips. Exhaustive LPC order search (-e) is one of the better things you can do (e.g. #14 vs. #16). The -p option helps somewhat (#11 vs. #12, and #16 vs. #19). Increasing the maximum Rice partition order basically does nothing (#08 vs. #09, and #13 vs. #14).

Flake: Within the subset format, the best setting for Flake (even with variable block sizes enabled) cannot match for Xiph.Org’s smaller output size. However Flake pulls ahead under the lax format, and the best setting usually beats Xiph.Org under lax mode. We can see in profiles #08 to #10 that -v 0 (constant block size) is sometimes better than -v 1 (variable block size heuristic 1), but -v 2 (variable block size heuristic 2) is always better than both (but takes 5× more CPU time).

Nayuki, subset: When using the same constraints as Xiph.Org, such as the subset format with constant block size, Nayuki’s encoder basically cannot compete with Xiph.Org on output size (see Xiph.Org #12 vs. Nayuki #03). Because both encoders are exhaustively searching all the available LPC orders (up to 12), Rice partition orders (up to 8), and stereo coding modes, it implies that the only place where Nayuki’s encoder lags behind Xiph.Org is the calculation/choice of LPC coefficients. When only constrained to the subset format but allowing variable block size (with a maximum of 4608 samples), Nayuki’s encoder gains some ground (see Xiph.Org #12 vs. Nayuki #07), but the inconsistent results don’t justify the expenditure of CPU time.

Nayuki, lax: Under lax mode with the full power of the FLAC file format, Nayuki’s variable block size encoding can finally beat Xiph.Org’s best output consistently (Xiph.Org #19 vs. Nayuki #14), but again at a huge cost of CPU time. Presumably Nayuki’s encoder is still disadvantaged compared to Xiph.Org in terms of LPC coefficients; if an encoder can combine the qualities of near-optimal variable block sizing with superior LPC coefficient selection, it would be interesting to see how much smaller the output could be.

There are tools to examine what block sizes are used in a FLAC file, which is helpful to know when a file uses variable block size encoding. You can use the official FLAC tool’s -a option to analyze an existing .flac file, which generates a megabyte-long text file with information about every subframe of encoded audio. Also you can use Nayuki’s ShowFlacFileStats tool, which prints histograms/graphs to the console.

Encoding time benchmark

The table below shows the number of seconds to encode the uncompressed WAV file to a FLAC file. You can hover over cells in the “Profile” column to see the detailed list of encoding options. Cells are color-coded from red for slowest/worst to blue for fastest/best.

Vendor	Profile	Song A	Song B	Song C	Song D
Audio length		139.2	218.5	272.0	293.5
Xiph.Org	00 (Subset)	0.2	0.3	0.4	0.4
Xiph.Org	01 (Subset)	0.2	0.3	0.4	0.4
Xiph.Org	02 (Subset)	0.2	0.3	0.5	0.5
Xiph.Org	03 (Subset)	0.2	0.3	0.4	0.4
Xiph.Org	04 (Subset)	0.2	0.3	0.5	0.5
Xiph.Org	05 (Subset)	0.3	0.4	0.6	0.6
Xiph.Org	06 (Subset)	0.3	0.5	0.8	0.8
Xiph.Org	07 (Subset)	0.4	0.6	0.8	0.8
Xiph.Org	08 (Subset)	0.5	0.8	1.2	1.2
Xiph.Org	09 (Subset)	0.6	1.0	1.4	1.4
Xiph.Org	10 (Subset)	0.6	1.0	1.4	1.4
Xiph.Org	11 (Subset)	2.7	4.1	5.7	6.0
Xiph.Org	12 (Subset)	20.1	31.6	42.6	44.7
Xiph.Org	13 (Lax)	2.1	2.7	4.3	4.8
Xiph.Org	14 (Lax)	2.3	3.0	4.9	5.2
Xiph.Org	15 (Lax)	30.1	44.4	63.4	66.6
Xiph.Org	16 (Lax)	30.4	44.5	64.0	67.1
Xiph.Org	17 (Lax)	29.9	43.6	62.8	65.7
Xiph.Org	18 (Lax)	30.2	43.7	63.7	66.4
Xiph.Org	19 (Lax)	225.3	330.2	474.1	497.7
Flake	00 (Subset)	0.6	0.9	1.3	1.4
Flake	01 (Subset)	0.8	1.2	1.7	1.8
Flake	02 (Subset)	0.9	1.4	1.9	2.0
Flake	03 (Subset)	1.1	1.7	2.2	2.4
Flake	04 (Subset)	1.6	2.6	3.4	3.6
Flake	05 (Subset)	1.7	2.6	3.4	3.7
Flake	06 (Subset)	8.6	14.0	17.6	19.2
Flake	07 (Lax)	1.9	3.0	4.0	4.4
Flake	08 (Lax)	5.0	7.9	10.1	11.0
Flake	09 (Lax)	5.0	7.9	10.2	11.1
Flake	10 (Lax)	25.5	40.9	51.9	56.5
Flake	11 (Lax)	4.9	7.8	10.0	10.9
Flake	12 (Lax)	25.2	40.4	51.2	55.8
Flake	13 (Lax)	4.8	7.7	9.7	10.6
Flake	14 (Lax)	24.3	39.0	49.5	53.9
Nayuki	00 (Subset)	7.7	10.3	17.9	17.3
Nayuki	01 (Subset)	9.9	14.5	23.5	23.1
Nayuki	02 (Subset)	13.6	20.3	31.2	31.3
Nayuki	03 (Subset)	121.4	193.4	268.4	283.0
Nayuki	04 (Subset)	357.8	561.3	764.9	795.2
Nayuki	05 (Subset)	642.9	1003.1	1344.1	1426.9
Nayuki	06 (Subset)	1180.3	1862.0	2456.9	2583.3
Nayuki	07 (Subset)	2160.2	3465.1	4357.3	4678.9
Nayuki	08 (Lax)	26.4	40.8	58.1	60.2
Nayuki	09 (Lax)	40.5	63.0	87.0	91.1
Nayuki	10 (Lax)	110.2	172.0	234.4	245.1
Nayuki	11 (Lax)	369.3	583.9	773.1	821.0
Nayuki	12 (Lax)	831.1	1305.7	1758.2	1844.7
Nayuki	13 (Lax)	2775.7	4398.4	5816.8	6187.1
Nayuki	14 (Lax)	4430.0	7017.4	9189.0	9754.3

Notes: The audio length can be used to gauge how fast an encoder is relative to real time. In terms of raw data rate, all the test clips are 44100 Hz, 16-bit, stereo (CD quality). Benchmarks were performed on Windows 8.1 Pro 64-bit, Intel Core i5-4690, 3.50 GHz, single-core operation. For Xiph.Org FLAC 1.3.2, the 64-bit EXE was used. Flake 0.11 only has a 32-bit EXE. Nayuki’s FLAC encoder was run on the Oracle Java 1.8.0_45 64-bit JVM. Files are read from and written to an SSD, so I/O time is considered negligible.

Xiph.Org’s encoder is fast for all but the ludicrous exhaustive searches. The strongest standard (non-custom) option of -8 runs well over 100× real-time speed. Even in lax mode with the maximum LPC order of 32 (-l 32) and exhaustive LPC order search (-e), it still runs several times faster than real time. Only when exhaustive LPC coefficient optimization is enabled (-p) does the amount of time start to become unreasonable (e.g. profile #19 in my trials). Beyond the settings I explored, not many options remain to increase this encoder’s search space and run time – the only ones left are the block size and list of apodization functions. All in all, the reference encoder implementation is a solid, efficient piece of engineering.

Flake is also quite fast, in more or less the same order of magnitude as Xiph.Org’s encoder. Indeed the author stated fast encoding as a feature on their project page. I would say that for the same computation time it produces roughly the same compression ratio as Xiph.Org. It appears that the most intensive setting is still faster than some of Xiph.Org’s intensive settings, and I have exhausted the ways to make Flake search more thoroughly. The variable block size option -v 2 is about 5 times slower than -v 1 (see profile #05 vs. #06, #13 vs. #14, etc.), but almost always produces smaller files. The option value -v 1 takes the same amount of time as -v 0 (constant block size), but sometimes produces bigger files and sometimes smaller ones (see #04 vs. #05, and #08 vs. #09)

Nayuki’s encoder is slow for a myriad of reasons. It is written in Java instead of C plus assembly/SIMD. It hasn’t implemented multithreading for the embarrassingly parallel parts of the search. As a budding project it focuses on correctness and just getting a working implementation at all. It uses 64-bit arrays to support encoding 32-bit integer audio files (whereas Xiph.Org only supports up to 24 bits), but this is an overhead because 32-bit arrays would be enough to support 16- and 24-bit audio (and I don’t think anyone actually uses 32-bit integer audio). And it relies heavily on exhaustive search (for LPC order, Rice partition order, block size switching, etc.), without trying to implement any heuristics to reduce the search space (the other encoders deploy numerous heuristics).

Encoding profiles used

This table lists the details for the profiles used in benchmarking. For Xiph.Org and Flake, the command line options/flags are shown (plus --totally-silent or -q to suppress needless printing for batch processing). For Nayuki’s encoder, values passed into the Java API are shown (it doesn’t support command line options); their meanings won’t be explicitly explained until a proper article for the encoder/library is written.

Choosing which set of options to test/report is a search problem in a multi-dimensional space. Using human-based heuristics, I only explored a small part of the space to comparatively illustrate key points about how certain options (e.g. block size, LPC order) affect output size and run time.

Vendor	Profile #	Subset/Lax	Details
Xiph.Org	00	Subset	`--no-padding --no-seektable -0`
Xiph.Org	01	Subset	`--no-padding --no-seektable -1`
Xiph.Org	02	Subset	`--no-padding --no-seektable -2`
Xiph.Org	03	Subset	`--no-padding --no-seektable -3`
Xiph.Org	04	Subset	`--no-padding --no-seektable -4`
Xiph.Org	05	Subset	`--no-padding --no-seektable -5`
Xiph.Org	06	Subset	`--no-padding --no-seektable -6`
Xiph.Org	07	Subset	`--no-padding --no-seektable -7`
Xiph.Org	08	Subset	`--no-padding --no-seektable -8`
Xiph.Org	09	Subset	`--no-padding --no-seektable -l 12 -b 4096 -m -r 8 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	10	Subset	`--no-padding --no-seektable -l 12 -b 4608 -m -r 8 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	11	Subset	`--no-padding --no-seektable -l 12 -b 4608 -m -r 8 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	12	Subset	`--no-padding --no-seektable -l 12 -b 4608 -m -r 8 -e -p -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	13	Lax	`--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 6 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	14	Lax	`--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	15	Lax	`--no-padding --no-seektable --lax -l 32 -b 3072 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	16	Lax	`--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	17	Lax	`--no-padding --no-seektable --lax -l 32 -b 6144 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	18	Lax	`--no-padding --no-seektable --lax -l 32 -b 8192 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Xiph.Org	19	Lax	`--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -e -p -A tukey(0.5);partial_tukey(2);punchout_tukey(3)`
Flake	00	Subset	`-p 0 -6`
Flake	01	Subset	`-p 0 -7`
Flake	02	Subset	`-p 0 -8`
Flake	03	Subset	`-p 0 -9`
Flake	04	Subset	`-p 0 -10`
Flake	05	Subset	`-p 0 -10 -v 1`
Flake	06	Subset	`-p 0 -10 -v 2`
Flake	07	Lax	`-p 0 -11`
Flake	08	Lax	`-p 0 -12`
Flake	09	Lax	`-p 0 -12 -v 1`
Flake	10	Lax	`-p 0 -12 -v 2`
Flake	11	Lax	`-p 0 -12 -b 8192 -v 1`
Flake	12	Lax	`-p 0 -12 -b 8192 -v 2`
Flake	13	Lax	`-p 0 -12 -b 12288 -v 1`
Flake	14	Lax	`-p 0 -12 -b 12288 -v 2`
Nayuki	00	Subset	`baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=4 minLpcOrder=-1 maxLpcOrder=-1 lpcRoundVars=0 maxRiceOrder=8`
Nayuki	01	Subset	`baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=8 lpcRoundVars=0 maxRiceOrder=5`
Nayuki	02	Subset	`baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8`
Nayuki	03	Subset	`baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=4 minLpcOrder=1 maxLpcOrder=12 lpcRoundVars=4 maxRiceOrder=8`
Nayuki	04	Subset	`baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8`
Nayuki	05	Subset	`baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=1 maxRiceOrder=8`
Nayuki	06	Subset	`baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=2 maxRiceOrder=8`
Nayuki	07	Subset	`baseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=3 maxRiceOrder=8`
Nayuki	08	Lax	`baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=22 lpcRoundVars=0 maxRiceOrder=15`
Nayuki	09	Lax	`baseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15`
Nayuki	10	Lax	`baseSize=2048 sizeMultiples=1,2,3,4 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8`
Nayuki	11	Lax	`baseSize=2048 sizeMultiples=1,2,3,4 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15`
Nayuki	12	Lax	`baseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8`
Nayuki	13	Lax	`baseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15`
Nayuki	14	Lax	`baseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=1 maxRiceOrder=15`

Note that I left a good chunk of options for Nayuki’s encoder unexplored due to immense increases in computation time. The running time is more or less independent of baseSize, but it is proportional to the sum of the sizeMultiples list, together which determine the set of usable block sizes. Also, the run time is more or less proportional to maxLpcOrder minus minLpcOrder, and proportional to 2 to the power of lpcRoundVars (exponential time!).

Conclusion

Xiph.Org’s FLAC encoder is a fine piece of software in many ways – feature-rich, very fast, and good ratio of bytes-saved-to-CPU-time-spent. I showed that it is possible in practice to beat its compressed output size by about 100 KB per song, by applying variable block sizes intelligently. However Nayuki’s FLAC encoder is slow, and the tiny savings in file size are probably not worth the extra CPU time. It is questionable whether Nayuki’s encoder serves any practical purpose other than to show what magnitude of benefit is achieved by switching block sizes in a near-optimal way.

This work nearly exhausts all the possibilities that the current FLAC file format offers for shrinking audio data. Certainly the stereo coding modes, LPC order, Rice partition order, and Rice partition parameters are all searched exhaustively. The only places with room for improvement (however small) are the choice of LPC coefficients (where Xiph.Org currently beats Nayuki), the precision of LPC coefficients (probably not important), and trying a more fine-grained set of variable block sizes (say {100,200,300,...,19900} versus {5000,10000,15000}). There already exists other lossless audio codecs with slightly better compression than FLAC, albeit with tradeoffs in CPU time, licensing, etc. It would be interesting to see an exploration in adding new coding methods to FLAC, such as Huffman coding (instead of Rice coding), better modeling of non-stationary distributions, and who knows what.

Footnotes

[0]: “--blocksize=#: [...] The reference encoder uses the same block size for the entire stream.” (Source)
[1]: “Blocking: [...] Though FLAC allows the block size to vary within a stream, the reference encoder uses a fixed block size.” (Source)
[2]: Except that the final block may be shorter than the constant block size. This desirable behavior ensures that the FLAC file has the exact same number of samples as the input, unlike some other codecs which may silently introduce padding.