Project Nayuki


Benchmark of Nayuki’s FLAC encoder

Introduction

I built my own FLAC encoder starting from basic principles, in order to gain a deep understanding of how the FLAC file format works. My codebase is still a work in progress with inadequate documentation, but runnable code can be found in its GitHub repository. The code is considered experimental, and any output should be verified with the Xiph.Org official FLAC tool’s test option (flac -t YourFile.flac) before any serious usage/distribution/archival.

The main feature I wanted to explore in my encoder was the use of variable block sizes. Xiph.Org’s documentation states in multiple places[0][1] that the reference encoder only implements logic to generate fixed/constant block sizes[2]. What I knew was that when given a set of possible block sizes to choose from (e.g. {2048, 3072, 4096, 6144} samples), it is possible to apply dynamic programming to find the optimal sequence of block sizes to use to encode the whole audio file.

In this benchmark I tested the output file size and execution time of FLAC encoders from these vendors:

Summary of the best compression results for each vendor, under the subset and lax modes:

Vendor Subset/Lax Song A Song B Song C Song D
Xiph.OrgSubset11 196 24511 539 71438 167 31131 371 358
FlakeSubset11 227 99511 563 03038 235 15531 402 095
NayukiSubset11 198 08111 333 59638 227 33631 366 408
Xiph.OrgLax11 105 53211 532 24138 093 81831 232 660
FlakeLax11 092 45811 433 92438 132 91531 115 846
NayukiLax11 032 06711 288 78738 081 57431 032 849

We can see that in the subset format, Xiph.Org and Nayuki each take the lead half of the time, and Flake trails behind noticeably. When there are no format constraints, Flake is often better than Xiph.Org, and Nayuki’s encoder consistently produces the smallest output – though at the cost of immense CPU time.

The FLAC subset format is produced by the official FLAC tool by default, and is recommended for best compatibility toward embedded music players with limited CPU/memory. Because the official tool forces you add the --lax option to escape the subset format constraints, it suggests that they don’t recommend you to do it. Therefore in this benchmark, I clearly label the results achieved under the subset format versus the results achieved under no constraints (“lax”). Having access to the full/lax FLAC format is important because it dramatically raises the maximum LPC order, Rice partition order, and block size. In practice this saves around 100 KB for a typical 5-minute song (or roughly 0.3 absolute percentage points).

Proof of smallest output files: nayuki-flac-encoder-benchmark-songs.torrent (total file size 91.4 MB)

Test clips

Song A (2 min 19 sec) (YouTube sample)

Title: Carmen: Prelude to Act 1
Composer: Georges Bizet
Album: Philips Classics – Music for the Millions – Disc 3

Comment: Orchestral non-vocal music, featuring many tonal instruments and some percussion.

Song B (3 min 38 sec) (YouTube sample)

Title: Sonata in A, K. 331: 3. Rondo “alla turca”
Composer: Wolfgang Amadeus Mozart
Album: Philips Classics – Music for the Millions – Disc 5

Comment: Pure piano music, with simple tones that FLAC can compress easily.

Song C (4 min 32 sec) (YouTube sample)

Title: CLICK
Vocals: ClariS
Anime series: Nisekoi (ニセコイ)

Comment: Loud pop music with layers of vocals, tonal instruments, and percussion – a relatively high-entropy situation that FLAC compresses poorly.

Song D (4 min 53 sec) (YouTube sample)

Title: Hello Alone -Yui Ballade-
Vocals: Touyama Nao (東山奈央)
Anime series: Yahari Ore no Seishun Love Comedy wa Machigatteiru. (やはり俺の青春ラブコメはまちがっている。)

Comment: Soft pop music with one voice and a small set of instruments – it compresses more readily than loud pop music.

Note: The links to YouTube copies of these songs are for listening only, not for encoder testing. Lossy and lossless copies of clips #2 and #3 can be found online (as they are pop music). But the exact versions of clips #0 and #1 produced by Philips Classics are hard or impossible to find – if you search online you will almost certainly encounter a different performance/recording of the same piece of classical music.


File size benchmark

The following table shows the output .flac file size (in bytes) when encoded with various parameters/switches. Note that the default padding is disabled (which saves about 8 KB). You can hover over cells in the “Profile” column to see the detailed list of encoding options. Cells are color-coded from red for largest/worst to blue for smallest/best.

Vendor Profile Song A Song B Song C Song D
Raw samples24 550 17638 542 22447 987 85651 767 520
Xiph.Org00 (Subset)12 012 64912 765 65040 260 34534 644 421
Xiph.Org01 (Subset)12 041 67912 605 76439 654 27333 656 746
Xiph.Org02 (Subset)11 989 31212 584 08439 642 05233 514 701
Xiph.Org03 (Subset)11 331 65712 124 43839 134 63932 687 770
Xiph.Org04 (Subset)11 335 39411 992 89038 318 77431 703 722
Xiph.Org05 (Subset)11 292 88811 916 55538 314 87031 615 686
Xiph.Org06 (Subset)11 260 18911 735 45238 301 52631 532 011
Xiph.Org07 (Subset)11 228 54211 759 30238 215 71831 449 049
Xiph.Org08 (Subset)11 217 56311 688 04138 210 76031 427 443
Xiph.Org09 (Subset)11 217 56111 688 04138 210 76031 427 443
Xiph.Org10 (Subset)11 219 01211 694 98938 202 89631 417 135
Xiph.Org11 (Subset)11 202 10211 567 04038 198 97031 382 581
Xiph.Org12 (Subset)11 196 24511 539 71438 167 31131 371 358
Flake00 (Subset)11 366 47411 918 23238 465 69531 836 311
Flake01 (Subset)11 305 45211 704 34638 354 06131 596 711
Flake02 (Subset)11 276 92711 703 58338 278 80531 478 365
Flake03 (Subset)11 251 06711 619 43038 241 26731 406 576
Flake04 (Subset)11 248 04111 593 40138 241 15031 404 713
Flake05 (Subset)11 231 74611 573 03538 245 38331 411 903
Flake06 (Subset)11 227 99511 563 03038 235 15531 402 095
Nayuki00 (Subset)11 948 30312 501 20739 562 74833 409 321
Nayuki01 (Subset)11 264 79211 402 79938 329 60331 502 982
Nayuki02 (Subset)11 223 62411 367 67538 246 67331 390 886
Nayuki03 (Subset)11 217 34811 356 58538 244 56731 382 724
Nayuki04 (Subset)11 200 34911 341 76138 229 10531 371 984
Nayuki05 (Subset)11 199 48311 338 10738 228 32031 369 881
Nayuki06 (Subset)11 198 81011 335 32138 227 89931 368 308
Nayuki07 (Subset)11 198 08111 333 59638 227 33631 366 408
Xiph.Org13 (Lax)11 157 75311 714 77938 157 74331 334 704
Xiph.Org14 (Lax)11 157 75111 714 77938 157 74331 334 704
Xiph.Org15 (Lax)11 139 53911 560 39538 197 49731 315 586
Xiph.Org16 (Lax)11 113 21011 559 05538 145 08831 258 435
Xiph.Org17 (Lax)11 114 83811 578 46338 108 50631 232 660
Xiph.Org18 (Lax)11 128 19111 610 08738 097 99431 235 965
Xiph.Org19 (Lax)11 105 53211 532 24138 093 81831 241 266
Flake07 (Lax)11 134 29111 592 34238 183 26731 218 520
Flake08 (Lax)11 130 25611 566 60638 180 27331 210 587
Flake09 (Lax)11 120 16711 548 12638 188 43731 227 148
Flake10 (Lax)11 120 45711 541 47138 178 25731 215 731
Flake11 (Lax)11 097 88511 471 29238 145 97131 139 616
Flake12 (Lax)11 097 54111 459 78438 141 01731 130 928
Flake13 (Lax)11 102 13711 453 55138 137 33631 122 361
Flake14 (Lax)11 092 45811 433 92438 132 91531 115 846
Nayuki08 (Lax)11 157 90011 370 58538 222 63531 294 038
Nayuki09 (Lax)11 115 96211 377 73338 261 96931 248 579
Nayuki10 (Lax)11 184 06811 317 17738 182 93931 319 823
Nayuki11 (Lax)11 050 14611 309 08638 126 94631 078 568
Nayuki12 (Lax)11 172 03611 305 83638 158 82431 301 848
Nayuki13 (Lax)11 033 03511 291 61438 081 69531 034 557
Nayuki14 (Lax)11 032 06711 288 78738 081 57431 032 849

Notes: The size of the raw samples includes only the audio data, but not even critical metadata like sample rate, bit depth, and number of channels. Dividing the .flac file size by the raw samples size gives the compression ratio, where smaller is better (e.g. 38081695÷47987856 ≈ 0.794 for Song C). A minimal WAV file is always 44 bytes larger than the raw samples alone, and does contain critical metadata necessary for playback.

Xiph.Org: The preset compression levels from -0 to -8 (represented in my profiles #00 to #08) are all so fast that you might as well use -8 to get the best compression. Looking at profiles #15 to #18, increasing the block size from 3072 to 8192 samples sometimes helps and sometimes hurts. By contrast, Nayuki’s encoder manages to use plenty of large 12288-sample blocks in the most efficient encoding of all the test clips. Exhaustive LPC order search (-e) is one of the better things you can do (e.g. #14 vs. #16). The -p option helps somewhat (#11 vs. #12, and #16 vs. #19). Increasing the maximum Rice partition order basically does nothing (#08 vs. #09, and #13 vs. #14).

Flake: Within the subset format, the best setting for Flake (even with variable block sizes enabled) cannot match for Xiph.Org’s smaller output size. However Flake pulls ahead under the lax format, and the best setting usually beats Xiph.Org under lax mode. We can see in profiles #08 to #10 that -v 0 (constant block size) is sometimes better than -v 1 (variable block size heuristic 1), but -v 2 (variable block size heuristic 2) is always better than both (but takes 5× more CPU time).

Nayuki, subset: When using the same constraints as Xiph.Org, such as the subset format with constant block size, Nayuki’s encoder basically cannot compete with Xiph.Org on output size (see Xiph.Org #12 vs. Nayuki #03). Because both encoders are exhaustively searching all the available LPC orders (up to 12), Rice partition orders (up to 8), and stereo coding modes, it implies that the only place where Nayuki’s encoder lags behind Xiph.Org is the calculation/choice of LPC coefficients. When only constrained to the subset format but allowing variable block size (with a maximum of 4608 samples), Nayuki’s encoder gains some ground (see Xiph.Org #12 vs. Nayuki #07), but the inconsistent results don’t justify the expenditure of CPU time.

Nayuki, lax: Under lax mode with the full power of the FLAC file format, Nayuki’s variable block size encoding can finally beat Xiph.Org’s best output consistently (Xiph.Org #19 vs. Nayuki #14), but again at a huge cost of CPU time. Presumably Nayuki’s encoder is still disadvantaged compared to Xiph.Org in terms of LPC coefficients; if an encoder can combine the qualities of near-optimal variable block sizing with superior LPC coefficient selection, it would be interesting to see how much smaller the output could be.

There are tools to examine what block sizes are used in a FLAC file, which is helpful to know when a file uses variable block size encoding. You can use the official FLAC tool’s -a option to analyze an existing .flac file, which generates a megabyte-long text file with information about every subframe of encoded audio. Also you can use Nayuki’s ShowFlacFileStats tool, which prints histograms/graphs to the console.

Encoding time benchmark

The table below shows the number of seconds to encode the uncompressed WAV file to a FLAC file. You can hover over cells in the “Profile” column to see the detailed list of encoding options. Cells are color-coded from red for slowest/worst to blue for fastest/best.

Vendor Profile Song A Song B Song C Song D
Audio length139.2218.5272.0293.5
Xiph.Org00 (Subset)0.20.30.40.4
Xiph.Org01 (Subset)0.20.30.40.4
Xiph.Org02 (Subset)0.20.30.50.5
Xiph.Org03 (Subset)0.20.30.40.4
Xiph.Org04 (Subset)0.20.30.50.5
Xiph.Org05 (Subset)0.30.40.60.6
Xiph.Org06 (Subset)0.30.50.80.8
Xiph.Org07 (Subset)0.40.60.80.8
Xiph.Org08 (Subset)0.50.81.21.2
Xiph.Org09 (Subset)0.61.01.41.4
Xiph.Org10 (Subset)0.61.01.41.4
Xiph.Org11 (Subset)2.74.15.76.0
Xiph.Org12 (Subset)20.131.642.644.7
Xiph.Org13 (Lax)2.12.74.34.8
Xiph.Org14 (Lax)2.33.04.95.2
Xiph.Org15 (Lax)30.144.463.466.6
Xiph.Org16 (Lax)30.444.564.067.1
Xiph.Org17 (Lax)29.943.662.865.7
Xiph.Org18 (Lax)30.243.763.766.4
Xiph.Org19 (Lax)225.3330.2474.1497.7
Flake00 (Subset)0.60.91.31.4
Flake01 (Subset)0.81.21.71.8
Flake02 (Subset)0.91.41.92.0
Flake03 (Subset)1.11.72.22.4
Flake04 (Subset)1.62.63.43.6
Flake05 (Subset)1.72.63.43.7
Flake06 (Subset)8.614.017.619.2
Flake07 (Lax)1.93.04.04.4
Flake08 (Lax)5.07.910.111.0
Flake09 (Lax)5.07.910.211.1
Flake10 (Lax)25.540.951.956.5
Flake11 (Lax)4.97.810.010.9
Flake12 (Lax)25.240.451.255.8
Flake13 (Lax)4.87.79.710.6
Flake14 (Lax)24.339.049.553.9
Nayuki00 (Subset)7.710.317.917.3
Nayuki01 (Subset)9.914.523.523.1
Nayuki02 (Subset)13.620.331.231.3
Nayuki03 (Subset)121.4193.4268.4283.0
Nayuki04 (Subset)357.8561.3764.9795.2
Nayuki05 (Subset)642.91003.11344.11426.9
Nayuki06 (Subset)1180.31862.02456.92583.3
Nayuki07 (Subset)2160.23465.14357.34678.9
Nayuki08 (Lax)26.440.858.160.2
Nayuki09 (Lax)40.563.087.091.1
Nayuki10 (Lax)110.2172.0234.4245.1
Nayuki11 (Lax)369.3583.9773.1821.0
Nayuki12 (Lax)831.11305.71758.21844.7
Nayuki13 (Lax)2775.74398.45816.86187.1
Nayuki14 (Lax)4430.07017.49189.09754.3

Notes: The audio length can be used to gauge how fast an encoder is relative to real time. In terms of raw data rate, all the test clips are 44100 Hz, 16-bit, stereo (CD quality). Benchmarks were performed on Windows 8.1 Pro 64-bit, Intel Core i5-4690, 3.50 GHz, single-core operation. For Xiph.Org FLAC 1.3.2, the 64-bit EXE was used. Flake 0.11 only has a 32-bit EXE. Nayuki’s FLAC encoder was run on the Oracle Java 1.8.0_45 64-bit JVM. Files are read from and written to an SSD, so I/O time is considered negligible.

Xiph.Org’s encoder is fast for all but the ludicrous exhaustive searches. The strongest standard (non-custom) option of -8 runs well over 100× real-time speed. Even in lax mode with the maximum LPC order of 32 (-l 32) and exhaustive LPC order search (-e), it still runs several times faster than real time. Only when exhaustive LPC coefficient optimization is enabled (-p) does the amount of time start to become unreasonable (e.g. profile #19 in my trials). Beyond the settings I explored, not many options remain to increase this encoder’s search space and run time – the only ones left are the block size and list of apodization functions. All in all, the reference encoder implementation is a solid, efficient piece of engineering.

Flake is also quite fast, in more or less the same order of magnitude as Xiph.Org’s encoder. Indeed the author stated fast encoding as a feature on their project page. I would say that for the same computation time it produces roughly the same compression ratio as Xiph.Org. It appears that the most intensive setting is still faster than some of Xiph.Org’s intensive settings, and I have exhausted the ways to make Flake search more thoroughly. The variable block size option -v 2 is about 5 times slower than -v 1 (see profile #05 vs. #06, #13 vs. #14, etc.), but almost always produces smaller files. The option value -v 1 takes the same amount of time as -v 0 (constant block size), but sometimes produces bigger files and sometimes smaller ones (see #04 vs. #05, and #08 vs. #09)

Nayuki’s encoder is slow for a myriad of reasons. It is written in Java instead of C plus assembly/SIMD. It hasn’t implemented multithreading for the embarrassingly parallel parts of the search. As a budding project it focuses on correctness and just getting a working implementation at all. It uses 64-bit arrays to support encoding 32-bit integer audio files (whereas Xiph.Org only supports up to 24 bits), but this is an overhead because 32-bit arrays would be enough to support 16- and 24-bit audio (and I don’t think anyone actually uses 32-bit integer audio). And it relies heavily on exhaustive search (for LPC order, Rice partition order, block size switching, etc.), without trying to implement any heuristics to reduce the search space (the other encoders deploy numerous heuristics).

Encoding profiles used

This table lists the details for the profiles used in benchmarking. For Xiph.Org and Flake, the command line options/flags are shown (plus --totally-silent or -q to suppress needless printing for batch processing). For Nayuki’s encoder, values passed into the Java API are shown (it doesn’t support command line options); their meanings won’t be explicitly explained until a proper article for the encoder/library is written.

Choosing which set of options to test/report is a search problem in a multi-dimensional space. Using human-based heuristics, I only explored a small part of the space to comparatively illustrate key points about how certain options (e.g. block size, LPC order) affect output size and run time.

Vendor Profile # Subset/Lax Details
Xiph.Org00Subset--no-padding --no-seektable -0
Xiph.Org01Subset--no-padding --no-seektable -1
Xiph.Org02Subset--no-padding --no-seektable -2
Xiph.Org03Subset--no-padding --no-seektable -3
Xiph.Org04Subset--no-padding --no-seektable -4
Xiph.Org05Subset--no-padding --no-seektable -5
Xiph.Org06Subset--no-padding --no-seektable -6
Xiph.Org07Subset--no-padding --no-seektable -7
Xiph.Org08Subset--no-padding --no-seektable -8
Xiph.Org09Subset--no-padding --no-seektable -l 12 -b 4096 -m -r 8 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org10Subset--no-padding --no-seektable -l 12 -b 4608 -m -r 8 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org11Subset--no-padding --no-seektable -l 12 -b 4608 -m -r 8 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org12Subset--no-padding --no-seektable -l 12 -b 4608 -m -r 8 -e -p -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org13Lax--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 6 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org14Lax--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org15Lax--no-padding --no-seektable --lax -l 32 -b 3072 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org16Lax--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org17Lax--no-padding --no-seektable --lax -l 32 -b 6144 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org18Lax--no-padding --no-seektable --lax -l 32 -b 8192 -m -r 15 -e -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Xiph.Org19Lax--no-padding --no-seektable --lax -l 32 -b 4096 -m -r 15 -e -p -A tukey(0.5);partial_tukey(2);punchout_tukey(3)
Flake00Subset-p 0 -6
Flake01Subset-p 0 -7
Flake02Subset-p 0 -8
Flake03Subset-p 0 -9
Flake04Subset-p 0 -10
Flake05Subset-p 0 -10 -v 1
Flake06Subset-p 0 -10 -v 2
Flake07Lax-p 0 -11
Flake08Lax-p 0 -12
Flake09Lax-p 0 -12 -v 1
Flake10Lax-p 0 -12 -v 2
Flake11Lax-p 0 -12 -b 8192 -v 1
Flake12Lax-p 0 -12 -b 8192 -v 2
Flake13Lax-p 0 -12 -b 12288 -v 1
Flake14Lax-p 0 -12 -b 12288 -v 2
Nayuki00SubsetbaseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=4 minLpcOrder=-1 maxLpcOrder=-1 lpcRoundVars=0 maxRiceOrder=8
Nayuki01SubsetbaseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=8 lpcRoundVars=0 maxRiceOrder=5
Nayuki02SubsetbaseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8
Nayuki03SubsetbaseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=4 minLpcOrder=1 maxLpcOrder=12 lpcRoundVars=4 maxRiceOrder=8
Nayuki04SubsetbaseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8
Nayuki05SubsetbaseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=1 maxRiceOrder=8
Nayuki06SubsetbaseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=2 maxRiceOrder=8
Nayuki07SubsetbaseSize=576 sizeMultiples=1,2,3,4,5,6,7,8 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=3 maxRiceOrder=8
Nayuki08LaxbaseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=22 lpcRoundVars=0 maxRiceOrder=15
Nayuki09LaxbaseSize=4096 sizeMultiples=1 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15
Nayuki10LaxbaseSize=2048 sizeMultiples=1,2,3,4 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8
Nayuki11LaxbaseSize=2048 sizeMultiples=1,2,3,4 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15
Nayuki12LaxbaseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=12 lpcRoundVars=0 maxRiceOrder=8
Nayuki13LaxbaseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=0 maxRiceOrder=15
Nayuki14LaxbaseSize=1024 sizeMultiples=1,2,3,4,5,6,7,8,9,10,11,12 minFixedOrder=0 maxFixedOrder=1 minLpcOrder=2 maxLpcOrder=32 lpcRoundVars=1 maxRiceOrder=15

Note that I left a good chunk of options for Nayuki’s encoder unexplored due to immense increases in computation time. The running time is more or less independent of baseSize, but it is proportional to the sum of the sizeMultiples list, together which determine the set of usable block sizes. Also, the run time is more or less proportional to maxLpcOrder minus minLpcOrder, and proportional to 2 to the power of lpcRoundVars (exponential time!).

Conclusion

Xiph.Org’s FLAC encoder is a fine piece of software in many ways – feature-rich, very fast, and good ratio of bytes-saved-to-CPU-time-spent. I showed that it is possible in practice to beat its compressed output size by about 100 KB per song, by applying variable block sizes intelligently. However Nayuki’s FLAC encoder is slow, and the tiny savings in file size are probably not worth the extra CPU time. It is questionable whether Nayuki’s encoder serves any practical purpose other than to show what magnitude of benefit is achieved by switching block sizes in a near-optimal way.

This work nearly exhausts all the possibilities that the current FLAC file format offers for shrinking audio data. Certainly the stereo coding modes, LPC order, Rice partition order, and Rice partition parameters are all searched exhaustively. The only places with room for improvement (however small) are the choice of LPC coefficients (where Xiph.Org currently beats Nayuki), the precision of LPC coefficients (probably not important), and trying a more fine-grained set of variable block sizes (say {100,200,300,...,19900} versus {5000,10000,15000}). There already exists other lossless audio codecs with slightly better compression than FLAC, albeit with tradeoffs in CPU time, licensing, etc. It would be interesting to see an exploration in adding new coding methods to FLAC, such as Huffman coding (instead of Rice coding), better modeling of non-stationary distributions, and who knows what.

Footnotes

  • [0]: “--blocksize=#: [...] The reference encoder uses the same block size for the entire stream.” (Source)
  • [1]: “Blocking: [...] Though FLAC allows the block size to vary within a stream, the reference encoder uses a fixed block size.” (Source)
  • [2]: Except that the final block may be shorter than the constant block size. This desirable behavior ensures that the FLAC file has the exact same number of samples as the input, unlike some other codecs which may silently introduce padding.