Project Nayuki


Fast Fourier transform in x86 assembly

This library calculates the fast Fourier transform. It uses the same equation as the implementation on the page Free small FFT in multiple languages, and the FFT is unscaled. The key feature is that a hand-optimized assembly language implementation is provided for x86-64 AVX (supported by Intel Sandy Bridge CPUs and later).

Source code

To compile, use one of these commands:

License: MIT (open source)

Source code notes:

Benchmark

For each implementation and FFT size, the number is the execution time in nanoseconds per iteration.

FFT size C impl, -O0 C impl, -O1 C model of AVX, -O1 AVX impl Speedup
4813315190.79×
81956635291.21×
1646813487461.89×
321 087278210922.28×
642 5005944942072.39×
1285 6381 2981 1385072.24×
25613 5422 9492 6341 6801.57×
51229 9106 9976 5603 2772.00×
1 02465 86515 71515 0057 2862.06×
2 048144 45236 74633 85917 0751.98×
4 096317 59886 16280 73347 5811.70×
8 192687 521198 965189 643112 0891.69×
16 3841 497 747405 952410 043256 4351.60×
32 7683 330 2581 055 371928 485666 3251.39×
65 5367 282 3052 426 0371 951 7041 435 6061.36×
131 07216 035 3975 546 1654 142 2703 028 6411.37×
262 14434 795 63611 811 3118 978 3976 598 8871.36×
524 28890 042 66935 911 57423 751 99618 866 3431.26×
1 048 576270 490 724151 551 51771 873 20065 876 3591.09×
2 097 152658 375 195375 815 597155 215 482152 884 6521.02×
4 194 3041 361 986 528782 913 943329 475 964323 074 0381.02×
8 388 6083 225 336 3751 879 716 450685 629 776660 085 4001.04×
16 777 2167 300 920 6624 457 742 4531 427 182 7531 379 910 7121.03×
33 554 43215 711 324 8589 599 613 8602 982 921 5422 808 719 6711.06×
67 108 86434 594 570 32321 420 158 8066 379 494 5085 967 280 4761.07×

The benchmark results are a mixed bag, but a few conclusions can be drawn from the numbers: