# Number-theoretic transform (integer DFT)

## Introduction

The NTT is a generalization of the classic DFT to finite fields. With a lot of work, it basically lets one perform fast convolutions on integer sequences without any round-off errors, guaranteed. Convolutions are useful for multiplying large numbers or long polynomials, and the NTT is asymptotically faster than other methods like Karatsuba multiplication.

## Review of the complex DFT

The classic discrete Fourier transform (DFT) operates on vectors of complex numbers:

1. Suppose the input vector has length $$n$$. The output vector will also have length $$n$$.

2. Let $$ω$$ (omega) be a primitive $$n$$th root of unity. In other words, $$ω^n = 1$$, but $$ω^k ≠ 1$$ for all integers $$1 ≤ k < n$$. The standard choice for the DFT is $$ω = e^{-2πi/n}$$.

3. Each output element equals a particular weighted sum of all input elements, using some powers of $$ω$$ as weights. Denoting the input vector as $$X$$ and the output vector as $$Y$$, we have (with 0-based indexing): $$Y(k) = X(0) ω^{0k} + X(1) ω^{1k} + X(2) ω^{2k} + ... + X(n-1) ω^{(n-1)k}$$.

4. The inverse transform, which restores the original vector, is given by: $$X(k) n = Y(0) ω^{-0k} + Y(1) ω^{-1k} + Y(2) ω^{-2k} + \cdots + Y(n-1) ω^{-(n-1)k}$$.

• Linear algebra guarantees that if $$ω$$ is indeed a primitive $$n$$th root of unity of the working field, then the inverse transform is correct.

• One application of the DFT is to compute a circular convolution of two same-length vectors $$X$$ and $$Y$$. To do so, compute the forward DFT on $$X$$ and $$Y$$, then point-wise multiply the two vectors together, compute the inverse transform on this vector, and finally divide by a scaling factor of $$n$$. Again, this procedure will work in any suitable field of numbers.

• The DFT’s most popular application of analyzing/representing a signal as a sum of sine waves doesn’t work in the number-theoretic transform.

## Procedure for the NTT

1. Suppose the input vector is a sequence of $$n$$ non-negative integers.

2. Choose a minimum working modulus $$M$$ such that $$1 ≤ n < M$$ and every input value is in the range $$[0, M)$$.

3. Select some integer $$k ≥ 1$$ and define $$N = kn + 1$$ as the working modulus. We require $$N ≥ M$$, and $$N$$ to be a prime number. Dirichlet’s theorem guarantees that for any $$n$$ and $$M$$, there exists some choice of $$k$$ to make $$N$$ be prime.

4. Because $$N$$ is prime, the multiplicative group of $$\mathbb{Z}_N$$ has size $$φ(N) = N-1 = kn$$. Furthermore, the group must have at least one generator $$g$$, which is also a primitive $$(N-1)$$th root of unity.

5. Define $$ω ≡ g^k \text{ mod } N$$. We have $$ω^n = g^{kn} = g^{N-1} = g^{φ(N)} ≡ 1 \text{ mod } N$$ due to Euler’s theorem. Furthermore because $$g$$ is a generator, we know that $$ω^i = g^{ik} \not≡ 1$$ for $$1 ≤ i < n$$, because $$ik < nk = N-1$$. Hence $$ω$$ is a primitive $$n$$th root of unity, as required by the DFT of length $$n$$.

6. The rest of the procedure for the forward and inverse transforms is identical to the complex DFT. Moreover, the NTT can be modified to implement a fast Fourier transform algorithm such as Cooley-Tukey.

• When convolving two vectors of length $$n$$ where each input value is at most $$m$$, the upper bound on each output value is $$m^2 n$$. Choosing a minimum working modulus of $$M = m^2 n + 1$$ is sufficient to always avoid overflow in the worst case.

• For the prime field $$\mathbb{Z}_N$$, we can either find a generator of the field then derive a primitive $$n$$th root of unity (as mentioned in the procedure), or directly find a primitive $$n$$th root of unity.

• Such a field has $$φ(φ(N)) = φ(N-1) = φ(kn)$$ generators but $$φ(n)$$ primitive $$n$$th roots of unity. The first number is greater than or equal to the second number. So if we are sampling random candidates in the range $$[0, N)$$, we are more likely (or equally likely) to succeed in finding a generator than finding a primitive $$n$$th root.

• To find a generator, first fully factorize $$N-1$$ and collect its set of unique prime factors. A candidate $$a ∈ (1, N)$$ is a generator of $$\mathbb{Z}_N$$ if and only if {for each $$p$$ in that set of unique prime factors, $$a^{(N-1)/p} \not≡ 1 \text{ mod } N$$}.

• To find a primitive root directly, first fully factorize $$n$$ and collect its set of prime factors. A candidate $$a ∈ (1, N)$$ is a primitive $$n$$th root of unity in $$\mathbb{Z}_N$$ if and only if {$$a^n ≡ 1 \text{ mod } N$$, and for each $$p$$ in that set of unique prime factors, $$a^{n/p} \not≡ 1 \text{ mod } N$$}.

• Although the procedure only describes prime fields for simplicity, it might also be possible to operate on composite rings (e.g. $$\mathbb{Z}_{100}$$) if a primitive $$n$$th root of unity exists. This could be useful if a composite modulus is much smaller than a prime modulus of the form $$N = kn + 1$$.

• Furthermore, it should be possible to operate on prime-power fields such as $$\text{GF}(2^8)$$. However, this is only useful if the input vector is composed of elements from that field, instead of being plain integers.

• Computing an NTT requires many modular multiplications. It is possible to apply Montgomery reductions (or the less efficient Barrett reductions) to speed up the modular arithmetic in an NTT.

## Examples

• Input vector $$X = (6, 0, 10, 7, 2)$$, with length $$n = 5$$. Compute a forward transform.

1. Define the minimum working modulus of $$M = 11$$. We have $$1 ≤ n < M$$, and every input value is in the range $$[0, 11)$$.

2. Select $$k = 2$$ so that with $$N = kn + 1 = 2 × 5 + 1 = 11$$, we have $$N ≥ M$$, and $$N$$ is prime.

3. Try to find a generator of $$\mathbb{Z}_{11}$$. Fully factorize $$N-1 = 10 = 2 × 5$$. Choose a candidate $$a = 6$$. We have {$$a^{10/2} = a^5 ≡ 10 \not≡ 1 \text{ mod } 11$$} and {$$a^{10/5} = a^2 ≡ 3 \not≡ 1 \text{ mod } 11$$}, so $$g = 6$$ is indeed a generator.

4. Calculate $$ω = g^k = 6^2 ≡ 3 \text{ mod } 11$$. This is our primitive $$5$$th root of unity.

5. Compute the output vector $$Y$$:
$$Y(0) = X(0)ω^{0×0} + X(1)ω^{0×1} + X(2)ω^{0×2} + X(3)ω^{0×3} + X(4)ω^{0×4} ≡ 3 \text{ mod } 11$$.
$$Y(1) = X(0)ω^{1×0} + X(1)ω^{1×1} + X(2)ω^{1×2} + X(3)ω^{1×3} + X(4)ω^{1×4} ≡ 7 \text{ mod } 11$$.
$$Y(2) = X(0)ω^{2×0} + X(1)ω^{2×1} + X(2)ω^{2×2} + X(3)ω^{2×3} + X(4)ω^{2×4} ≡ 0 \text{ mod } 11$$.
$$Y(3) = X(0)ω^{3×0} + X(1)ω^{3×1} + X(2)ω^{3×2} + X(3)ω^{3×3} + X(4)ω^{3×4} ≡ 5 \text{ mod } 11$$.
$$Y(4) = X(0)ω^{4×0} + X(1)ω^{4×1} + X(2)ω^{4×2} + X(3)ω^{4×3} + X(4)ω^{4×4} ≡ 4 \text{ mod } 11$$.

• Input vector $$Y = (3, 7, 0, 5, 4)$$, with length $$n = 5$$. Compute the corresponding inverse transform for the example above.

1. We must use the same values of $$k = 2$$, $$N = 11$$, and $$ω = 3$$ as in the forward transform. Note that $$ω^{-1} ≡ 4 \text{ mod } 11$$.

2. Compute the unscaled output vector $$X n$$:
$$X(0) n = Y(0)ω^{-0×0} + Y(1)ω^{-0×1} + \cdots + Y(4)ω^{-0×4} ≡ 8 \text{ mod } 11$$.
$$X(1) n = Y(0)ω^{-1×0} + Y(1)ω^{-1×1} + \cdots + Y(4)ω^{-1×4} ≡ 0 \text{ mod } 11$$.
$$X(2) n = Y(0)ω^{-2×0} + Y(1)ω^{-2×1} + \cdots + Y(4)ω^{-2×4} ≡ 6 \text{ mod } 11$$.
$$X(3) n = Y(0)ω^{-3×0} + Y(1)ω^{-3×1} + \cdots + Y(4)ω^{-3×4} ≡ 2 \text{ mod } 11$$.
$$X(4) n = Y(0)ω^{-4×0} + Y(1)ω^{-4×1} + \cdots + Y(4)ω^{-4×4} ≡ 10 \text{ mod } 11$$.

3. Finally, multiply each element by $$n^{-1} = 5^{-1} ≡ 9 \text{ mod } 11$$ to get back the original values: $$(8, 0, 6, 2, 10) × 9 ≡ (6, 0, 10, 7, 2) \text{ mod } 11$$.

• Input vectors $$X = (4, 1, 4, 2, 1, 3, 5, 6)$$ and $$Y = (6, 1, 8, 0, 3, 3, 9, 8)$$, with length $$n = 8$$. Compute their circular convolution.

1. Define the minimum working modulus of $$M = 9 × 9 × 8 + 1 = 649$$, because each input value is a single-digit integer (less than 10) and we want to avoid the convolution output from overflowing.

2. Select $$k = 84$$ so that with $$N = 84 × 8 + 1 = 673$$, we have $$N ≥ M$$, and $$N$$ is prime.

3. Try to find a primitive $$8$$th root of unity directly. Fully factorize $$n = 8 = 2 × 2 × 2$$. Choose a candidate $$a = 326$$. We have {$$a^8 ≡ 1 \text{ mod } 673$$} and {$$a^{8/2} = a^4 ≡ 672 \not≡ 1 \text{ mod } 673$$}, so $$ω = 326$$ is indeed a primitive $$8$$th root of unity.

4. Compute the forward transforms:
$$X' = \text{NTT}(X) = (26, 338, 228, 115, 2, 457, 437, 448)$$.
$$Y' = \text{NTT}(Y) = (38, 594, 224, 157, 14, 201, 433, 406)$$.

5. Compute the point-wise multiplication of the two vectors modulo $$N$$: $$Z' = X' Y' = (315, 218, 597, 557, 28, 329, 108, 178)$$.

6. Compute the inverse NTT with scaling:
$$Z = \text{INTT}(Z') = (123, 120, 106, 92, 139, 144, 140, 124)$$.

7. This result represents a circular convolution because, for example, $$Z(0) = X(0)Y(0) + X(1)Y(7) + X(2)Y(6) + \cdots + X(7)Y(1)$$. That is to say, each $$Z(i) ≡ \displaystyle\sum_{j=0}^{n-1} X(j)Y((i - j) \text{ mod } n) \text{ mod } N$$.

## Source code

Python
Java (int size)
Java (BigInteger size)

Library features:

• Finding suitable prime modulus
• Finding primitive $$n$$th root of unity
• Computing forward and inverse transforms (naive $$Θ(n^2)$$ algorithm)
• Computing fast forward transform (Cooley-Tukey $$Θ(n \log n)$$ algorithm)
• Computing circular convolution (using naive algorithm)
• Unit tests for all functions (some hard-coded vectors, some randomized cases)

## Proof of DFT/NTT correctness

1. The forward NTT is given by the following matrix:
$$A = \left[\begin{matrix} ω^0 & ω^0 & ω^0 & \cdots & ω^0 \\ ω^0 & ω^1 & ω^2 & \cdots & ω^{n-1} \\ ω^0 & ω^2 & ω^4 & \cdots & ω^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ ω^0 & ω^{n-1} & ω^{2(n-1)} & \cdots & ω^{(n-1)^2} \end{matrix}\right]$$.

2. The unscaled inverse NTT is given by the following matrix:
$$B = \left[\begin{matrix} ω^{-0} & ω^{-0} & ω^{-0} & \cdots & ω^{-0} \\ ω^{-0} & ω^{-1} & ω^{-2} & \cdots & ω^{-(n-1)} \\ ω^{-0} & ω^{-2} & ω^{-4} & \cdots & ω^{-2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ ω^{-0} & ω^{-(n-1)} & ω^{-2(n-1)} & \cdots & ω^{-(n-1)^2} \end{matrix}\right]$$.

3. We want to show that their product is a scaled identity matrix, i.e. $$C = AB = nI$$, where
$$I = \left[\begin{matrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{matrix}\right]$$.

4. Examine an arbitrary cell on the main diagonal:
$$C_{i,i} = [A \text{ row } i] · [B \text{ col } i] = \displaystyle\sum_{k=0}^{n-1} ω^{ik} ω^{-ik} = \displaystyle\sum_{k=0}^{n-1} ω^0 = \displaystyle\sum_{k=0}^{n-1} 1 = n$$.

5. Now examine an arbitrary cell off the main diagonal, with $$i ≠ j$$, letting $$δ = i - j$$, where $$-n < δ < n$$ and $$δ ≠ 0$$:
$$C_{i,j} = [A \text{ row } i] · [B \text{ col } j] = \displaystyle\sum_{k=0}^{n-1} ω^{ik} ω^{-jk} = \displaystyle\sum_{k=0}^{n-1} ω^{(i-j)k} = \displaystyle\sum_{k=0}^{n-1} ω^{δk}$$.

This sum is a geometric series. We know that in general,
$$1 + x + x^2 + x^3 + \cdots + x^{n-1} = \displaystyle\frac{x^n - 1}{x - 1}$$.

Hence $$C_{i,j} = \displaystyle\sum_{k=0}^{n-1} ω^{δk} = \displaystyle\frac{ω^{δn} - 1}{ω^δ - 1} = 0$$, which requires a bit more explanation.

In the numerator, $$ω^n = 1$$ because $$ω$$ is a (primitive) $$n$$th root of unity, so $$(ω^n)^δ = 1$$ and the overall numerator is $$0$$.

In the denominator, $$ω$$ is a primitive $$n$$th root of unity and $$-n < δ < n$$ and $$δ ≠ 0$$, thus $$ω^δ ≠ 1$$, which avoids division by zero.

6. Therefore we have proven that the matrix product $$AB$$ is a scaled identity matrix, which shows that the number-theoretic transform is invertible up to a scale factor.