# Area Efficient Radix $4^{\mathbf{2}} \mathbf{6 4}$ Point Pipeline FFT Architecture Using Modified CSD Multiplier 

F. Siddiq ${ }^{1^{*}}$, H. Jamal ${ }^{2}$, T. Muhammad ${ }^{1}$ and M. Iqbal ${ }^{1}$<br>${ }^{1}$ Electrical Engineering Department, University of Engineering and Technology, Taxila, Pakistan<br>${ }^{2}$ Ghulam Ishaq Khan Institute of Engineering Science and Technology, Topi KPK, Pakistan<br>(Received March 04, 2014 and accepted in revised form August 18, 2014)


#### Abstract

A modified Fast Fourier Transform (FFT) based radix $4^{2}$ algorithm for Orthogonal Frequency Division Multiplexing (OFDM) systems is presented. When compared with similar schemes like Canonic signed digit (CSD) Constant Multiplier, the modified CSD multiplier can provide a improvement of more than $36 \%$ in terms of multiplicative complexity. In Comparison of area being occupied the amount of full adders is reduced by $32 \%$ and amount of half adders is reduced by $42 \%$. The modified CSD multiplier scheme is implemented on Xilinx ISE 10.1 using Spartan-III XC3S1000 FPGA as a target device. The synthesis results of modified CSD Multiplier on Xilinx show efficient Twiddle Factor ROM Design and effective area reduction in comparison to CSD constant multiplier.


Keywords: FFT, Radix $4^{2}$, OFDM, CSD multiplier, Xilinx, Twiddle factor, FPGA

## 1. Introduction

The FFT based processor is commonly used in cellular communication for image and signal processing. The continuous processing makes pipelined FFT the major mechanism for low power applications [1]. In the frequency domain the filtering and correlation can be executed with lesser number of processes [2]. Pipelined mechanism is efficient for lower latency and low power consumption [3]. FFT is used to compute efficient DFT, and find its applications in digital spectrum analysis, ultra wide band and multirate filters [4, 5]. The FFT is a set of algorithms which are more computationally efficient than the DFT [6]. Table 1 represents the computational requirements of different FFT architectures as depicted in [7]. In radix $2^{4}$ algorithm complex multiplication is minimized [8]. Novel coefficient ordering based low power radix 4 FFT diminishes the switching activity between consecutive coefficients was presented in [9].

By choosing the input and output carefully, it leads to significant memory and latency savings [10]. In pipeline mechanism radix 4 algorithms is advantageous
as compared to radix 2 algorithms [11]. Computational and circuit complexity can be balanced by using radix 4 [12]. FFT processor is categorized as the pipeline mechanism, parallel mechanism and perfect systolic range [13]. Pipeline mechanism gives a trade of hardware complexity and dissension rate. Figure 1 depicts the pipeline architecture for FFT [14]. Figure 2 comprises the quantity of computational units scattered with delay commutator for inter stage data reallocation.

Pipeline FFT Architectures enlisted in Table 2 shows different structures alongwith computational complexities [15].

## 2. FFT Algorithm

The N pt. FFT algorithm derived from DFT framework is:

$$
\begin{equation*}
X(K)=\sum_{n=0}^{N-1} x(n) W_{N}^{n k} \quad \text { For } k=0,1 \ldots \ldots . N-1 \tag{1}
\end{equation*}
$$

The N pt. IFFT is:

$$
\begin{equation*}
X(K)=\frac{1}{N} \sum_{n=0}^{N-1} x(n) W_{N}^{-n k} \quad \text { For } k=0,1 \ldots \ldots . N-1 \tag{2}
\end{equation*}
$$

Table 1. Operations required for 64 -point FFT.

| Operations | Radix-2 | Radix-4 | Radix-8 | Split Radix | Wino grad |
| :--- | :---: | :---: | :---: | :---: | :---: |
| Real Additions | 1032 | 976 | 972 | 964 | 1394 |
| Real Multiples | 264 | 208 | 204 | 196 | 198 |
| Total | 1296 | 1174 | 1176 | 1160 | 1592 |

[^0]Stage 1
Stage V-1
Stage V


Figure 1. N point radix 4 pipelined FFT processor.


Figure 2. 64 pt. 4 parallel pipeline mechanism.
Table 2. Relationship of the pipeline FFT architectures.

|  | Complex Multipliers | Complex Adders | Memory | Control |
| :--- | :---: | :---: | :---: | :--- |
| R 2 MDC | $2 \log _{4} N-1$ | $4 \log _{4} N$ | $3 \frac{N}{2}-2$ | Simple |
| R 2 SDF | $2 \log _{4} N-1$ | $4 \log _{4} N$ | $N-1$ | Simple |
| R 4 SDF | $\log _{4} N-1$ | $8 \log _{4} N$ | $N-1$ | Medium |
| R 4 MDC | $3\left(\log _{4} N-1\right)$ | $8 \log _{4} N$ | $5 \frac{N}{2}-4$ | Simple |
| R 4 SDC | $\log _{4} N-1$ | $3 \log _{4} N$ | $2 N-2$ | Complex |
| $\mathrm{R} 2^{2} \mathrm{SDF}$ | $\log _{4} N-1$ | $4 \log _{4} N$ | $N-1$ | Simple |

Where $\quad W_{N}^{n k}=e^{-j}\left(\frac{2 \pi}{N}\right) k n$
FFT and IFFT processors have two alterations. FFT has inverted direction of the multiplier. IFFT has different standardization feature of $\frac{1}{\mathrm{~N}}$ in Eqn. (2). N samples FFT/IFFT computation needs $\mathrm{O}\left(\mathrm{N}_{2}\right)$ arithmetic operations so lop of chip area is required. By using radix-r fast Fourier formulation, it reduces to $\mathrm{O}\left(\log _{2}(\mathrm{~N})\right)$ arithmetic procedures in $\log _{\mathrm{r}}(\mathrm{N})$ phases. The ' $r$ ' can have be 2,4 or 8 value [15]. There are two decimation approaches, DIF (Decimation in frequency) and DIT (Decimation in Time). Radix-4 DIF based FFT represents the DFT equation as four additions, and also splits it into 4 equalities, and every equality computes every $4^{\text {th }}$ output sample. Sub-sequent equalities shows DIF for radix-4.
$X(K)=\sum_{n=0}^{N / 4-1} x(n) W_{N}^{n k}$

$$
\begin{align*}
& X(K)=\sum_{n=0}^{N / 4-1} x(n) W_{N}^{n k}+\sum_{n=\frac{N}{4}}^{\frac{N}{2}-1} x(n) W_{N}^{n k}+  \tag{4}\\
& \quad \sum_{n=\frac{N}{2}}^{\frac{3 N}{4}-1} x(n) W_{N}^{n k}+\sum_{n=\frac{3 N}{4}}^{N-1} x(n) W_{N}^{n k}  \tag{5}\\
& =\sum_{n=0}^{N / 4-1}\left[x(n)+W_{N}^{K\left(\frac{n}{4}\right)} x\left(n+\frac{N}{4}\right)+W_{N}^{K\left(\frac{n}{2}\right)} x\left(n+\frac{N}{2}\right)+\right. \\
& \left.W_{N}^{K(3 n / 4)} x(n+3 N / 4)\right] W_{N}^{n k} \tag{6}
\end{align*}
$$

The three twiddle factors are shown:
$\mathrm{W}_{\mathrm{N}}^{\mathrm{K}(\mathrm{N} / 4)}=\mathrm{e}^{-\mathrm{j}\left(\frac{2 \pi}{\mathrm{~N}}\right) \mathrm{K}(\mathrm{N} / 4)}=\mathrm{e}^{-\mathrm{j}\left(\frac{\pi}{2}\right)}=(-\mathrm{j})^{\mathrm{K}}$
$\mathrm{W}_{\mathrm{N}}^{\mathrm{K}(\mathrm{N} / 2)}=\mathrm{e}^{-\mathrm{j}\left(\frac{2 \pi}{\mathrm{~N}}\right) \mathrm{K}(\mathrm{N} / 2)}=\mathrm{e}^{-\mathrm{j}(\pi)}=(-1)^{\mathrm{K}}$
$\mathrm{W}_{\mathrm{N}}^{\mathrm{K}(3 \mathrm{~N} / 4)}=\mathrm{e}^{-\mathrm{j}\left(\frac{2 \pi}{\mathrm{~N}}\right) \mathrm{K}(3 \mathrm{~N} / 4)}=\mathrm{e}^{-\mathrm{j}(3 \pi / 2)}=(\mathrm{j})^{\mathrm{K}}$
Equation (4) can thus be expressed as:
$X(K)=\sum_{n=0}^{N / 4-1}\left[\begin{array}{c}x(n)+(-j)^{K} x\left(n+\frac{N}{4}\right)+(-1)^{K} x \\ \left(n+\frac{N}{2}\right)+(j)^{K} x\left(n+\frac{3 N}{4}\right)\end{array}\right] W_{N}^{n k}$
Four sub- output sequences can be generating by putting ' $\mathrm{k}=4 \mathrm{r}, \mathrm{k}=4 \mathrm{r}+1, \mathrm{k}=4 \mathrm{r}+2$ and $\mathrm{k}=4 \mathrm{r}+3$ ':
$X(K)=\sum_{n=0}^{\frac{N}{4}-1}\left[\begin{array}{c}x(n)+x\left(n+\frac{N}{4}\right)+x\left(n+\frac{N}{2}\right)+ \\ x\left(n+\frac{3 N}{4}\right) W_{N}^{0}\end{array}\right] W_{\frac{N}{4}}^{n r}$
$X(K)=\sum_{n=0}^{\frac{N}{4}-1}\left[\begin{array}{c}x(n)-j x\left(n+\frac{N}{4}\right)-x\left(n+\frac{N}{2}\right)+ \\ j x\left(n+\frac{3 N}{4}\right) W_{N}^{n}\end{array}\right] W_{\frac{N}{4}}^{n r}$

$$
\begin{gather*}
X(K)=\sum_{n=0}^{N / 4-1}\left[x(n)-x\left(n+\frac{N}{4}\right)+x\left(n+\frac{N}{2}\right)-\right.  \tag{13}\\
\left.x(n+3 N / 4) W_{N}^{2 n}\right] W_{N / 4}^{n \mathrm{n}}
\end{gathered} \quad \begin{gathered}
X(K)=\sum_{n=0}^{\frac{N}{4}-1}\left[\begin{array}{c}
x(n)+j x\left(n+\frac{N}{4}\right)-x \\
\left.\left(n+\frac{N}{2}\right)_{j x\left(n+\frac{3 N}{4}\right) W_{N}^{3 n}}\right] W_{\frac{N}{4}}^{n r}
\end{array} .\right.
\end{gather*}
$$

By putting from $\mathrm{r}=$ ' 0 ' to ' $\mathrm{N} / 4-1$ ':
$X(4 r), X(4 r+1), X(4 r+2)$, and $X(4 r+3)$ are $N / 4-$ point DFTs. Every $\mathrm{N} / 4$ output is a sum of four input samples $\left(x(n) \quad x\left(n+\frac{N}{4}\right), x\left(n+\frac{N}{2}\right)\right.$ and $x\left(n+\frac{3 N}{4}\right)$, all multiplied by $-1, \mathrm{j}$, or $+1-\mathrm{j}$. Twiddle Factor $\left(\mathrm{W}_{\mathrm{N}}^{0}, \mathrm{~W}_{\mathrm{N}}^{\mathrm{n}}, \mathrm{W}_{\mathrm{N}}^{2 \mathrm{n}}\right.$ or $\left.\mathrm{W}_{\mathrm{N}}^{3 \mathrm{n}}\right)$ is multiplied by above sum. Every N/4-sample DFTs is divided into four N/16sample DFTs. Every N/16 DFT is distributed further in four N/64-Pt.and so on.. A basic radix-4 Processing element (Butterfly) is represented in Figure 3.


Figure 3. Radix-4 DIF FFT butterfly.
Every sample is complex in processing element (Butterfly unit). A basic Processing Element graph between inputs and outputs is depicting in Figure 4. Real and imaginary parts of twiddle factor can be calculated as follows:
$\mathrm{W}_{\mathrm{N}}=\mathrm{e}^{-\mathrm{j}\left(\frac{2 \pi}{\mathrm{~N}}\right.}=\cos \left(\frac{2 \pi}{\mathrm{~N}}\right)-\mathrm{j} \sin \left(\frac{2 \pi}{\mathrm{~N}}\right)$.


Figure 4. Radix-4 DIF FFT Butterfly, Complex data.

Table 3. The second stage twiddle factors.

| $\mathrm{k} 1, \mathrm{k} 2$ | 0,0 | 0,1 | 1,0 | 1,1 |
| :---: | :---: | :---: | :---: | :---: |
| $\mathrm{n} 3, \mathrm{n} 4$ | 1 | 1 | 1 | 1 |
| 0,0 | 1 | $W_{16}^{2}$ | $W_{16}^{1}$ | $W_{16}^{3}$ |
| 0,1 | 1 | $W_{16}^{2}=-j$ | $W_{16}^{2}$ | $W_{16}^{6}=-j W_{16}^{2}$ |
| 1,0 | $W_{16}^{6}=-j W_{16}^{2}$ | $W_{16}^{3}$ | $W_{16}^{9}=-W_{16}^{1}$ |  |

Table 4. Modified CSD binary illustration ( $\overline{1}$ means -1 ).

| Coefficients |  | Dec. | 2's Complement | CSD | Modified CSD |
| :---: | :---: | :---: | :---: | :---: | :---: |
| mo | $\operatorname{Cos}(\mathrm{pi} / 8)$ | 0.9239 | 011101100100 | 100010100100 | $(0.3827+0.0793) * 2$ |
| m 1 | $\operatorname{Sin}(\mathrm{pi} / 8)$ | 0.3827 | 001100001111 | 010100010001 | $(0.5-0.1173)=(010000000000)$ <br> $-(000100010000)$ |

## 3. Radix- $\mathbf{4}^{\mathbf{2}}$ Algorithm

Using a 3-D linear index mapping of radix-4 FFT, the first two stages in cascade decomposition can be equated as:
$\mathrm{n}=\frac{\mathrm{N}}{4} \mathrm{n} 1+\frac{\mathrm{N}}{16} \mathrm{n} 2+\mathrm{n} 3 \quad\left\{0 \leq \mathrm{n} 1, \mathrm{n} 2 \leq 3, \mathrm{n} 3=0 \sim \frac{\mathrm{~N}}{16}-1\right\}$
$\mathrm{k} 1+4 \mathrm{k} 2+16 \mathrm{k} 3\left\{0 \leq \mathrm{k} 1, \mathrm{k} 2 \leq 3, \mathrm{k} 3=0 \sim \frac{\mathrm{~N}}{16}-1\right\}$
The DFT equations are:
$\mathrm{x}(\mathrm{k} 1+4 \mathrm{k} 2+16 \mathrm{k} 3)$
$=\sum_{\mathrm{n} 3=0}^{\frac{\mathrm{N}}{16}-1} \sum_{\mathrm{n} 2=0}^{3} \cdot \sum_{\mathrm{n} 1=0}^{3} \cdot x\left(\frac{N}{4} n 1+\frac{\mathrm{N}}{16} n 2+n 3\right)$
$\mathrm{W}_{\mathrm{N}}^{\left(\frac{\mathrm{N}}{4}_{\mathrm{n} 1+\frac{\mathrm{N}}{16}}^{\mathrm{n} 2+\mathrm{n} 3}\right)(\mathrm{k} 1+4 \mathrm{k} 2+16 \mathrm{k} 3)}$
$\left.\left.=\sum_{\mathrm{n} 3=0}^{\frac{\mathrm{N}}{16}-1} . \sum_{\mathrm{n} 2=0}^{3} \mathrm{~B}_{\frac{\mathrm{N}}{4}}^{\mathrm{k} 1} \frac{\mathrm{~N}}{16} \mathrm{n} 2+\mathrm{n} 3\right) \mathrm{W}_{16}^{(\mathrm{n} 2)(\mathrm{k} 1+4 \mathrm{k} 2)}\right\}$
$\mathrm{W}_{\mathrm{N}}^{(\mathrm{n} 3)(\mathrm{k} 1+4 \mathrm{k} 2)} \mathrm{W}_{\frac{\mathrm{N}}{16}}^{(\mathrm{n} 3)(\mathrm{k} 3)}$
$=\sum_{\mathrm{n} 3=0}^{\frac{\mathrm{N}}{16}-1} \mathrm{H}_{\frac{\mathrm{N}}{16}}^{\mathrm{k} 1 \mathrm{k} 2}(\mathrm{n} 3)\left(\mathrm{W}_{16}^{(\mathrm{n} 3)(\mathrm{k} 1+4 \mathrm{k} 2)}\right\} \mathrm{W}_{\frac{\mathrm{N}}{16}}^{(\mathrm{n} 3)(\mathrm{k} 3)}$
The first butterfly stage is:
$\Delta=\frac{\mathrm{N}}{16} \mathrm{n} 2+\mathrm{n} 3$
$B_{\frac{\mathrm{N}}{4}}^{\mathrm{k} 1}(\Delta)=\mathrm{x}(\Delta)+(-\mathrm{j})^{\mathrm{k} 1} \mathrm{x}\left(\Delta+\frac{\mathrm{N}}{4}\right)+(-1)^{\mathrm{k} 1} \mathrm{x}\left(\Delta+\frac{\mathrm{N}}{2}\right)+$
$(\mathrm{j})^{\mathrm{k} 1} \mathrm{X}\left(\Delta+\frac{3 \mathrm{~N}}{4}\right)$
The second butterfly has the structure, $\mathrm{H}_{\frac{\mathrm{N}}{16}}^{\mathrm{k} 1 \mathrm{k} 2}(\mathrm{n} 3)$ is expressed as
$\mathrm{H}_{\frac{\mathrm{N}}{16}}^{\mathrm{k} 1 \mathrm{k} 2}(\mathrm{n} 3)=\mathrm{B}_{\frac{\mathrm{N}}{4}}^{\mathrm{k} 1}(\mathrm{n} 3)+(-j)^{\mathrm{k} 2} \mathrm{~W}_{16}^{\mathrm{k} 1} \mathrm{~B}_{\frac{\mathrm{N}}{4}}^{\mathrm{k} 1}\left(\mathrm{n} 3+\frac{\mathrm{N}}{16}\right)+$
$(-1)^{\mathrm{k} 2} \mathrm{~W}_{16}^{2 \mathrm{k} 1} \mathrm{~B}_{\frac{\mathrm{N}}{4}}^{\mathrm{k} 1}\left(\mathrm{n} 3+\frac{2 \mathrm{~N}}{16}\right)+(\mathrm{j})^{\mathrm{k} 2} \mathrm{~W}_{16}^{3 \mathrm{k} 1} \mathrm{~B}_{\frac{\mathrm{N}}{4}}^{\mathrm{k} 1}\left(\mathrm{n} 3+\frac{3 \mathrm{~N}}{16}\right)$
Twiddle factor $\mathrm{W}_{16}^{\mathrm{n} 3(\mathrm{k} 1+4 \mathrm{k} 2)}$ decomposition is performed by fully dedicated multiplications. In Eq. (16). The Eq. (18) depicts the first processing element structure with twiddle factor $\mathrm{W}_{16}^{\mathrm{ik} 1}(\mathrm{i}=0,1,2,3)$ multiplication.
$W_{16}^{0}$ show the value of " 1 ". $W_{16}^{1}$ have value of $\cos$ ( $\mathrm{pi} / 8$-jsin ( $\mathrm{pi} / 8$ ). This demands a composite multiplier which can be realized with shift and add operations. $\mathrm{W}_{16}^{2}$ has value of $\operatorname{Cos}(\mathrm{pi} / 4-\mathrm{j} \sin (\mathrm{pi} / 4)$. Twiddle factor can be obtained by using given trigonometric function:
$\operatorname{Cos}(\mathrm{pi} / 4)=\sin (\mathrm{pi} / 4)=2 \sin (\mathrm{pi} / 8) \cos (\mathrm{pi} / 8)$.
$\mathrm{W}_{16}^{3}$ can be written as $\sin (\mathrm{pi} / 8)-\mathrm{j} \cos (\mathrm{pi} / 8)$ and executed by inverting real and imaginary parts of composite Multiplier $\mathrm{W}_{16}^{1}$. CSD representation is used for area and power consumption to about 33\% [17-18]. Twiddle factor $\mathrm{W}_{16}^{1}$ in Table 4 showing the 12-bit

Table 5. Constant multiplier results by control signals.

| S(1) | S(0) | $O_{1}$ | $\mathrm{O}_{2}$ | $\mathrm{O}_{3}$ | $\mathrm{O}_{4}$ | $O_{r e}+\boldsymbol{j} O_{\text {Im }}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | $\mathrm{x} m_{0}$ | $\mathrm{x} m_{1}$ | $\mathrm{ym}_{0}$ | $\mathrm{y} m_{1}$ | $(\mathrm{x}+\mathrm{j} y) W_{16}^{1}$ |
| 0 | 1 | $2 \times m_{0} m_{1}$ | $2 \times m_{0} m_{1}$ | $2 \mathrm{ym} m_{0} m_{1}$ | $2 \mathrm{ym} m_{0} m_{1}$ | $(\mathrm{x}+\mathrm{j} y) W_{16}^{2}$ |
| 1 | 0 | $\mathrm{x} m_{0}$ | $\mathrm{x} m_{1}$ | $\mathrm{y} m_{0}$ | $\mathrm{y} m_{1}$ | $(\mathrm{x}+\mathrm{jy}) W_{16}^{3}$ |
| 1 | 1 | Bypass mode |  |  |  | $(\mathrm{x}+\mathrm{jy}) W_{16}^{0}$ |



Figure 5. Proposed Modified CSD multiplier.
coefficients for both Modified CSD format and 2's complement. Here $\mathrm{x}+\mathrm{jy}$ is input, the Modified CSD multiplier for the twiddle factors $W_{16}^{i}$ is represented in Figure 5. Table 5 representing output product for the Modified CSD multiplier which can be produced by control signals (S).

## 4. R2 ${ }^{i}$ SDF and R4 ${ }^{i}$ SDC Pipeline Architecture

Figures 6 and 7 represent the R2 ${ }^{i}$ SDF pipeline approach and $\mathrm{R} 4^{\mathrm{i}} \mathrm{SDC}$ pipeline approach when samples are 256 . The sign, $\otimes$, characterizes the programmable multiplier and the mark, $\odot$, characterizes the Modified CSD multiplier. Figure 6 (a) shows $\mathrm{R} 2^{2} \mathrm{SDF}$ where the programmable memory for twiddle factor is allotted. Figure 6(b) shows R2 ${ }^{3}$ SDF in which 2 programmable multipliers along with two fix valued multipliers are allotted. One fix valued multiplier is almost equal to 0.4 programmable multipliers [18-20]. Figure 6(c) depicts the Modified $\mathrm{R} 2^{4}$ SDF architecture. Two modified CSD multipliers replace the programmable multiplier that is used in radix $2^{2}$ SDF. Figure 7(a) depicts the R4SDC in
which the programmable multiplier and memory for the twiddle factor is allotted for every column. Figure 7(b) depicts the Modified $\mathrm{R} 4^{2} \mathrm{SDC}$ in which two of the proposed CSD multipliers in the place of programmable multipliers.

## 5. Implementation Example

Both real as well as imaginary parts of 12 bit multipliers are implemented. Out of these multipliers is the CSD complex one having lesser number of partial products [17]. The second one is modified CSD multiplier. For the compensation of quantization error, less error constant width CSD multiplier [21] together with reduced error constant width Booth multiplier [22] is presented. When the input signal is ' X ', the designed modified CSD multipliers have $\mathrm{m}_{0}$ with $\mathrm{m}_{1}$ CSD coefficients as presented in Table 4. The implemented modified CSD multiplier is shown in Figure 8. Figure 8(a) depicts sin (pi/8); Figure 8 (b) depicts the model of 0.0793 that is used in the implementation of $\cos (\mathrm{pi} / 8)$ and similarly figure 8 (c) depicts $\cos (\mathrm{pi} / 8)$.

The comparison of full adders and half adders of modified Constant complex Multiplier and Constant Complex Multiplier is shown in Figure 9. When compared with similar schemes like CSD Multiplier [17], the modified CSD multiplier can provide improvement of more than $36 \%$ in terms of
multiplicative complexity. In terms of area occupied amount of Full adders is reduced approximately by $32 \%$ and amount of half adders is reduced by $42 \%$ when compared with [17]. The synthesis of Modified CSD Multiplier on Xilinx shows the efficient Twiddle Factor ROM Design in Table 6.


Figure 6. Radix-2i SDF 256 point pipeline FFT (a) $R 2^{2} \operatorname{SDF}$ (b) R23SDF(c) R24SDF.

(a)


Figure 7. Radix- $4^{\mathrm{i}}$ SDC 256 point pipeline FFT (a) R4SDC (b) R42SDC.


Figure $8 \quad$ (a) $\sin (\mathrm{pi} / 8)$ architecture, (b) 0.0793 architecture and (c) $\cos (\mathrm{pi} / 8)$ architecture.


Figure 9. Comparison of full adders and half adders.

Table 6. Synthesis results of proposed modified CSD constant complex multiplier.

|  | Device Utilization Summary |  |  |
| :--- | :---: | :--- | :---: |
| Logic Utilization | Used | Available | Utilization |
| No. of Slices | 7 | 7680 | $0 \%$ |
| No. of 4 inputs LUTs | 13 | 15360 | $0 \%$ |
| No. of bounded IOBs | 24 | 173 | $13 \%$ |
|  | Device Utilization Summary |  |  |
| Logic Utilization | 6 | Available |  |
| No. of Slices | 11 | 7680 | $0 \%$ |
| No. of 4 inputs LUTs | 24 | 15360 | $0 \%$ |
| No. of bounded IOBs | 173 | $13 \%$ |  |

## 6. Conclusion

The proposed pipeline architecture allows 50 percent of total multipliers to be swapped by CSD multipliers. Synthesis Results of modified CSD Multiplier shows the Efficient Twiddle Factor Design. The modified CSD multiplier scheme is implemented on Xilinx ISE 10.1 using Spartan-III XC3S1000 FPGA as a target device. When compared with similar schemes like CSD Multiplier, the modified CSD multiplier can provide a reduction of more than $36 \%$ in terms of multiplicative complexity. In terms of area occupied amount of Full adders is reduced approximately by $32 \%$ and amount of half adders is reduced by $42 \%$. Using this technique, long length FFT processor can be developed for wireless Applications.

## References

[1] W. Han, A. T. Erdogan, T. Arslan and M. Hasan, ETRI Journal 30 (2008) 451.
[2] J. G. Proakis and D. G. Manolakis, Digital Signal Processing, Prentice Hall of India Private Limited (2003).
[3] A. Saeed, M.Elbably, G. Abdelfadeel and M.I. Eladawy, Int. J. of Circuits, Systems and Signal Processing 3 (2009) 103.
[4] K. Harikrishna, T.R. Rao and. V.A. Labay, An Efficient FFT Architecture for OFDM Communication Systems, Asia Pacific Microwave Conference (APMC), Singapore, 7-10 Dec. (2009) 449.
[5] U. Rashid, F. Siddiq, T. Muhammad and H. Jamal, The Nucleus 50 (2013) 301.
[6] J.W. Cooley and J.W. Tukey, Math. Computation 19 (1965) 297.
[7] Chi-hau Chen, Signal Processing Handbook, CRC Press (1988).
[8] L. Jia, Y. Gao, J. Isoaho and H. Tenhunen, A New VLSI Oriented FFT Algorithm and Implementation, Proceedings of

Eleventh Annual IEEE International ASIC Conference (1998) p. 337.
[9] M. Hasan, T. Arslan and J.S. Thompson, IEEE Transaction on Consumer Electronics 49 (2003) 128.
[10] User Guide "FFT MegaCore Function," Version 8.1, Altera Corporation. Available: http://www. Altera.com, Nov. (2008).
[11] E.E. Swartzlander, VLSI Signal Processing Systems, Kluwer Academic Publishers (1998).
[12] Amphion.CS246064-Point Pipelined FFT/IFFT; Available from: http://www.datasheetarchive.com/ 64-Point-datasheet. html (2002).
[13] Y. Jung; H. Yoon and J. Kim, IEEE Transactions on Consumer Electronics 49 (2003) 14.
[14] W. Han, T. Arslan, A. T. Erdogan and M. Hasan, Proc. IEEE Int. Conf. on Acoustics Speech and Signal Processing 5 (2005) 45.
[15] S. He and M. Torkelson, Designing Pipeline FFT Processor for OFDM (de) Modulation, Proc. IEEE URSI Int. Symp. Sig. Syst. Electron (1998) 257.
[16] L. Jia, Y. Gao, Jouni and H. Tenhunen, A New VLSI-oriented FFT Algorithm and Implementation, IEEE International ASIC Conf.(1998) 337.
[17] Jung-yeol Oh and Myoung-Seob Lim, Area and Power Efficient Pipeline FFT Algorithm, IEEE Workshop on Signal Processing Systems Design and Implementation (2005) 520.
[18] J.Y. Oh, J. S. Cha, S. K. Kim and M. S. Lim, Implementation of Orthogonal Frequency Division Multiplexing using radix-N Pipeline Fast Fourier Transform (FFT) Processor, Jpn. J. Appl. Phys. 42 (2003) 1.
[19] K.K. Parhi, VLSI Digital Signal Processing Systems, John Wiley \& Sons, Inc., USA (1999).
[20] W. C. Yey and C. W. Jen, IEEE Trans. Sig. Proc. 51 (2003) 864.
[21] S. M. Kim, J. G. Chung and K. K. Parhi, IEEE Int. Symp. Cir. Syst. (2002) 69.
[22] K.J. Cho, K.C. Lee, J.G. Chung and K.K. Parhi, IEEE Trans. VLSI Syst. 12 (2004) 90.


[^0]:    * Corresponding author: faisal.siddiq@uettaxila.edu.pk

