# Realizing Real-Time Implementation of Coherent Optical OFDM Receiver with FPGAs

Noriaki Kaneda<sup>(1)</sup>, Qi Yang<sup>(1)(2)</sup>, Xiang Liu<sup>(3)</sup>, William Shieh<sup>(4)</sup>, Young-Kai Chen<sup>(1)</sup>

<sup>(1)</sup> Bell Labs, Alcatel-Lucent, 600 Mountain Ave, Murray Hill, NJ 07974, USA. **kaneda@alcatel-lucent.com**

(2) Victoria Research Laboratory, National ICT Australia, Dept. of Electrical and Electronic Engineering, The University of Melbourne, VIC 3010, Australia.

(3) Bell Labs, Alcatel-Lucent, 791 Holmdel-Keyport Rd, Holmdel, NJ 07733, USA.

(4) ARC Special Research Centre for Ultra-Broadband Information Networks and National ICT Australia, Dept. of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, VIC 3010, Australia.

**Abstract** *Recent results on real-time coherent optical OFDM (CO-OFDM) receivers are reviewed. Requirements and challenges pertaining to high-speed real-time implementation of a CO-OFDM receiver are discussed. A receiver implemented in an FPGA at 2.5 GSamples/s is described.*

## **Introduction**

As advances in high-speed electronic circuits continue to accelerate, such circuits have enabled digital coherent detection with multilevel modulation formats in fiber-optic communications. Optical multilevel modulation formats can be broadly divided into time-domain modulation schemes, such as single-carrier QPSK modulation<sup>1,2</sup>, and frequency-domain modulation schemes, such as orthogonal frequency-division multiplexing (OFDM)<sup>3,4</sup>. Both approaches have been demonstrated as viable solutions to significantly increase the data capacity of fiber-optic communications and to enhance the ability to mitigate various optical transmission impairments.

One of the previously discussed advantages of the coherent optical OFDM (CO-OFDM) format over optical single-carrier modulation formats is its computational efficiency in compensating optical transmission impairments such as chromatic dispersion (CD) and polarization mode dispersion (PMD)<sup>5</sup>. While single-carrier modulation formats often require multi-tap equalization to estimate the channel and compensate for CD, PMD, and other inter-symbol interference (ISI) effects<sup>6,7</sup>, OFDM with a large number of subcarriers can exploit its low OFDM symbol rate to afford a time-domain guard interval between symbols, realized by a cyclic prefix (CP), that accommodates the ISI, which is then compensated with a simple single-tap frequency-domain equalization. In addition to this CP-based ISI compensation, training-symbol and pilot-subcarrier assisted symbol synchronization, channel estimation, and frequency and phase estimation are commonly used techniques for optical OFDM<sup>8</sup>. While these techniques are also common to the OFDM systems widely deployed in today's wireless local area networks<sup>9</sup>, the special requirements of optical communication systems, such as data rates several orders of magnitude higher than those of their wireless counterparts, demand additional attention and careful study in the practical real-time implementation of high-speed CO-OFDM systems.

In this paper we review architectures for the real-time implementation of a coherent optical OFDM receiver. We demonstrate a field-programmable gate array (FPGA) based real-time CO-OFDM receiver at a sampling speed of 2.5 GS/s, and show its performance in receiving a 3.55-Gb/s signal<sup>10</sup>.

## **OFDM frame structure**

In contrast to coherent optical single-carrier modulation, CO-OFDM is constructed with multiple time- and frequency-domain design parameters such as the number of subcarriers, the fast Fourier transform (FFT) length, the frame length, and the overhead allocation. These parameters offer many degrees of freedom and need to be chosen such that the system meets its performance and resource objectives. Parameters specific to optical transmission systems, such as laser phase noise and wavelength stability, also strongly affect the realizable OFDM architecture, along with more obvious limitations such as FPGA resources and speed. To begin the discussion, the time-frequency frame structure of the OFDM signal needs to be defined. Fig. 1 shows the time-domain OFDM frame and symbol structures. In this example, an OFDM symbol is 144 samples long: a 16-sample cyclic prefix followed by 128 samples carrying the modulated subcarriers. An OFDM frame consists of 1,024 samples for symbol synchronization, followed by 16 pilot symbols for channel estimation and a data payload of 496 symbols. Fig. 2 shows the frequency-domain representation of the OFDM subcarriers. The horizontal axis is the index of the discrete Fourier transform; therefore the lowest positive frequency component is found at index 1, the highest positive frequency is found at index 64, and so on. Out of 128 available subcarriers, 107 are used for data payload, 8 are distributed as pilot subcarriers, and the center subcarrier (index 1) and the 12 highest-frequency subcarriers (indices 59 to 70) are unfilled.

**Fig. 1:** Time-domain representation of OFDM frame and symbol.

**Fig. 2:** Frequency-domain representation of the OFDM subcarriers versus discrete Fourier transform index.
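As a quick consistency check on the frame parameters above, the following sketch recomputes the frame length and the net payload rate. It assumes QPSK (2 bits per data subcarrier, as stated later in the channel-estimation section) and counts only payload bits against the full frame duration; the variable names are illustrative and not taken from the implementation.

```python
# Frame parameters quoted in the text.
SAMPLE_RATE = 2.5e9            # ADC sampling rate, samples/s
FFT_SIZE = 128                 # samples per OFDM symbol body
CP_LEN = 16                    # cyclic-prefix samples per symbol
SYNC_SAMPLES = 1024            # synchronization preamble per frame
PILOT_SYMBOLS = 16             # pilot symbols for channel estimation
DATA_SYMBOLS = 496             # payload symbols per frame
DATA_SUBCARRIERS = 107         # subcarriers carrying payload
PILOT_SUBCARRIERS = 8          # subcarriers reserved for phase estimation
UNFILLED_SUBCARRIERS = 1 + 12  # center subcarrier plus 12 highest-frequency ones
BITS_PER_SUBCARRIER = 2        # assumption: QPSK on every data subcarrier

symbol_len = FFT_SIZE + CP_LEN
frame_samples = SYNC_SAMPLES + (PILOT_SYMBOLS + DATA_SYMBOLS) * symbol_len
payload_bits = DATA_SYMBOLS * DATA_SUBCARRIERS * BITS_PER_SUBCARRIER
net_rate = payload_bits * SAMPLE_RATE / frame_samples  # bits/s

print(f"symbol length : {symbol_len} samples")
print(f"frame length  : {frame_samples} samples")
print(f"payload/frame : {payload_bits} bits")
print(f"net data rate : {net_rate / 1e9:.2f} Gb/s")
```

Under these assumptions the result agrees with the 3.55-Gb/s figure quoted in the introduction, and the subcarrier allocation sums to the FFT size of 128.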

## **FPGA-based receiver**

Fig. 3 shows a typical opto-electronic block diagram of a coherent OFDM receiver with ADCs and an FPGA. The OFDM signal that has travelled through an optical channel is combined with a local oscillator (LO) laser in an optical 90-degree hybrid for I/Q separation. The I/Q signals are then detected by photodiodes followed by variable gain amplifiers (VGA) that balance and optimize the amplitudes for the analog-to-digital converters (ADC). The digitized I/Q signals are then fed to the FPGA at the sampling rate. In our experiments, the sampling speed is 2.5 GS/s and 5 bits per sample are used. The lasers used here are external cavity lasers with a linewidth of approximately 100 kHz.

**Fig. 3:** Opto-electrical block diagram of OFDM receiver with FPGA. SE PD TIA indicates a single-ended photodiode with transimpedance amplifier.

## **DSP for CO-OFDM**

Fig. 4 shows the block diagram of the digital signal processing (DSP) for the real-time CO-OFDM receiver. The 2.5-GS/s sampled data streams from the I/Q channels are fed into the FPGA through high-speed serial ports. Since the FPGA can realistically run only at clock rates up to a few hundred megahertz, the high-speed sampled digital signals are first demultiplexed into 16 parallel channels. This parallelization lowers the required per-channel processing speed to 156.25 MHz (2.5 GS/s divided by 16). The next procedure for OFDM is symbol synchronization. Traditional offline processing uses the Schmidl approach<sup>11</sup>, where the autocorrelation of two identical patterns inserted at the beginning of each OFDM frame gives rise to a peak indicating the starting position of the OFDM frame and symbol. The autocorrelation function at sampling location *d* is

$$
P(d) = \sum_{k=0}^{L-1} r_{d+k}^* r_{d+k+L}
$$
 (1)

which can be efficiently obtained through the following recursive equation

$$
P(d+1) = P(d) + r_{d+L}^{*} r_{d+2L} - r_d^{*} r_{d+L}
$$
 (2)

where *L* is the length of the synchronization pattern, $r_d$ represents the complex received sample at location *d*, and $P(d)$ is the autocorrelation term whose amplitude peaks when synchronization is found. An example DSP implementation of Eq. (2) is shown in Fig. 5. The relatively simple Eq. (2) and the architecture in Fig. 5, however, assume that the incoming signal is a serial stream, so this implementation works only if the processing clock rate equals the sampling rate. This is because the moving window for the autocorrelation must advance sample by sample, whereas in a parallel architecture multiple samples must be processed simultaneously in each processing clock cycle. Locating the exact beginning of the frame would therefore involve heavy computation that processes the data across all the channels.

**Fig. 4:** DSP diagram of real-time OFDM receiver.

**Fig. 5:** DSP block diagram of symbol synchronization based on autocorrelation taken on serial data samples.
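The serial-stream recursion of Eq. (2) can be illustrated with a minimal software sketch. It is not the FPGA implementation: the toy stream, the bounded filler noise, and the function names are assumptions made for illustration. The sketch checks that the recursive update reproduces the direct sum of Eq. (1) and that the magnitude of $P(d)$ peaks where the two identical patterns begin.

```python
import random

def autocorr_direct(r, d, L):
    """Eq. (1): P(d) = sum over k of conj(r[d+k]) * r[d+k+L]."""
    return sum(r[d + k].conjugate() * r[d + k + L] for k in range(L))

def autocorr_recursive(r, L):
    """Eq. (2): advance the window one sample at a time.

    Each step adds the product entering the window and subtracts the
    product leaving it, so only two complex multiplies are needed per sample.
    """
    p = [autocorr_direct(r, 0, L)]
    for d in range(len(r) - 2 * L):
        entering = r[d + L].conjugate() * r[d + 2 * L]
        leaving = r[d].conjugate() * r[d + L]
        p.append(p[-1] + entering - leaving)
    return p

rng = random.Random(0)
L = 32  # synchronization pattern length

def weak_noise(n):
    """Toy filler samples with small bounded amplitude (illustrative only)."""
    return [complex(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)) for _ in range(n)]

# Toy stream: filler, two identical QPSK-like patterns, filler.
pattern = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(L)]
r = weak_noise(50) + pattern + pattern + weak_noise(100)

p = autocorr_recursive(r, L)
peak = max(range(len(p)), key=lambda d: abs(p[d]))
print("peak at sample", peak, "with |P| =", abs(p[peak]))
```

Because every step depends on the previous value of $P(d)$, the update has to run once per sample, which is the reason this form does not map directly onto a 16-way parallel datapath.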

To improve computational efficiency, we took a two-step approach. In the first step, we set the cyclic prefix length equal to the number of demultiplexed channels, i.e., 16. This allows any of the first 16 points to be used as the starting point of an OFDM frame. In the second step, we replicated the synchronization pattern by the number of demultiplexed channels (16) as well, so that we can effectively downsample the incoming signal by 16 and still access the synchronization pattern in any one of the demultiplexed channels. This two-step approach requires only one channel to perform the autocorrelation, using exactly the scheme illustrated in Fig. 5, without corrupting the subsequent OFDM symbols. The detailed signal processing flow of the symbol synchronization is shown in Fig. 6. The two identical synchronization patterns of length 32 (A1, A2, …, An; A1, A2, …, An, n=32) are first replicated 16 times so that, after demultiplexing, the synchronization patterns can be found in every parallel channel. The following 16 points are the cyclic prefix of the first OFDM symbol. Performing the autocorrelation on the sampled data in channel 1 yields a strong peak that indicates the beginning of each frame. One drawback of this simple approach is its reduced tolerance to laser frequency offset and to ISI. Since frequency offset compensation is not implemented owing to the lack of FPGA resources, the output of a single laser is split and used as both the transmit and LO laser.

Once the symbol synchronization is completed, a large proportion of the FPGA's on-chip memory is used as first-in first-out (FIFO) buffers to convert the parallel data into the natural sample order for each parallel lane and to group every 144 points as an OFDM symbol. After the cyclic prefix is removed, a 128-point FFT is performed for each channel to convert the signal back to the frequency domain. In this work, we used Altera's built-in FFT function to perform this processing. We used 8-bit resolution for the FFT, and each 128-point FFT requires 24 real multipliers.
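The two-step synchronization described above can be sketched in software as follows. The sketch rests on one assumption about the text: "replicated 16 times" is read as holding each sample of the two length-32 patterns for 16 consecutive samples, which yields the 1,024-sample preamble of the frame structure. The toy stream, the true start position, and the function names are hypothetical.

```python
import random

N_CH = 16      # demultiplexed channels (from the text)
PATTERN_LEN = 32  # synchronization pattern length seen by each channel

def autocorr(lane, d, L):
    """Eq. (1) evaluated directly at lane position d."""
    return sum(lane[d + k].conjugate() * lane[d + k + L] for k in range(L))

def build_preamble(pattern, n_ch):
    """Hold each sample of [A, A] for n_ch samples (assumed reading of 'replicated')."""
    return [a for a in pattern + pattern for _ in range(n_ch)]

def coarse_sync(stream, channel, n_ch, L):
    """Autocorrelate a single demultiplexed lane and map the peak back to stream samples."""
    lane = stream[channel::n_ch]  # 1:16 demultiplexing, one lane kept
    mags = [abs(autocorr(lane, d, L)) for d in range(len(lane) - 2 * L + 1)]
    d_peak = mags.index(max(mags))
    return channel + n_ch * d_peak

rng = random.Random(0)

def weak_noise(n):
    """Toy filler samples with small bounded amplitude (illustrative only)."""
    return [complex(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)) for _ in range(n)]

pattern = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(PATTERN_LEN)]
TRUE_START = 203  # hypothetical frame start, deliberately not a multiple of 16
stream = weak_noise(TRUE_START) + build_preamble(pattern, N_CH) + weak_noise(300)

estimates = [coarse_sync(stream, c, N_CH, PATTERN_LEN) for c in range(N_CH)]
print("true start:", TRUE_START)
print("estimates per lane:", estimates)
```

In this toy setting every lane sees the full pattern pair, and each lane's estimate lands within 16 samples after the true start. That residual timing uncertainty is what the 16-sample cyclic prefix is meant to absorb.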

## **Channel estimation**

Once in the frequency domain, we still need to estimate and compensate for the channel transfer function and the random phase rotation caused by laser phase noise, in much the same way as explained in detail elsewhere<sup>12</sup>. The received signal in the frequency domain at each subcarrier can be represented as

$$
R_d(k) = H(k)B_d(k)C_d(k)
$$
 (3)

where *k* is the subcarrier index, *d* is the symbol index, $H(k)$ is a slowly varying channel transfer function that we can assume constant over the entire frame of 512 symbols, $C_d(k)$ is the transmitted QPSK-modulated signal, and $R_d(k)$ is the received signal. $B_d(k)$ is the Fourier-transformed time-variant random phase fluctuation that needs to be estimated and compensated for each symbol.

The channel transfer function can be estimated by comparing the received data with the pilot symbols. Averaging over all pilot symbols in a frame, we estimate the inverse of the transfer function, to be used for later compensation, as

$$
H^{-1}(k) = \frac{1}{M} \sum_{d=1}^{M} R_d^{*}(k) C_d(k)
$$
 (4)

where *M* is the number of pilot symbols per frame.

**Fig. 6:** The symbol synchronization scheme applied in the real-time CO-OFDM receiver. A measured auto-correlation trace is also shown on the right.

**Fig. 7:** Measured BER vs. OSNR of the real-time 3.55Gb/s CO-OFDM receiver.
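A minimal numerical sketch of Eq. (4) is given below. It assumes unit-power QPSK pilots and a phase-only (unit-magnitude) channel, in which case the averaged product $R_d^{*}(k) C_d(k)$ equals the conjugate of $H(k)$ and therefore acts as its inverse. The quadratic-phase toy channel, the noise level, and the function names are illustrative assumptions, and the common phase term $B_d(k)$ is left out here.

```python
import cmath
import random

N_SC = 128  # subcarriers per OFDM symbol (from the text)
M = 16      # pilot symbols per frame (from the text)
QPSK = [cmath.exp(1j * cmath.pi * (2 * i + 1) / 4) for i in range(4)]  # unit power

def estimate_inverse_channel(rx_pilots, tx_pilots):
    """Eq. (4): average conj(R_d(k)) * C_d(k) over the M pilot symbols."""
    m = len(rx_pilots)
    n_sc = len(rx_pilots[0])
    return [
        sum(rx_pilots[d][k].conjugate() * tx_pilots[d][k] for d in range(m)) / m
        for k in range(n_sc)
    ]

rng = random.Random(1)

def small_noise():
    """Bounded toy noise sample (illustrative only)."""
    return complex(rng.uniform(-0.05, 0.05), rng.uniform(-0.05, 0.05))

# Toy phase-only channel: quadratic phase across subcarriers (assumption).
channel = [cmath.exp(1j * 0.002 * (k - N_SC // 2) ** 2) for k in range(N_SC)]

tx_pilots = [[rng.choice(QPSK) for _ in range(N_SC)] for _ in range(M)]
rx_pilots = [
    [channel[k] * tx_pilots[d][k] + small_noise() for k in range(N_SC)]
    for d in range(M)
]
h_inv = estimate_inverse_channel(rx_pilots, tx_pilots)

# Single-tap equalization of one payload symbol: one complex multiply per subcarrier.
payload = [rng.choice(QPSK) for _ in range(N_SC)]
received = [channel[k] * payload[k] for k in range(N_SC)]
equalized = [received[k] * h_inv[k] for k in range(N_SC)]
worst_error = max(abs(equalized[k] - payload[k]) for k in range(N_SC))
print("worst-case equalization error:", round(worst_error, 4))
```

If the channel also attenuates the subcarriers, the same estimate still corrects the phase but leaves an amplitude scaling, which does not affect QPSK decisions.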

Among the 115 filled subcarriers, 8 pilot subcarriers are evenly spaced across the OFDM spectrum for phase estimation. Fig. 2 shows the location of the pilot subcarriers with respect to the subcarrier indices. Equation (4) can be slightly modified for phase estimation such that

$$
\hat{B}_d = \frac{1}{N_p} \sum_{k=1}^{N_p} R_d^{*}(k) C_d(k)
$$
 (5)

where $N_p$ is the number of pilot subcarriers; 8 pilots are used in this case before the data subcarriers are compensated. Only 2 complex multipliers per parallel channel are needed for channel and phase estimation.
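The pilot-based phase estimate of Eq. (5) can be sketched in the same spirit. The pilot positions below are hypothetical (the actual positions are given only in Fig. 2), the channel is assumed to be already compensated, and the laser phase noise is modeled as a single common rotation per symbol. As written, Eq. (5) yields the conjugate of that rotation, so the sketch applies the estimate by multiplication; this usage is an assumption.

```python
import cmath
import random

N_SC = 128                            # subcarriers per OFDM symbol (from the text)
PILOT_IDX = list(range(8, N_SC, 16))  # 8 evenly spaced pilots; positions are hypothetical
QPSK = [cmath.exp(1j * cmath.pi * (2 * i + 1) / 4) for i in range(4)]  # unit power

def estimate_phase_correction(rx_symbol, tx_symbol, pilot_idx):
    """Eq. (5): average conj(R_d(k)) * C_d(k) over the pilot subcarriers only."""
    return sum(rx_symbol[k].conjugate() * tx_symbol[k] for k in pilot_idx) / len(pilot_idx)

rng = random.Random(2)

def small_noise():
    """Bounded toy noise sample (illustrative only)."""
    return complex(rng.uniform(-0.05, 0.05), rng.uniform(-0.05, 0.05))

PHASE = 0.4  # toy common phase rotation for this symbol, in radians
tx_symbol = [rng.choice(QPSK) for _ in range(N_SC)]
rx_symbol = [cmath.exp(1j * PHASE) * c + small_noise() for c in tx_symbol]

# Only the pilot entries of tx_symbol are assumed known at the receiver.
b_hat = estimate_phase_correction(rx_symbol, tx_symbol, PILOT_IDX)
corrected = [r * b_hat for r in rx_symbol]  # one complex multiply per subcarrier
worst_error = max(abs(corrected[k] - tx_symbol[k]) for k in range(N_SC))

print("estimated rotation (rad):", round(-cmath.phase(b_hat), 3))
print("worst-case residual error:", round(worst_error, 4))
```

Together with the single-tap channel correction, this accounts for one complex multiply each per subcarrier, which is consistent with the two complex multipliers per parallel channel mentioned above.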

Finally, the recovered data is compared with the transmitted data inside the FPGA to identify errors, and the error distribution over a span of 512 consecutive OFDM frames is recorded and output periodically through SignalTap II, an embedded logic analyzer for Altera FPGAs. In total, 500 real multipliers are used, close to the 504 multipliers available in this FPGA.

## **Experimental results**

The inset on the right of Fig. 6 shows the measured amplitude of the autocorrelation. A strong peak is observed, and the peak index is successfully recovered to indicate the beginning of the frame and symbol. Fig. 7 shows the measured and simulated BER as a function of optical signal-to-noise ratio (OSNR). A BER better than $10^{-3}$ is observed at a measured OSNR of 3 dB, and no apparent error floor is observed in this measurement. Fig. 8 shows two recorded BER measurements performed by the FPGA at high OSNR (~19 dB), one over a single OFDM frame and the other over 512 OFDM frames. In the first measurement, no errors are recorded over the OFDM frame. In the second measurement, 2 errors are counted over 512 OFDM frames, or $\sim 5.4 \times 10^7$ bits, indicating a BER floor of $3.7 \times 10^{-8}$. Further improvement can be expected if more bit resolution is employed. The overall data rate can potentially be increased with polarization-division multiplexing and/or higher-level modulation formats such as 16-QAM.

**Fig. 8:** Recorded BER measurement over a single OFDM frame (upper) and over 512 frames (lower).
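The bit count and BER floor quoted for the 512-frame measurement follow from the frame parameters given earlier. The short check below assumes that only payload bits (496 data symbols, 107 data subcarriers, 2 bits per QPSK symbol) are counted; the variable names are illustrative.

```python
FRAMES = 512               # frames in the long measurement
DATA_SYMBOLS = 496         # payload symbols per frame
DATA_SUBCARRIERS = 107     # data subcarriers per symbol
BITS_PER_SUBCARRIER = 2    # assumption: QPSK
ERROR_COUNT = 2            # errors recorded over the 512 frames

bits_compared = FRAMES * DATA_SYMBOLS * DATA_SUBCARRIERS * BITS_PER_SUBCARRIER
ber = ERROR_COUNT / bits_compared

print(f"bits compared: {bits_compared:.2e}")
print(f"BER          : {ber:.1e}")
```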

## **Conclusions**

We have discussed in detail the architectures needed for the real-time implementation of a CO-OFDM receiver, with attention to the high-speed signal processing of optically transmitted OFDM data. The key architectures are successfully implemented in an FPGA and demonstrated at 2.5 GSamples/s. It is expected that, with future advances in high-speed electronic circuits, real-time OFDM supporting 40-Gb/s and 100-Gb/s data rates is technologically feasible.

## **References**

- 1 D.-S. Ly-Gagnon et al., J. Lightwave Technol., **1**, pp. 12-21 (2006).
- 2 R. Noe, J. Lightwave Technol., **2**, pp. 802-808 (2006).
- 3 W. Shieh et al., Electron. Lett., 42, pp. 587-589 (2006).
- 4 A. J. Lowery and J. Armstrong, Optics Express, 14, pp. 2079-2084 (2006).
- 5 S. L. Jansen et al., J. Lightwave Technol., **1**, pp. 6-15 (2008).
- 6 S. Savory, Optics Express, 16, pp. 804-817 (2008).
- 7 N. Kaneda and A. Leven, Photon. Technol. Lett., 4, pp. 203-205 (2009).
- 8 F. Buchali et al., Bell Labs Tech. Jour., 14, pp. 125-146 (2009).
- 9 Wireless LAN medium access control (MAC) and physical layer specifications (PHY), IEEE Standard 802.11a (1999).
- 10 Q. Yang et al., Proc. OFC/NFOEC'09, PDPC5 (2009).
- 11 T. M. Schmidl and D. C. Cox, IEEE Trans. Commun., 45, pp. 1613-1621 (1997).
- 12 W. Shieh et al., J. Optical Networking, 7, pp. 234-255 (2008).