# A Triple-Concatenated FEC using Soft-Decision Decoding for 100 Gb/s Optical Transmission

Yoshikuni Miyata, Kenya Sugihara, Wataru Matsumoto, Kiyoshi Onohara, Takashi Sugihara, Kazuo Kubo, Hideo Yoshida, and Takashi Mizuochi

Information Technology R&D Center, Mitsubishi Electric Corporation, 5-1-1, Ofuna, Kamakura, Japan
Miyata.Yoshikuni@ak.MitsubishiElectric.co.jp

**Abstract:** We propose a novel triple-concatenated forward error correction for 100 Gb/s transmission. Simulation shows that a net coding gain of 10.8 dB is obtained by a soft-decision LDPC code concatenated with the enhanced FEC listed in G.975.1.

©2010 Optical Society of America

OCIS codes: (060.4510) Optical communications

#### 1. Introduction

Intensive studies on digital coherent transceivers are now being conducted, following the emergence of dual-polarization quadrature phase-shift keyed (DP-QPSK) transmission for 100 Gb/s transport systems [1]. Even though 100 Gb/s DP-QPSK shows disruptively good performance, it requires 1.3 dB and 2.7 dB higher optical signal-to-noise ratio (OSNR) than 40 Gb/s Differential Quadrature Phase Shift Keying (DQPSK) and 40 Gb/s Differential Phase Shift Keying (DPSK), respectively. This has revitalized research interest in powerful but nevertheless practical forward error correction (FEC) for improving the OSNR tolerance of 100 Gb/s digital coherent systems.

Current FEC LSIs for 10 to 40 Gb/s optical systems usually use a  $2^{nd}$  generation type approach, *i.e.* concatenated codes with iterative decoding. Several types of  $2^{nd}$  generation FEC are listed in ITU-T G.975.1, *e.g.* concatenated BCH(3860,3824) + BCH(2040,1930) codes having a net coding gain (NCG) of 8.99 dB at a post-FEC BER of  $10^{-15}$ , and concatenated RS(1023,1007) + BCH(2047,1952) codes having an NCG of 8.67 dB. Interleaving and iterative decoding techniques are used together with the concatenation to obtain improved error correction performance even with only 7% redundancy.

All of these 2<sup>nd</sup> generation FECs are based on hard decision decoding. It is becoming apparent that 3<sup>rd</sup> generation soft decision decoding is the most promising way to drastically improve the error correction capability. We demonstrated the first fully integrated 3<sup>rd</sup> generation FEC LSI operating at 12.4 Gb/s using a block turbo code with 3-bit soft decision decoding in 2006 [2]. Recently, we reported a real-time 32 Gb/s demonstration of a soft decision based low-density parity-check (LDPC) code concatenated with a Reed-Solomon (RS) code in a high speed FPGA prototype, aimed at showing the feasibility of 3<sup>rd</sup> generation FEC for 100 Gb/s class optical communications [3]. It was shown that an NCG of 9.9 dB at a post-FEC BER of 10<sup>-15</sup> can be obtained with 2-bit soft decision and 16 decoding iterations. An RS(992,956) outer code is effective in cleaning up the unwanted error floor generated by the LDPC(9216,7936) inner code. This is currently one of the fastest and strongest soft decision FEC experiments known, but further improvement is still desirable for 100 Gb/s digital coherent transceivers.

In this paper, we propose the idea of *triple-concatenated FEC*, which has the potential to realize an NCG of 10.8 dB with 20.5% redundancy. A very strong soft decision LDPC code is used as the inner code, and a hard decision based concatenated enhanced FEC (EFEC) code from G.975.1 follows as the outer code. This new FEC has the potential to provide a practical FEC that is 1.3 dB to 2.7 dB stronger than G.975.1 for implementation in a DP-QPSK based 100 Gb/s coherent DSP (digital signal processing) LSI.

#### 2. Proposal for Triple-Concatenated FEC

To achieve both higher FEC error-correction performance and the absence of an error floor, either a longer codeword or higher parity-check redundancy is needed. This incurs unacceptable circuit complexity and latency.

Fig. 1 shows the block diagram of the proposed triple-concatenated FEC for 100GbE transport systems employing a digital coherent transceiver. A soft decision FEC (code C) is embedded in a coherent DSP. This FEC yields extremely high error correction performance in the worst pre-FEC BER region, *i.e.* $10^{-2}$, in return for an unwanted error floor. The residual errors are completely eliminated by hard decision based code B concatenated with code A embedded in an OTU4 framer. The 7% EFECs recommended in the ITU-T G.975.1 Appendix are suitable candidates for codes A and B.

Fig. 1 Block diagram of the triple-concatenated FEC

In our previous report, we showed that it is very difficult to obtain good error-correction performance without an error floor by using one LDPC code only [4]. However, if we accept the error floor of an LDPC code, we can obtain better error-correction performance in the worst pre-FEC BER region, *e.g.* $10^{-2}$, without using a very long codeword of several tens of thousands of bits or a high redundancy ratio of >30%. In combination with the powerful hard-decision based concatenated codes B and A as outer codes, we can clean up any unwanted error floor. The extremely good waterfall performance of the LDPC code in the low BER region can be exploited by concatenating the two codes efficiently. We call this architecture a triple-concatenated FEC. We propose a novel LDPC code as code C, which is embedded in a coherent DSP. An EFEC from G.975.1 can be used for codes A and B, which are embedded in an OTU4 framer. By tolerating an error floor, the circuit complexity of the LDPC code can be held extremely low. The interconnection between the OTU4 framer and the coherent DSP is also kept small.

#### 3. Error-Correcting Code and Algorithms for Strong Soft Decision FEC

We describe in detail the LDPC code for inner code C. An irregular Quasi-Cyclic (QC)-LDPC code having codeword and information lengths of 4608 and 4080 respectively was designed. The FEC redundancy is 12.94%. Since the codeword length is only 4608, and the redundancy is very low, the girth of this code is only 6. Such a small girth brings an unwanted error floor, but small circuit size can be expected.
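As a rough illustration of the quasi-cyclic structure mentioned above, the following Python sketch expands a small base matrix of circulant shifts into a binary parity-check matrix. The base matrix `TOY_BASE`, the circulant size `Z`, and the function name `expand_qc` are hypothetical and chosen only for illustration; the actual base matrix and circulant size of the LDPC(4608, 4080) code are not given in this paper.

```python
def expand_qc(base_shifts, z):
    """Expand a base matrix of circulant shifts into a binary matrix.

    Each entry >= 0 becomes a z-by-z identity matrix cyclically shifted
    by that amount; each entry of -1 becomes a z-by-z all-zero block.
    """
    n_rows = len(base_shifts) * z
    n_cols = len(base_shifts[0]) * z
    h = [[0] * n_cols for _ in range(n_rows)]
    for bi, block_row in enumerate(base_shifts):
        for bj, shift in enumerate(block_row):
            if shift < 0:  # -1 marks an all-zero block
                continue
            for r in range(z):
                h[bi * z + r][bj * z + (r + shift) % z] = 1
    return h


# Hypothetical toy example (NOT the code used in the paper).
TOY_BASE = [
    [0, 1, 2, -1],
    [0, 3, -1, 1],
]
Z = 4

H = expand_qc(TOY_BASE, Z)
row_weights = [sum(row) for row in H]
col_weights = [sum(row[c] for row in H) for c in range(len(H[0]))]

# Unequal column weights are what makes a code "irregular".
print("rows x cols:", len(H), "x", len(H[0]))
print("row weights:", row_weights)
print("column weights:", col_weights)

# For the code described in the text, the number of parity-check rows
# would be 4608 - 4080 (assuming a full-rank parity-check matrix).
print("parity bits of LDPC(4608, 4080):", 4608 - 4080)
```

Because every block is either zero or a shifted identity, a decoder only needs to store the shift values, which is one reason quasi-cyclic codes are attractive for compact hardware.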

It is known that the best performance for LDPC codes can be obtained by the *Shuffled Belief Propagation (BP)* algorithm [5]. However, the shuffled BP algorithm needs an enormous circuit. To reduce the complexity, the *Offset BP-based* algorithm was proposed [6], but its error-correction performance is inferior. We therefore designed a novel decoding algorithm, the *variable offset BP-based algorithm*, so as to minimize the circuit complexity without sacrificing error correction performance. This algorithm is derived from the offset BP-based algorithm.

In the case of the shuffled BP algorithm, we have to calculate the complex mathematical function $a \oplus b = 2 \tanh^{-1} [\tanh(a/2) \cdot \tanh(b/2)]$ recursively. We expand this function to $a \oplus b = \min(a,b) + g(a,b)$, where $g(a,b) = \ln(1 + \exp(-(a+b))) - \ln(1 + \exp(-|a-b|))$. In the offset BP-based algorithm, the function and its recursive calculation are approximated by a minimum function and a subtraction using an offset factor $\beta$ as follows: $a \oplus b \approx \min(a, b) - \beta$. This approximation is rough, so we modified the calculation to use a variable offset factor. If $\min(a, b)$ is small, the value of $g(a,b)$ is almost equal to zero, and if $\min(a, b)$ exceeds a threshold $\gamma$, it is almost equal to a constant, $-\beta'$. Our approximation is as follows: $a \oplus b \approx \min(a,b) - \beta'$ if $\min(a,b) \ge \gamma$, and $a \oplus b \approx \min(a,b)$ if $\min(a,b) < \gamma$. The values of the variable offset factor and the threshold differ for each LDPC code, so they are adjusted by simulation or theoretical calculation (density evolution).
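The three forms of the check-node operation above can be compared numerically. The following Python sketch is a minimal illustration for non-negative magnitudes $a$ and $b$ only (sign handling is omitted). The function names and the numerical values of $\beta$, $\beta'$ and $\gamma$ are assumptions made for this example; the text states that the real values are tuned per code by simulation or density evolution.

```python
import math


def boxplus_exact(a, b):
    """Exact operation: 2 * atanh(tanh(a/2) * tanh(b/2)).

    Intended for moderate positive inputs; for very large inputs the
    tanh product rounds to 1.0 and atanh would fail.
    """
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


def correction_g(a, b):
    """Correction term g(a, b) in a (+) b = min(a, b) + g(a, b)."""
    return math.log1p(math.exp(-(a + b))) - math.log1p(math.exp(-abs(a - b)))


def boxplus_offset(a, b, beta):
    """Fixed-offset approximation: min(a, b) - beta."""
    return min(a, b) - beta


def boxplus_variable_offset(a, b, beta_prime, gamma):
    """Variable-offset approximation: subtract beta_prime only when
    min(a, b) is at or above the threshold gamma."""
    m = min(a, b)
    return m - beta_prime if m >= gamma else m


# Hypothetical parameter values, chosen only for illustration.
BETA = 0.5
BETA_PRIME = 0.5
GAMMA = 1.0

for a, b in [(0.2, 3.0), (1.0, 1.5), (2.0, 3.0), (4.0, 4.0)]:
    print(
        f"a={a:.1f} b={b:.1f} "
        f"exact={boxplus_exact(a, b):.3f} "
        f"g={correction_g(a, b):.3f} "
        f"offset={boxplus_offset(a, b, BETA):.3f} "
        f"variable={boxplus_variable_offset(a, b, BETA_PRIME, GAMMA):.3f}"
    )
```

For non-negative inputs, $g(a,b)$ lies between $-\ln 2$ and 0, which is why a small subtractive offset applied only to larger magnitudes can track the exact function more closely than a fixed offset applied everywhere.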

We carried out a simulation to evaluate the error correction performance, assuming additive white Gaussian noise (AWGN) for the communication channel. The test code was the irregular QC-LDPC(4608, 4080) with 12.94% redundancy. The number of iterations was set to only 16 (including initialization), which is sufficiently small to hold down the latency. We compared the proposed variable offset BP-based algorithm against the shuffled BP algorithm as the ideal case. Fig. 2 shows the simulated error-corrected BER vs. input Q factor for the LDPC code under test.

The best performance is exhibited by the shuffled BP algorithm which is simulated under the conditions of maximum one hundred iterations and of infinite quantization bits, with which an input Q of 6.4 dB can be corrected to  $10^{-5}$  BER. The Q limit of the offset BP-based algorithm is inferior to the shuffled BP algorithm, the difference being about 0.4 dB. The performance of the variable offset BP-based algorithm is better than the offset BP-based algorithm, the difference being about 0.1 dB.

Table 1. Number of operations per iteration

| Operation     | Shuffled BP | Cyclic approx. δ-min. | Offset BP | Variable offset BP |
|---------------|-------------|-----------------------|-----------|--------------------|
| Add           | 435,142     | 55,294                | 40,320    | 55,296             |
| Compare       | -           | 29,952                | 14,976    | 14,976             |
| 1 bit EXOR    | 409,798     | 29,952                | 29,952    | 29,952             |
| Table look-up | 424,774     | -                     | -         | -                  |

Table 2. Required number of bits of memory (Kbits)

| Shuffled BP | Cyclic approx. δ-min. | Offset BP | Variable offset BP |
|-------------|-----------------------|-----------|--------------------|
| 209         | 50                    | 42        | 42                 |

We estimated the required number of operations per iteration for the shuffled BP algorithm, the cyclic approx.  $\delta$ -min. algorithm, the offset BP-based algorithm, and the variable offset BP-based algorithm. The weights of the parity check matrix were assumed to be 3.25, 4, 28.37, and 31 for the average column, maximum column, average row, and maximum row, respectively.

During the decoding procedure, each algorithm involves the following operations: add, compare, EXOR, and table look-up. Table 1 shows the numbers of operations per iteration. The required number of operations for the variable offset BP-based algorithm is about one tenth of that for the shuffled BP algorithm. The number of add operations is higher than for the offset BP-based algorithm, but the difference is small. The required number of bits of memory, shown in Table 2, is also crucial in implementing a decoder LSI. The four algorithms were compared based on the same circuit parameters as above. The required memory for the variable offset BP-based algorithm is about one fifth of that for the shuffled BP algorithm.
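The ratios quoted above can be checked against the figures in Tables 1 and 2. The short Python sketch below sums the operation counts per algorithm and forms the ratios. Treating every operation type as equally costly (an unweighted sum) is a simplifying assumption made here, not something stated in the text, and the variable names are illustrative.

```python
# Operation counts per iteration, copied from Table 1 ("-" entered as 0).
OPS = {
    "Shuffled BP": {"add": 435_142, "compare": 0, "exor": 409_798, "lookup": 424_774},
    "Cyclic approx. delta-min.": {"add": 55_294, "compare": 29_952, "exor": 29_952, "lookup": 0},
    "Offset BP": {"add": 40_320, "compare": 14_976, "exor": 29_952, "lookup": 0},
    "Variable offset BP": {"add": 55_296, "compare": 14_976, "exor": 29_952, "lookup": 0},
}

# Memory requirement in Kbits, copied from Table 2.
MEMORY_KBITS = {
    "Shuffled BP": 209,
    "Cyclic approx. delta-min.": 50,
    "Offset BP": 42,
    "Variable offset BP": 42,
}

# Unweighted total per algorithm (assumption: all operations cost the same).
totals = {name: sum(counts.values()) for name, counts in OPS.items()}

ops_ratio = totals["Shuffled BP"] / totals["Variable offset BP"]
mem_ratio = MEMORY_KBITS["Shuffled BP"] / MEMORY_KBITS["Variable offset BP"]
extra_adds = OPS["Variable offset BP"]["add"] - OPS["Offset BP"]["add"]

for name, total in totals.items():
    print(f"{name}: {total} operations per iteration")
print(f"shuffled / variable offset (operations): {ops_ratio:.1f}")
print(f"shuffled / variable offset (memory): {mem_ratio:.1f}")
print(f"extra adds vs. offset BP: {extra_adds}")
```

Under this unweighted count the operation ratio comes out somewhat above ten, consistent with the "about one tenth" statement, and the additional adds relative to the offset BP-based algorithm equal the number of compare operations.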

#### 4. Error Correction Performance and Circuit Implementation

The error correction performance of the LDPC(4608, 4080) alone and the concatenated LDPC(4608, 4080) + EFEC were evaluated by Monte Carlo simulation. As the EFEC, BCH(3860,3824) + BCH(2040,1930) or RS(1023,1007) + BCH(2047,1952), both listed in G.975.1, were selected.

Fig. 3 shows the simulated pre-FEC Q vs. post-FEC BER. The number of soft decision bits and the number of iterations of the LDPC code were set to 3 and 16 respectively, so as to keep the circuit complexity low. In the case of the LDPC code alone, an unwanted error floor appears clearly at a post-FEC BER of around  $10^{-5}$ . In contrast, we see no error floor when the LDPC code is concatenated with the EFEC, at least down to a post-FEC BER of  $10^{-10}$ . The frame error ratio (FER) of the LDPC code alone in the waterfall region is about  $10^{-1}$ , but the frequency of codewords with fewer than ten residual error bits is about 60% and that of codewords with more than one hundred residual error bits is zero, so all the residual errors can be cleaned up. If we selected another LDPC code that performs better in the waterfall region, the number of residual error bits would increase, and they could not be corrected by the EFEC. Consequently, we expect that the proposed concatenated codes can achieve a Q-limit of 6.4 dB and an NCG of 10.8 dB at a post-FEC BER of  $10^{-15}$ , which is 4.6 dB better than the standard RS(255,239).
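The relation between the Q-limit, the redundancy and the NCG can be reproduced with a short calculation. The Python sketch below assumes the conventional definition of net coding gain for a binary channel with Gaussian noise: the Q factor required for the target BER without FEC, minus the Q-limit with FEC, minus the rate penalty $10\log_{10}(1/R)$. That the figures in this paper follow exactly this definition is an assumption, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def q_db_for_ber(ber):
    """Q factor in dB (20*log10 Q) at which an uncoded binary channel
    with Gaussian noise reaches the given BER, using BER = Q-function(Q)."""
    q_linear = -NormalDist().inv_cdf(ber)
    return 20.0 * math.log10(q_linear)


def net_coding_gain_db(q_limit_db, redundancy, target_ber=1e-15):
    """Net coding gain: uncoded Q requirement minus the FEC Q-limit,
    corrected for the rate loss of the code."""
    code_rate = 1.0 / (1.0 + redundancy)
    return q_db_for_ber(target_ber) - q_limit_db + 10.0 * math.log10(code_rate)


# Values taken from the text: Q-limit of 6.4 dB, 20.5% total redundancy.
ncg = net_coding_gain_db(q_limit_db=6.4, redundancy=0.205)
print(f"uncoded Q at 1e-15: {q_db_for_ber(1e-15):.2f} dB")
print(f"estimated NCG: {ncg:.2f} dB")
```

With these inputs the estimate rounds to the 10.8 dB quoted above, which suggests that the stated Q-limit and redundancy are mutually consistent under this definition.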

Fig. 4 shows the frame structure of the proposed triple-concatenated FEC for OTU4V, which consists of the overhead, the payload for 100GbE, and parity check areas for codes A, B and C. The overhead + payload occupies 3824 bytes; the length including the parity of the concatenated A + B codes is 4080 bytes, and the length including the parity of code C is 4608 bytes. The frame structure of the concatenated codes A and B is the same as OTU4, having 7% redundancy. Only the soft decision FEC code C is added as an inner code. If we implement the proposed FEC in a 40-45 nm class CMOS technology LSI with 512-parallel processing and a 250 MHz clock, the frame cycle is estimated to be 1.2 µsec. The latency for decoding LDPC codes with EFEC is estimated to be less than 35 µsec even for 16 iterations.
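The redundancy figures quoted in this paper follow directly from the three frame lengths. The Python sketch below reads 3824, 4080 and 4608 as cumulative lengths (payload with overhead, then including the A + B parity, then including the code C parity); this cumulative reading is an interpretation, supported by the fact that it reproduces the 12.94% and 20.5% values stated elsewhere in the text. The constant names are illustrative.

```python
# Cumulative lengths in bytes, as read from the frame structure description.
PAYLOAD_END = 3824   # overhead + payload
EFEC_END = 4080      # ... plus parity of concatenated codes A + B
LDPC_END = 4608      # ... plus parity of code C

efec_parity = EFEC_END - PAYLOAD_END
ldpc_parity = LDPC_END - EFEC_END


def redundancy_percent(total, information):
    """Redundancy as parity divided by information, in percent."""
    return 100.0 * (total - information) / information


efec_redundancy = redundancy_percent(EFEC_END, PAYLOAD_END)
ldpc_redundancy = redundancy_percent(LDPC_END, EFEC_END)
total_redundancy = redundancy_percent(LDPC_END, PAYLOAD_END)

print(f"A + B parity: {efec_parity} bytes ({efec_redundancy:.2f}%)")
print(f"C parity: {ldpc_parity} bytes ({ldpc_redundancy:.2f}%)")
print(f"total redundancy: {total_redundancy:.2f}%")
```

The outer EFEC redundancy of about 6.7% is the value usually referred to as "7%", and the inner and total figures agree with the 12.94% and 20.5% given in the text.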

#### 5. Conclusions

We proposed a novel triple-concatenated forward error correction for 100 Gb/s digital coherent systems. Simulation showed that an NCG of 10.8 dB was obtained by a soft decision LDPC code concatenated with EFEC codes from G.975.1. It is anticipated that the proposed FEC codes will be implemented in 100 Gb/s coherent DSP LSI in the near future.

This work was in part supported by the project of "Digital Coherent Optical Transceiver Technologies" of the Ministry of Internal Affairs and Communications (MIC) of Japan.

#### 6. References

- [1] K. Roberts et al., IEEE J. Lightwave Technol., 27, 16, 3546-3559 (2009).
- [2] K. Ouchi et al., OFC/NFOEC 2006, Anaheim, CA, paper OTuK4 (2006).
- [3] T. Mizuochi et al., IEEE Photon. Technol. Lett., 21, 18, 1302-1304 (2009).
- [4] Y. Miyata et al., OFC/NFOEC 2008, San Diego, CA, OTuE4 (2008).
- [5] J. Zhang et al., IEEE Trans. Commun., 53, 2, 209-213 (2005).
- [6] J. Chen et al., IEEE Trans. Commun., 53, 8, 1288-1299 (2005).