# A 54 MHz BiCMOS Digital Equalizer for Magnetic Disk Drives

K. Fisher†‡, P. Bednarz†, J. Kouloheris†, B. Fowler†, J. Cioffi†, A. El Gamal† †Information Systems Laboratory, Stanford University, Stanford, CA 94305 ‡Philips Research, 5600 JA Eindhoven, The Netherlands

#### Abstract

Until recently, only analog bit-detectors have been available for high-speed magnetic-disk recording. Because of high data-rates and strict power consumption limits, digital approaches have been deemed unrealistic. With emerging technologies such as BiCMOS, digital techniques can now more easily be utilized. This paper reports the first known fully-digital decision feedback equalizer (DFE) circuit applied to high-speed magnetic recording. Techniques such as DFE can potentially improve the reliability of detected bits, effectively making higher disk-drive storage capacities possible. The experimental RAM-DFE IC takes into account many aspects of the recording channel and digital implementation to achieve high performance and low complexity.

#### Introduction

A 54 MHz fully-digital adaptive decision feedback equalizer (DFE) circuit is fabricated in a $0.8\mu\mathrm{m}$ BiCMOS sea-of-gates technology (Figures 1, 2, and 8-11) [1] with a $72\mathrm{mm}^2$ die. When used in conjunction with an analog interface (Figure 4), the circuit detects the stored digital data from a disk-drive read-signal. The filtering performed by the DFE permits increases in storage density (or equivalently decreases in error-rate), without any modifications to the disk-drive assembly.

The IC includes gain control, timing recovery, sector synchronization, an on-chip training sequence, and controlled-test features. The decision-aided timing recovery [2] operates at the bit-rate, making oversampling of the input signal unnecessary (54 Mbit/s equalizer throughput). The circuit executes 3.1 giga-operations-per-second (equivalent fixed-point additions) when operating at the nominal clock rate.
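As a rough check on the quoted throughput, dividing it by the nominal clock rate gives the number of equivalent fixed-point additions performed per clock cycle. This is simple arithmetic on the two figures above, not a breakdown taken from the design:

$$\frac{3.1 \times 10^{9}\ \text{operations/s}}{54 \times 10^{6}\ \text{cycles/s}} \approx 57\ \text{operations per cycle}.$$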

#### Architecture and Features

The DFE datapath (Figure 3) uses nonlinear (RAM-based) feedback and is consequently called the RAM-DFE [3]<sup>1</sup>.

The IC implements

$$y_k = \sum_{l=0}^{4} w_l x_{k-l} + r(\hat{\mathbf{a}}_{k-1}) , \tag{1}$$

$$\hat{a}_k = \operatorname{sgn}(y_k) \tag{2}$$

where  $w_l$  are feedforward filter coefficients,  $x_k$  are data inputs,  $\hat{a}_k$  is the current equalizer decision,  $\hat{\mathbf{a}}_{k-1}$  is the address specifying the vector of past equalizer decisions and  $r(\hat{\mathbf{a}}_{k-1})$  is the corresponding look-up table output.
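As an illustration only, Equations 1 and 2 can be sketched in Python as follows. The function name `ram_dfe_step`, the packing of past decisions into a table address (most recent decision as the most significant bit), and the slicer's tie rule at zero are assumptions made for this sketch. The fixed-point word lengths, rounding, and saturation of the actual IC are not modeled.

```python
def sgn(v):
    """Two-level slicer: +1 for v >= 0, -1 otherwise (tie rule is an assumption)."""
    return 1 if v >= 0 else -1


def ram_dfe_step(w, x_window, ram, past_decisions):
    """One symbol period of Equations (1) and (2).

    w              : feedforward coefficients [w_0, ..., w_4]
    x_window       : inputs [x_k, x_{k-1}, ..., x_{k-4}]
    ram            : look-up table r(.), with 2**len(past_decisions) entries
    past_decisions : [a_{k-1}, a_{k-2}, ...], each +1 or -1
    """
    # Feedforward FIR part of Equation (1).
    f = sum(wl * xl for wl, xl in zip(w, x_window))

    # Pack past decisions into an address; +1 maps to bit 1, -1 to bit 0.
    # The most recent decision ends up as the MSB (assumed ordering).
    addr = 0
    for a in past_decisions:
        addr = (addr << 1) | (1 if a > 0 else 0)

    y = f + ram[addr]      # Equation (1)
    return y, sgn(y)       # Equation (2)
```

With seven past decisions the address spans 128 table entries, so a call such as `ram_dfe_step(w, x_window, ram, [+1, -1, -1, -1, -1, -1, -1])` reads entry 64 of a 128-entry list.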

The fixed-point RAM-DFE uses a 7-bit data input, $x_k$, and produces a 6-bit timing-error output and a 4-bit gain-error output. All RAM-DFE internal variables are stored as 10-bit values, including the coefficients of the 5-tap finite impulse response (FIR) filter, $\mathbf{w}$, and the contents of the 128-location feedback RAM.

Each coefficient in the FIR filter is adapted on each clock cycle using the signed-LMS algorithm. A dual-ported RAM is implemented to also allow a RAM coefficient adaptive update on every clock cycle. An equalizer error signal,

$$e_k = \hat{a}_k - y_k \,, \tag{3}$$

is used to drive the adaptive update operations. The gains of all adaptive loops are controlled by user programmable step-sizes.
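A minimal sketch of Equation 3 driving a per-cycle RAM update is given below. The text states that the RAM is adapted on every clock cycle but does not give the update equation, so the LMS-style rule `r <- r + mu_r * e_k` used here is an assumption, as are the names `adapt_ram` and `mu_r` (standing in for a user-programmable step-size).

```python
def sgn(v):
    """Two-level slicer: +1 for v >= 0, -1 otherwise (tie rule is an assumption)."""
    return 1 if v >= 0 else -1


def adapt_ram(y, ram, addr, mu_r):
    """Compute the equalizer error and adapt the RAM entry that was just used.

    y    : decision-element input y_k
    ram  : feedback look-up table (modified in place)
    addr : address formed from the past decisions
    mu_r : RAM adaptation step-size (hypothetical name)
    """
    a_hat = sgn(y)
    e = a_hat - y            # Equation (3)
    ram[addr] += mu_r * e    # assumed LMS-style update, not taken from the text
    return a_hat, e
```

Because $y_k$ contains the RAM output additively, moving the addressed entry in the direction of $e_k$ moves the next $y_k$ for the same address toward the decision level.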

The decision-aided timing recovery of the RAM-DFE circuit uses the FIR filter output and the equalizer error signal to determine the correct sampling instant for the front-end A/D converter. Different loop-gain parameters are used during timing acquisition and steady-state operation to allow both rapid convergence and accurate tracking of timing variation.

Because of its intended experimental uses, the integrated circuit is fully programmable. All internal equalizer parameters, including equalizer coefficients, adaptation step-sizes, and training sequence, may be read and written over a TTL bidirectional bus, both before and after equalization of read-signals. In addition, an ECL bus can output (in real time) one of the feedforward filter output signal, the decision-element input signal, or the equalizer error signal, for monitoring equalizer performance during active operation.

<sup>1</sup>Patent pending.

## Hardware Design

To reduce delay in the RAM-DFE feedback datapath, the IC takes advantage of the fact that all but one of the RAM address bits are available early. The RAM is divided into two halves, and two RAM words are output at the start of each clock cycle. The most recent decision, which arrives late in the clock cycle, controls a multiplexer that selects the correct RAM word (Figure 3).
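The late-select technique can be modeled behaviorally as below. This is a sketch under stated assumptions: the name `split_ram_read` is hypothetical, the late decision is taken to be the most significant address bit, and a decision of +1 is taken to select the upper half. Only the logical equivalence with a single flat table is shown, not the timing.

```python
def split_ram_read(ram0, ram1, early_addr, latest_decision):
    """Read both halves using the early address bits, then select late.

    ram0, ram1      : the two RAM halves (64 words each for a 128-word table)
    early_addr      : address formed from all decisions except the latest one
    latest_decision : +1 or -1, available only late in the clock cycle
    """
    # Both candidate words are fetched before the latest decision is known.
    word0 = ram0[early_addr]
    word1 = ram1[early_addr]
    # The late decision only drives a 2:1 multiplexer (assumed +1 -> ram1).
    return word1 if latest_decision > 0 else word0


# Equivalent flat table, split so that the late bit is the MSB (assumption).
flat = list(range(128))
ram0, ram1 = flat[:64], flat[64:]
```

In this model the RAM access time is removed from the path that starts at the latest decision; only the multiplexer delay remains on it.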

To eliminate arithmetic bias and overflow in the RAM-DFE, rounding and saturation are implemented on all numerical operations. Use of a 3-input saturating adder proved indispensable in implementation (Figure 6). For example, the filter output, $f_k$, is represented in carry-save (dual-bit) form; the decision-element input, $y_k$, is then computed using the 3-input saturating adder.
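The arithmetic behavior of a 3-input saturating adder can be sketched as follows. The 10-bit signed range follows the internal word length quoted earlier, but applying it to this adder is an assumption, as are the names `saturate` and `sat_add3`. The carry-select circuit structure of Figure 6 is not modeled. The sketch also shows why a single saturation of the full three-operand sum differs from cascading two 2-input saturating adders.

```python
def saturate(value, bits=10):
    """Clamp an integer to the signed two's-complement range of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def sat_add3(a, b, c, bits=10):
    """Add three operands exactly, then saturate once."""
    return saturate(a + b + c, bits)


# f_k held in carry-save form as two words (illustrative values), plus r.
sum_word, carry_word, r = 500, 100, -200

y_three_input = sat_add3(sum_word, carry_word, r)
# Cascaded alternative: the intermediate sum 600 is clipped to 511 first.
y_cascaded = saturate(saturate(sum_word + carry_word) + r)
```

Here the 3-input adder returns the exact result 400, while the cascaded version returns 311 because the intermediate clip discards information that the third operand would have restored.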

In the RAM-DFE design, care was taken to reduce circuit size with only small reductions in equalized SNR. The FIR filter coefficient update equations were simplified from Equation 4 to Equation 5 (Figure 7),

$$\mathbf{w}_{k+1,old} = \beta_{frac} \mathbf{w}_k + \mu_f e_k \mathbf{x}_k , \tag{4}$$

$$\mathbf{w}_{k+1,new} = (1 - \beta_{bin} 2^{-8}) \mathbf{w}_k + 2^{-n} e_k \operatorname{sgn}(\mathbf{x}_k), \tag{5}$$

where $n$ is a positive integer, $\beta_{frac}$ is the tap leakage factor, and $\beta_{bin}$ is its simplified binary version. The simplification yields a 6-fold reduction in circuit area with only a 0.2 dB loss in equalizer SNR.
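The two update rules can be compared in a short floating-point sketch. The function names are hypothetical, $\beta_{bin}$ is treated as a flag equal to 0 or 1 (an assumption about its "binary" nature), and the on-chip rounding and saturation are omitted. The point of the sketch is that Equation 5 replaces the per-tap multiplications of Equation 4 with power-of-two scalings and a sign.

```python
def sgn(v):
    """Sign of an input sample: +1 for v >= 0, -1 otherwise (tie rule assumed)."""
    return 1 if v >= 0 else -1


def update_taps_eq4(w, x_window, e, mu_f, beta_frac):
    """Equation (4): leaky LMS with general multiplications per tap."""
    return [beta_frac * wl + mu_f * e * xl for wl, xl in zip(w, x_window)]


def update_taps_eq5(w, x_window, e, n, beta_bin):
    """Equation (5): leakage and step-size are powers of two, x enters by sign only.

    n        : positive integer, step-size is 2**-n (a right shift in hardware)
    beta_bin : 0 or 1 (assumed), enables a leakage of 2**-8 per update
    """
    leak = 1.0 - beta_bin * 2.0 ** -8
    step = 2.0 ** -n
    return [leak * wl + step * e * sgn(xl) for wl, xl in zip(w, x_window)]
```

For example, `update_taps_eq5([1.0, -0.5], [3, -2], 0.25, 2, 0)` moves each tap by `0.25 * 0.25` in the direction given by the sign of the corresponding input sample.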

For the RAM-DFE decision-aided timing recovery[2], adaptive performance is often limited by latency. Because the timing error-signal is based on the FIR filter output, latency in the FIR filter strongly affects adaptive performance of the bit-detector. Therefore, it was crucial to implement the FIR filter without pipelining. Consequently, the FIR filter became the critical-delay path in the RAM-DFE design, and considerable effort was spent reducing this delay.

#### FIR Filter Design

A filter generation tool [4], [1] is used to implement the fully-parallel FIR filter, without any pipelining. When provided with process-dependent delay parameters (circuit and wiring propagation delays) and six user parameters (such as number of taps and numerical precision of the filter inputs and outputs), the FIR tool automatically generates a speed-optimized netlist and layout that will implement the filter.

An FIR filter consists of a sum of products,

$$f_k = \sum_{l=0}^{L-1} w_l x_{k-l} , \tag{6}$$

where the $w_l$ and $x_k$ are fixed-point values, so $f_k$ can be computed with a netlist of many one-bit full-adders. Note that in an adaptive filter, each $w_l$ can change on every clock cycle, so partial results cannot be precomputed.

The filter layout can consist of rows of one-bit full-adders, with variable-sized wiring channels between adjacent rows. Each row corresponds to a bit-weight in the filter sum-of-products output (row $i$ has weight $2^{i}$).
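To give a feel for how much work each row must do, the sketch below counts the one-bit partial products of each weight $2^{i}$ in the sum of products of Equation 6. It assumes plain unsigned operands, so two's-complement sign handling, rounding, and any recoding used by the actual tool are ignored; the name `column_heights` is hypothetical.

```python
def column_heights(num_taps, w_bits, x_bits):
    """Number of one-bit partial products of weight 2**i, for each row i.

    Bit p of a coefficient times bit q of an input sample has weight 2**(p+q),
    and every tap contributes one such product.
    """
    heights = [0] * (w_bits + x_bits - 1)
    for p in range(w_bits):
        for q in range(x_bits):
            heights[p + q] += num_taps
    return heights


# 5 taps, 10-bit coefficients, 7-bit inputs, as quoted for the RAM-DFE.
heights = column_heights(num_taps=5, w_bits=10, x_bits=7)
```

Under these assumptions there are 350 partial-product bits in total, and the busiest rows each receive 35 bits that the full-adders must reduce, in addition to carries arriving from the row below.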

The tool uses an iterative algorithm that generates a functionally correct and speed-optimized filter netlist, given the above layout of adders. Such a tool is necessary because different signal paths through an adder will normally have different (process-dependent) delays, e.g. carry outputs are available before sum outputs. Therefore, a height-balanced tree of full-adders will potentially exacerbate critical path delays because the slowest delay path will determine the overall filter performance. Instead, we prefer a "delay-balanced" tree of full-adders.

The FIR tool systematically attempts to balance the delays at each adder output, taking into account cumulative wiring and adder propagation delays of all input nets. This balancing technique can result in increased average wirelength inside the FIR filter, but it is ideally suited to BiCMOS circuits that are relatively insensitive to loading capacitance. The RAM-DFE filter delay was reduced by at least 20% by using a tool-generated design, instead of a simpler height-balanced tree. Details are available in [1].
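The benefit of ordering adder inputs by arrival time can be illustrated with a toy model of one bit-weight row. This is not the algorithm of the tool in [4], [1]: it is a simple greedy heuristic, wiring delays are ignored, and the delays `d_sum = 2.0` and `d_carry = 1.0` are made-up units chosen only so that carries are faster than sums. The name `reduce_column` is hypothetical.

```python
import heapq
from collections import deque


def reduce_column(arrivals, d_sum=2.0, d_carry=1.0, delay_balanced=True):
    """Reduce one row of bits with full-adders until at most two bits remain.

    arrivals       : arrival times of the bits entering this row
    delay_balanced : if True, each adder takes the three earliest signals;
                     if False, signals are consumed in the order given
    Returns (sorted arrival times left in this row, carry times sent upward).
    """
    if delay_balanced:
        pending = list(arrivals)
        heapq.heapify(pending)

        def take():
            return heapq.heappop(pending)

        def put(t):
            heapq.heappush(pending, t)
    else:
        pending = deque(arrivals)
        take, put = pending.popleft, pending.append

    carries = []
    while len(pending) > 2:
        ready = max(take(), take(), take())   # adder waits for its latest input
        put(ready + d_sum)                    # sum stays in this row
        carries.append(ready + d_carry)       # carry goes to the next row
    return sorted(pending), carries


late_first = [5, 0, 0, 0, 0]   # one late signal listed first
balanced = reduce_column(late_first)
in_order = reduce_column(late_first, delay_balanced=False)
```

In this toy case the in-order reduction puts the late signal at the bottom of the tree and finishes at time 9, while the arrival-ordered reduction absorbs the late signal in the last adder and finishes at time 7.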

## Circuit Technology

The RAM-DFE IC is a fully-static system, using a single-phase clock. All circuits are designed using a BiNMOS sea-of-gates array. BiNMOS was first reported in [5], and the statistics for the basic cell are given in Figure 9. The BiCMOS process is described in Figures 10 and 11 [6].

Figure 5 shows one bit of the BiCMOS dual-ported RAM that was utilized in the RAM-DFE design. The single-ended output drives a precharged (to ground) bus. Each bit of RAM can be implemented in one BiNMOS basic cell.

### Fabrication and Testing

A first fabrication of the RAM-DFE IC (provided by Philips Components/Signetics) has already occurred (Figure 2). Wafer testing revealed one minor error in the RAM interface, and a refabrication is in process. A test-chip using the same gate-array transistor layout has already been fabricated and tested at speed. The test-chip was found to be fully functional; we therefore expect the second fabrication of the RAM-DFE to produce working silicon.

#### Acknowledgements

Thanks to Pat O'hearn of Philips/Signetics for donation of fabrication, thanks to Dr. V. Akylas and M. El Diwany of Signetics for help in layout and fabrication of the chip, to W. Abbott for timing recovery and synchronization algorithms, to J. Burr for assistance with CAD tools, and to Mentor Graphics for use of the GDT IC Design tools.



Figure 1: Die photograph of the RAM-DFE Integrated Circuit, fabricated in a  $0.8\mu m$  BiCMOS process.



Figure 2: RAM-DFE Integrated Circuit: location of major functional blocks is shown.

## References

- [1] K.D. Fisher. "A Digital Equalizer for High-Speed Magnetic Recording". PhD Thesis, Stanford University, August 1991.
- [2] W.L. Abbott, P.S. Bednarz, K.D. Fisher, and J.M. Cioffi. "A High-Speed Adaptive Equalizer for Magnetic Disk Drives". In *IEEE ICASSP'90*, Albuquerque, NM, April 1990.
- [3] K. Fisher, J. Cioffi, P. Bednarz, W. Abbott, and C.M. Melas. "A RAM-DFE for Storage Channels". *IEEE Transactions on Communications*, COM(11), November 1991.
- [4] A. El Gamal. "A CMOS 32b Wallace Tree Multiplier Accumulator". In *IEEE International Solid State Circuits Conference*, pages 194-195, February 1986.
- [5] A. El Gamal, J. Kouloheris, D. How, and M. Morf. "BiNMOS: A Basic Cell for BiCMOS Sea-of-Gates". In *IEEE Custom Integrated Circuits Conference*, San Diego, CA, May 1989.
- [6] M. El-Diwany et al. "Low Voltage Performance of an Advanced CMOS/BiCMOS Technology". In *IEDM Tech. Digest*, 1990.



Figure 3: RAM-DFE Equalizer datapath, showing technique used to reduce delay in the feedback path. Each RAM word is twice as wide as an equalizer feedback coefficient, effectively creating two RAM sections,  $RAM_0$  and  $RAM_1$ . A multiplexer is used to select between the two sections.



Figure 4: RAM-DFE system, including companion analog components.



Figure 5: BiCMOS RAM cell schematic.



Figure 6: Schematic for a 3-input carry-select saturating adder.



Figure 7: RAM-DFE coefficient update simplifications.

| Parameter          | Value                |
|--------------------|----------------------|
| Chip Area          | $72.2~\mathrm{mm}^2$ |
| Array Area         | $37~\mathrm{mm}^2$   |
| ECL Inputs         | 9                    |
| ECL Outputs        | 21                   |
| TTL Inputs         | 8                    |
| TTL Bi-directional | 12                   |
| Clock Frequency    | 54 MHz               |

Figure 8: Integrated Circuit Statistics.

| Feature             | Value                                      |
|---------------------|--------------------------------------------|
| Dimensions          | $28.8 \mu \text{m} \times 45 \mu \text{m}$ |
| Total Transistors   | 17.5                                       |
| NMOS transistors    | 9.5                                        |
| PMOS transistors    | 7                                          |
| Bipolar transistors | 1                                          |

Figure 9: BiCMOS Basic Cell Features. One of the NMOS transistors is shared between two basic cells, which explains the fractional (0.5) transistor counts.

| Parameter                 | Bipolar           |
|---------------------------|-------------------|
| β                         | 100               |
| $V_{ceo}$                 | 4.5 V             |
| $f_T @ I_c = 500 \ \mu A$ | $18~\mathrm{GHz}$ |

Figure 10: Bipolar Process Parameters.

| Parameter                             | NMOS                              | PMOS                              |
|---------------------------------------|-----------------------------------|-----------------------------------|
| $L_{eff}$                             | $0.55~\mu\mathrm{m}$              | $0.43~\mu\mathrm{m}$              |
| $V_T$ (linear)                        | 0.77 V                            | 0.70 V                            |
| $I_d @ V_{ds} = V_{gs} = 5 \text{ V}$ | $420~\mu\mathrm{A}/\mu\mathrm{m}$ | $230~\mu\mathrm{A}/\mu\mathrm{m}$ |

Figure 11: MOS Process Parameters.