# Design and Comparison of Three 20-Gb/s Backplane Transceivers for Duobinary, PAM4, and NRZ Data

Jri Lee, Member, IEEE, Ming-Shuan Chen, and Huai-De Wang

Abstract—A full study of three data formats, namely duobinary, PAM4, and NRZ, is presented to estimate the performance of the corresponding transceivers under different conditions. Transceiver prototypes designed and optimized for the three signaling schemes are presented to evaluate their performance as well as their feasibility. The three transceivers have been tested thoroughly on Rogers and FR4 boards. Fabricated in 90-nm CMOS technology, all three transceivers achieve error-free operation with 20-Gb/s $2^{31}-1$ PRBS data over 40-cm Rogers and 10-cm FR4 channels. A general comparison reveals that NRZ data still achieves the best performance at 20 Gb/s.

*Index Terms*—Duobinary, pulse-amplitude modulation (PAM4), non-return-to-zero (NRZ), bit error rate (BER), backplane transceiver.

## I. INTRODUCTION

THE pursuit of higher data rates in wireline communications has been demonstrated in the past and will continue in the future. Recent research on high-speed (> 10 Gb/s), short-range (< 100 m) serial links over electrical backplanes or optical fibers has revealed the design trends for the next generation, e.g., chip-to-chip and board-to-board communications are moving toward 20 Gb/s, and 100-Gb/s Ethernet is also on the way [1]. Fig. 1 shows the simulated power dissipation as a function of bandwidth of a typical differential pair in 90-nm CMOS with inductive peaking and fanout-of-4 loading. With the device sizes labeled in the inset, the interconnect is also taken into consideration by extracting the parasitic capacitance from layout. Drawing a best-fit curve, we conclude that good power efficiency can be maintained up to 15 GHz. That is, the on-chip design margin for 20-Gb/s data is reasonably adequate. However, contemporary backplane materials and connectors fail to provide sufficient bandwidth for such high-speed data transmission, encouraging research on signal processing and/or data coding to overcome the poor channel properties. The original idea is based on the fact that modifying the chips is always easier and cheaper than altering the board itself. Over the years, engineers have been dealing with different data formats that can satisfy the bandwidth requirement with acceptable complexity. Among the existing solutions, non-return-to-zero (NRZ), duobinary, and 4-level pulse-amplitude modulation (PAM4) are most commonly used in various applications. The NRZ transceiver can be realized in a relatively simple way, providing another advantage in high-speed I/O links when the power budget is limited. As the data rate increases, the ubiquitous NRZ data would gradually hit the bandwidth limit, and the duobinary and PAM4 signals are considered as substitutes due to their efficient utilization of bandwidth. As shown in the next section, the spectra of duobinary and PAM4 are exactly half as wide as that of the NRZ data, making these formats potentially favorable in high-speed links. Generally speaking, duobinary signaling is further superior to PAM4 because it makes use of the intrinsic roll-off of the channel as part of the desired transfer function, requiring even less boost from the equalizers and alleviating the stringent requirement at high frequencies.

Fig. 1. Power efficiency of high-speed buffer in 90-nm CMOS technology.

Digital Object Identifier 10.1109/JSSC.2008.2001934

In this paper, we design and analyze three different transceiver topologies for the duobinary, PAM4, and NRZ signals. Operating at 20 Gb/s, all of the three transceivers are optimized to achieve the best performance with reasonable power consumption. Both Rogers and FR4 boards with different channel lengths are tested thoroughly to characterize the behavior of the transceivers. A careful comparison among the different data formats is conducted and verified by the experimental results.

This paper is organized as follows. Section II reviews the fundamental operation of duobinary signal and its implementation issues. The design details of the duobinary, PAM4, and NRZ transceivers are described in Sections III, IV, and V, respectively. Section VI summarizes the measurement results, and Section VII draws a conclusion.

## II. DUOBINARY SIGNALING

Having been used in optical communications and recently moving into electrical systems [2]–[4], duobinary modulation can achieve a data rate theoretically twice as much as the channel bandwidth. Intersymbol interference (ISI) is introduced in a controlled manner such that it can be cancelled out to recover the original signal. Unlike PAM4 and NRZ signals, duobinary signals incorporate the channel loss as part of the overall response [5], substantially reducing the required boost and relaxing the equalizer design.

Fig. 2. (a) Linear model of duobinary signaling. (b) Composition of duobinary spectrum.

Fig. 3. Output spectra and waveforms for different data formats passing through an ideal filter. (a) NRZ. (b) Duobinary. (Data rate = 20 Gb/s.)

Manuscript received December 22, 2007; revised March 27, 2008. Current version published September 10, 2008.

The authors are with the Electrical Engineering Department, National Taiwan University, Taipei, Taiwan, R.O.C. (e-mail: jrilee@cc.ee.ntu.edu.tw).

A duobinary signal is originally defined as the sum of the present bit and the previous one of a binary sequence [6]:

$$w[n] = x[n] + x[n-1]. \tag{1}$$

It correlates two adjacent bits to introduce the desired ISI. Considering the equivalent linear model as shown in Fig. 2, we have the transfer function  $H_1(f)$  as

$$H_1(f) = \frac{W(f)}{X(f)} = \frac{1}{2} [1 + \exp(-j2\pi f T_b)] \tag{2}$$

where $T_b$ denotes the bit period, and the attenuating factor 1/2 is used to equalize the total power of x(t) and w(t). It can also be shown that the duobinary spectrum $S_W(f)$ is given by

$$S_W(f) = \cos^2(\pi f T_b) \cdot T_b \left[ \frac{\sin(\pi f T_b)}{\pi f T_b} \right]^2 \tag{3}$$

$$\phantom{S_W(f)} = T_b \left[ \frac{\sin(2\pi f T_b)}{2\pi f T_b} \right]^2. \tag{4}$$
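The step from (2) to (3) and (4) can be written out explicitly. The sketch below assumes, as is standard for random NRZ data with rectangular pulses, that the input spectrum is $S_X(f) = T_b[\sin(\pi f T_b)/(\pi f T_b)]^2$; this form of $S_X(f)$ is inferred from (3) rather than stated in the text.

$$\begin{aligned}
|H_1(f)|^2 &= \tfrac{1}{4}\,\bigl|1 + e^{-j2\pi f T_b}\bigr|^2 = \tfrac{1}{4}\,\bigl[2 + 2\cos(2\pi f T_b)\bigr] = \cos^2(\pi f T_b), \\
S_W(f) &= |H_1(f)|^2\, S_X(f) = \cos^2(\pi f T_b)\, T_b \left[\frac{\sin(\pi f T_b)}{\pi f T_b}\right]^2, \\
\sin(\pi f T_b)\cos(\pi f T_b) &= \tfrac{1}{2}\sin(2\pi f T_b) \;\Rightarrow\; S_W(f) = T_b \left[\frac{\sin(2\pi f T_b)}{2\pi f T_b}\right]^2 .
\end{aligned}$$

The first null of $S_W(f)$ therefore falls at $f = 1/(2T_b)$, half of the $1/T_b$ null of $S_X(f)$.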

As shown in Fig. 2(b), $S_W(f)$ is still a sinc function but with only half the bandwidth as compared with $S_X(f)$. In other words, the duobinary coding "squeezes" the spectrum toward dc and reduces the required channel bandwidth by 50%. Note that almost 90% of the signal power stays in the main lobe of a sinc function. To further clarify the analysis, we apply the NRZ and duobinary data through a brickwall filter cutting off at half the data rate, $1/(2T_b)$. As shown in Fig. 3, the received NRZ data suffers from 81.8% ISI and 0.8-UI jitter, whereas the duobinary is almost unaffected. This is because the former loses 51.4% of the power but the latter loses only 10%.
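The main-lobe figure for the duobinary spectrum can be checked numerically. The following is a minimal sketch, not the authors' simulation: it integrates (4) in normalized frequency $u = fT_b$ over the passband of a brickwall filter cutting off at $1/(2T_b)$ and compares the result with the total power. The function names and the midpoint-rule integrator are illustrative choices.

```python
import math


def sinc_sq(x: float) -> float:
    """Normalized sinc squared, [sin(pi x)/(pi x)]^2, with the x = 0 limit."""
    if x == 0.0:
        return 1.0
    return (math.sin(math.pi * x) / (math.pi * x)) ** 2


def duobinary_psd(u: float) -> float:
    """Eq. (4) divided by T_b, with u = f * T_b the normalized frequency."""
    return sinc_sq(2.0 * u)


def integrate(func, lo: float, hi: float, steps: int = 200_000) -> float:
    """Midpoint-rule integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


# Power passed by a brickwall filter cutting off at u = 1/2, i.e. f = 1/(2 T_b).
in_band = integrate(duobinary_psd, -0.5, 0.5)
# Total power: the integral of sinc^2(2u) over all u is exactly 1/2.
total = 0.5
main_lobe_fraction = in_band / total
print(f"duobinary power inside the brickwall filter: {main_lobe_fraction:.1%}")
```

The printed fraction should land close to the "almost 90%" figure quoted above, since the filter passband coincides with the main lobe of the duobinary spectrum.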

It is worth noting that although the PAM4 signal possesses the same spectral efficiency as the duobinary does, the latter can further take advantage of the channel response as part of the transfer function. Fig. 4(a) illustrates the operation of duobinary signaling, where the transmit preemphasis and receive equalizer cooperate to reshape the low-pass response of the channel so that the overall transfer function approximates the first lobe of $H_1(f)$. In other words, a duobinary transceiver "absorbs" a significant amount of channel loss and makes it useful in the overall response, allowing more relaxed preemphasis and equalizer design. Fig. 4(b) shows the simulated results for the required boost at the Nyquist frequencies for duobinary, PAM4, and NRZ codes.<sup>1</sup> With a data rate of 20 Gb/s, the required equalization for duobinary is lower than that for PAM4 and NRZ by 2.8 and 8.9 dB in a 20-cm FR4 channel, and by 4.0 and 6.8 dB in a 40-cm Rogers channel, respectively. The simulation is conducted in SpectreRF as follows. First, we measure the S-parameters of the backplane traces with different lengths and deliver a pulse into the channels. Next, we convert the coefficients<sup>2</sup> into a transfer function, obtaining the corresponding boost in different conditions. Note that the duobinary transmitter may need a suppression (rather than a lift) in the vicinity of $1/(3T_b)$ for proper spectrum shaping.

Fig. 4. (a) Concept of duobinary signal formation. (b) Required boost at Nyquist frequency. (Data rate = 20 Gb/s.)

Fig. 5. Complete transceiver design and timing diagram of important nodes.

In reality, a precoder $H_2(z) = 1/(1 + z^{-1})$ must be implemented on the transmit side. Here, we follow the design of [7], and the complete duobinary transceiver is shown in Fig. 5. The reshaped duobinary data is decoded by an LSB distiller that takes the LSB as the output, recovering the binary NRZ data as y[n]. The waveforms of important nodes are also depicted in Fig. 5.

Fig. 6. Conceptual illustration of duobinary transceiver.
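The signal flow of Fig. 5 can be followed at the bit level. The sketch below is a behavioral illustration only, assuming an ideal channel and interpreting $H_2(z)$ in modulo-2 arithmetic (an XOR with the previous precoder output); the function names are hypothetical and the 1/2 amplitude scaling of (2) is omitted so that the three levels are simply 0, 1, and 2.

```python
import random


def precode(bits, initial=0):
    """Precoder 1/(1 + z^-1) in modulo-2 arithmetic: p[n] = x[n] XOR p[n-1]."""
    out, prev = [], initial
    for bit in bits:
        prev ^= bit
        out.append(prev)
    return out


def duobinary_encode(precoded, initial=0):
    """Eq. (1) applied to the precoded stream: w[n] = p[n] + p[n-1] in {0, 1, 2}."""
    out, prev = [], initial
    for bit in precoded:
        out.append(bit + prev)
        prev = bit
    return out


def lsb_decode(levels):
    """LSB distiller: keep only the least significant bit of each level."""
    return [level & 1 for level in levels]


def link(bits, initial=0):
    """Precoder, duobinary encoder, and LSB decoder in cascade."""
    return lsb_decode(duobinary_encode(precode(bits, initial), initial))


random.seed(0)
data = [random.randint(0, 1) for _ in range(32)]
recovered = link(data)
print(recovered == data)
```

Because the LSB of $p[n] + p[n-1]$ equals $p[n] \oplus p[n-1] = x[n]$, each received level is decoded without reference to earlier decisions, which is the purpose of the precoder.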

## III. DUOBINARY TRANSCEIVER

The proposed duobinary transceiver is illustrated in Fig. 6. This prototype conceptually resembles the structure in Fig. 5 but employs no equalizer in the receiver for simplicity. The transmitter consists of a skew-tolerant precoder and a 3-tap feedforward equalizer, and the receiver contains a self-adjusted three-level (1.58-bit) ADC. We present the design details in this section.

<sup>1</sup>The Nyquist frequencies of duobinary, PAM4, and NRZ signals are $1/(3T_b)$, $1/(4T_b)$, and $1/(2T_b)$, respectively. More details can be found in [4].

<sup>2</sup>The calculation for the duobinary signal is illustrated in Section III.



Fig. 7. Duobinary precoder design. (a) Conventional. (b) Proposed.

### A. Transmitter

Although it looks simple and feasible, the precoder in Fig. 5 is difficult to implement, primarily due to the stringent timing requirement in the feedback loop. Cascading active or passive devices to develop a precise delay of $T_b$ in an open loop is not an option because of the high power, large area, and uncertain PVT variations. Using a clock-driven flipflop seems to be the only choice, but it imposes a stringent phase requirement as well. This effect can be clearly explained by Fig. 7(a), where the XOR gate and the flipflop experience delays of $T_{\rm XOR}$ and $T_{D\rightarrow Q}$, respectively. To make this precoder work properly, these two delays must add up to exactly one bit period $T_b$:

$$T_{\rm XOR} + T_{D \to Q} = T_b. \tag{5}$$

That is, the input clock  $CK_{in}$  has very little margin for phase movement in order to produce a proper D-to-Q delay for the flipflop. Such a timing issue becomes aggravated at high speed and requires a complex control scheme.

To overcome these difficulties, we realize the precoder in an alternative way as illustrated in Fig. 7(b) [8]. The input data and clock pass through an AND gate, which is followed by a divide-by-2 circuit. The output thus toggles whenever a data ONE arrives, leading to the following operation:

$$y_1[n] = y_1[n-1] \oplus D_{\rm in}[n]. \tag{6}$$
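A short behavioral check helps confirm that the gated divide-by-2 structure realizes (6). The sketch below models the divider as a state that toggles whenever the data bit is ONE, compares it with the XOR recurrence, and then repeats the run with the opposite initial state. The decoding step assumes the receiver outputs ONE whenever two adjacent precoded bits differ (the middle duobinary level); all names are illustrative.

```python
def toggle_precoder(bits, initial=0):
    """Divide-by-2 stage clocked by (data AND clock): toggles on every ONE."""
    state, out = initial, []
    for bit in bits:
        if bit:          # the gated clock edge reaches the divider
            state ^= 1   # divide-by-2 output toggles
        out.append(state)
    return out


def xor_precoder(bits, initial=0):
    """Eq. (6): y1[n] = y1[n-1] XOR Din[n]."""
    prev, out = initial, []
    for bit in bits:
        prev ^= bit
        out.append(prev)
    return out


def decode(y1, initial=0):
    """Assumed receiver model: ONE when adjacent precoded bits differ."""
    prev, out = initial, []
    for bit in y1:
        out.append(bit ^ prev)
        prev = bit
    return out


data = [1, 0, 1, 1, 0, 0, 1, 0]
y1_a = toggle_precoder(data, initial=0)
y1_b = toggle_precoder(data, initial=1)   # opposite starting polarity
print(y1_a, y1_b, decode(y1_a, 0), decode(y1_b, 1))
```

The two precoded streams are bitwise complements of each other, yet both decode to the original data, since only the difference between adjacent bits carries information.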

This structure has the advantage over that in Fig. 7(a) of breaking the loop and allowing a much more relaxed phase relationship between the input clock and data. The clock $CK_{in}$ now has a skew margin as wide as $180^{\circ}$, so clock skew is no longer a limiting factor in most designs. Note that the initial state of the divider has no influence on the final result; $y_1[n]$ with opposite polarity still yields the same output after decoding.

The popular feedforward equalizer also proves useful in duobinary systems. At 20 Gb/s, the number of taps becomes quite limited. Here, 4 taps are considered for the waveform reshaping. All the FIR equalizing methods and techniques that have been extensively used for NRZ data can be applied to duobinary, except that a single pulse ONE (preceded and followed by successive ZEROs) is expected to generate two consecutive bits of 1/2 at the far end. With a pulse response shown in Fig. 8(a),<sup>3</sup> the coefficients $\alpha_k$ are readily available by solving the following equations:

$$\begin{bmatrix} x_0 & x_{-1} & x_{-2} & x_{-3} \\ x_1 & x_0 & x_{-1} & x_{-2} \\ x_2 & x_1 & x_0 & x_{-1} \\ x_3 & x_2 & x_1 & x_0 \end{bmatrix} \begin{bmatrix} \alpha_{-1} \\ \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1/2 \\ 1/2 \\ 0 \end{bmatrix}.$$
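To make the tap calculation concrete, the following sketch builds the same 4-by-4 system and solves it with plain Gaussian elimination. The pulse-response samples are hypothetical values chosen only for illustration (they are not the measured response of Fig. 8(a)), and the solver is a generic helper rather than the authors' procedure.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


# Hypothetical bit-spaced pulse-response samples x_k (assumed, not measured).
pulse = {-3: 0.0, -2: 0.0, -1: 0.05, 0: 0.60, 1: 0.30, 2: 0.12, 3: 0.05}

# Row i, column j holds x_{i-j}, matching the matrix in the text.
A = [[pulse[i - j] for j in range(4)] for i in range(4)]
target = [0.0, 0.5, 0.5, 0.0]          # duobinary response to a single ONE

taps = solve_linear(A, target)          # [alpha_-1, alpha_0, alpha_1, alpha_2]
equalized = [sum(A[i][j] * taps[j] for j in range(4)) for i in range(4)]
print([round(t, 4) for t in taps])
```

With four taps and four constraints the system is square, so the equalized samples match the duobinary target exactly at the four constrained instants; residual ISI outside that window is not controlled by this formulation.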

Fig. 8(b) and (c) summarize the optimal coefficients at 20-Gb/s data rate as a function of channel length for Rogers and FR4 boards. The results show that $\alpha_2$ is relatively small in both cases, which motivates omitting it (and the corresponding flipflop) for a simpler design. The complete transmitter design is depicted in Fig. 8(d), where all blocks are implemented in current-mode logic (CML) to increase the operation speed.

### B. Receiver

As suggested by Fig. 5, the duobinary receiver could be as simple as a quantizer with only the LSB taken out to convert the duobinary signal back to NRZ data. It is equivalent to discriminating the middle level (logic ONE) from the two side levels (logic ZERO), as shown in Fig. 9(a). Here, a 3-level (1.58-bit) flash ADC is followed by an XOR gate to distill the LSB. However, this simple topology suffers from a number of drawbacks. The linearity and input common-mode level require precise reference voltages $V_{\rm TH,H}$ and $V_{\rm TH,L}$; otherwise the signal integrity degrades. The pulsewidth of the output may also get distorted, resulting in significant jitter or ISI.

<sup>3</sup>The example pulse response shown in Fig. 8(a) is obtained from a 20-cm Rogers channel.

Fig. 8. (a) Typical pulse response for 20-Gb/s data. (b) Normalized FIR coefficients for Rogers. (c) Normalized coefficients for FR4. (d) Duobinary transmitter.

Fig. 9. (a) Conventional duobinary receiver. (b) Proposed receiver.
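The two-threshold decision and its sensitivity to the reference voltages can be illustrated with a few lines of code. This is a behavioral sketch only; the signal levels and threshold values are hypothetical numbers, not taken from the prototype.

```python
def duobinary_slice(sample, vth_high, vth_low):
    """Two comparators followed by an XOR: ONE only for the middle level."""
    above_high = sample > vth_high
    above_low = sample > vth_low
    return int(above_high != above_low)


# Hypothetical differential levels in volts for the three duobinary states.
levels = {"low": -0.2, "middle": 0.0, "high": 0.2}

# Thresholds placed midway between adjacent levels.
nominal = {name: duobinary_slice(v, 0.1, -0.1) for name, v in levels.items()}

# The same "high" sample with an upper reference that has drifted too far.
drifted = duobinary_slice(levels["high"], vth_high=0.25, vth_low=-0.1)

print(nominal, drifted)
```

With nominal thresholds only the middle level decodes as ONE, whereas the drifted upper reference turns the high level into a false ONE, which is the kind of degradation described above.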

The proposed architecture alleviates the above difficulties by incorporating a reference-free comparator and a servo controller that dynamically optimizes the output data eye. As shown in Fig. 9(b), the comparator compares the input with two threshold levels virtually equivalent to $V_{\rm TH,H}$ and $V_{\rm TH,L}$, generating two outputs $V_{\rm out1}$ and $V_{\rm out2}$. Amplified to logic level by the subsequent hysteresis buffers [9], $V_{\rm out1}$ and $V_{\rm out2}$ are then XORed to produce the final output $D_{\rm out}$. The recovered data inevitably bears jitter, since (1) the threshold levels may drift due to mismatches and PVT variations, and (2) the threshold-crossing points for the rising and falling edges differ intrinsically. Here, the pulsewidth distortion associated with the first issue is corrected by means of a negative feedback loop, which contains a low-pass filter (LPF) and a V/I converter. With the assumption that the input data is purely random, the high loop gain forces the thresholds to stay at the optimal positions such that the waveform of $D_{\rm out}$ reaches an equal pulsewidth for ZEROs and ONEs. In contrast to the design in [4], this arrangement recovers the data without extracting the clock, providing a compact solution. If necessary, the remaining jitter due to the second issue can be further removed by placing a regular CDR circuit behind it. Note that for simplicity, no receive-side equalization is used in this prototype.

The comparator and V/I converter design is depicted in Fig. 10(a), where the input quad $M_1-M_4$ along with the tail currents and loading resistors form two zero-crossing thresholds for $V_{\text{out1}}$ and $V_{\text{out2}}$. Mirrored from the V/I converter, the two variable currents $\alpha I_A$ and $(1 - \alpha)I_A$ create a threshold tuning range of 205 mV for $\alpha$ = 0.1 to 0.9. Fig. 10(b) illustrates the variation of threshold levels as a function of $\alpha$. The key point here is that the threshold adjustment is fully symmetric with respect to the input common-mode level. It not only eliminates the reference offset issue but also facilitates the pulsewidth equalization. The low-pass filter in Fig. 9(b) is realized as a simple *RC* network with a corner frequency of 20 kHz, implying a voltage drift of less than 1.5 mV for 31 consecutive bits. A single-stage opamp is employed here, achieving 32-dB voltage gain, 85° phase margin, and 2.6-GHz unity-gain bandwidth with a power consumption of 1.2 mW. Repeated simulation under severe PVT variations ensures the loop stability. Note that the performance could be affected by different kinds of mismatch, including imbalanced rising/falling times of the signal and comparator offsets. Monte Carlo simulation reveals that the threshold levels would deviate from the optimal positions by 12.5 $\text{mV}_{\text{rms}}$, which corresponds to additional jitter of 0.5 ps. The device sizes here are properly chosen to minimize the deviation.

Fig. 10. (a) Comparator and V/I converter in duobinary Rx. (b) $V_{\rm TH,H}$ and $V_{\rm TH,L}$ as a function of $\alpha$.

Fig. 11. PAM4 transmitter.

As compared with [4], this approach reduces the circuit complexity, especially in the CDR design. The robust architecture indeed facilitates high-speed operation and saves power. More details can be found in [10].

## IV. PAM4 TRANSCEIVER

### A. Transmitter

Fig. 11 illustrates the PAM4 transmitter design. It incorporates a demultiplexer (DMUX) to deserialize the original input, two signal paths (MSB and LSB) to independently preemphasize the data, and two joint combiners to construct the PAM4 signal. Serving as a 3-tap feedforward equalizer, each signal path performs FIR equalization with identical coefficients $\alpha_{-1}$, $\alpha_0$, and $\alpha_1$ [11]. The two preemphasis results are combined (with the MSB twice as large as the LSB) in current mode and converted to a voltage output by means of the inductively-peaked terminations. The combiner design is depicted in Fig. 12(a), where the weighting factor tuning is realized by adjusting the tail currents. Due to the limited testing facilities, only a single-ended clock at 20 GHz is applicable for the transmitter. To drive the differential $\div 2$ circuit, we employ a single-ended-to-differential (S/D) converter as depicted in Fig. 12(b). Here, $M_1$ and $R_1$ create a self-biased input level, which along with $M_2$ forms a local feedback to increase the gain and minimize the waveform distortion. Compared with a typical topology such as that in [12], this structure achieves higher gain and lower magnitude distortion between the two output nodes. All the blocks are implemented as standard CML with the overall bandwidth and power consumption optimized.

Fig. 12. PAM4 Tx building blocks. (a) Combiner. (b) S/D converter.

Fig. 13. PAM4 receiver.

### B. Receiver

The receiver design is shown in Fig. 13. Owing to the multilevel input, no buffer can be placed in the very front end unless it possesses high linearity over a wide range. Similar to a 2-bit flash ADC, prior art such as [13] utilizes three slicers with programmable offsets to discriminate the four levels. Here, we propose a single preamplifier that generates three thermometer codes simultaneously. These codes reach the full logic level ($\approx 500$ mV) by means of amplification (hysteresis buffers) and regeneration (flipflops). Subsequently, the PAM4 decoder translates the thermometer codes into binary codes $b_1$ and $b_0$. In order to evaluate the signal integrity, we serialize them again with a 2-to-1 MUX to recover the 20-Gb/s data output. Although the final muxing is unnecessary in a real design, it does facilitate the testing of this prototype.
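The PAM4 data path from DMUX to output MUX can be traced at the symbol level. The sketch below is a behavioral illustration under several assumptions that the text does not specify: the first bit of each pair is taken as the MSB, the four levels are the integers 0 to 3, the slicing thresholds sit midway between levels, and the decoder uses $b_1 = t_2$ and $b_0 = t_1 \oplus t_2 \oplus t_3$. Equalization and channel effects are ignored.

```python
def demux(bits):
    """Split a serial stream into (MSB, LSB) pairs; pairing order is assumed."""
    return list(zip(bits[0::2], bits[1::2]))


def combine(msb, lsb):
    """Combiner with the MSB weighted twice as heavily as the LSB."""
    return 2 * msb + lsb


def thermometer(level, thresholds=(0.5, 1.5, 2.5)):
    """Three simultaneous comparisons, lowest threshold first."""
    return tuple(int(level > t) for t in thresholds)


def decode(t1, t2, t3):
    """Thermometer-to-binary conversion (assumed logic for valid codes)."""
    b1 = t2
    b0 = t1 ^ t2 ^ t3
    return b1, b0


def mux(pairs):
    """2-to-1 serializer restoring the original bit order."""
    return [bit for pair in pairs for bit in pair]


data = [1, 1, 0, 1, 1, 0, 0, 0]
levels = [combine(msb, lsb) for msb, lsb in demux(data)]
codes = [thermometer(level) for level in levels]
recovered = mux([decode(*code) for code in codes])
print(levels, codes, recovered)
```

For this input the symbol sequence is 3, 1, 2, 0, and the serialized output reproduces the input bits, mirroring the test arrangement in which the final MUX restores the full-rate stream.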

The preamplifier is illustrated in Fig. 14(a), where the inductively-peaked termination ensures broadband matching at the input. The switching quad $M_1-M_4$, loading resistors $R_1$ and $R_2$, and tunable current sources $I_{\text{SSA}}$ and $I_{\text{SSB}}$ produce three outputs $V_{\text{out1}}-V_{\text{out3}}$ with three different threshold levels, and the upper and lower ones are symmetric with respect to the middle one, i.e., the input common-mode level. Note that the total current of $I_{\rm SSA}$ and $I_{\rm SSB}$ is kept constant so as to minimize the output common-mode variation. The hysteresis buffers [9] again amplify the outputs $V_{\rm out1}$-$V_{\rm out3}$ while cleaning up ambiguous transitions, and clear thermometer codes are presented to the decoder after retiming and regeneration by the flipflops. Fig. 14(b) reveals the decoder design, where complementary operation is imposed in the current-mode logic. With a supply of 1.8 V, it is possible to accommodate multiple stacks at 10 Gb/s with 250-mV overdrive for each stage. Note that $M_1$-$M_8$ need not remain in saturation all the time, since the circuit functions properly as long as the current can be completely switched from one arm to the other. Auxiliary pair $M_9$-$M_{10}$ helps to speed up the operation with moderate gain boosting during transitions.

## V. NRZ TRANSCEIVER

The NRZ transceiver is depicted in Fig. 15. As a vehicle for comparison, the transmitter is identical to the duobinary circuit in Fig. 8(d) with the precoder removed. In contrast to the multilevel signals such as duobinary and PAM4, the binary input here allows nonlinear amplification in the receiver front end to increase the signal-to-noise ratio (SNR). A transimpedance amplifier (TIA) is employed as the receiver front-end buffer, converting the signal current into voltage more efficiently. It achieves 15% larger bandwidth as compared with a typical input buffer made of a simple differential pair. All of the three transceivers are fully differential, and building blocks (e.g., flipflops) are reused as much as possible so as to make a fair comparison.

Fig. 14. PAM4 Rx building blocks. (a) Preamplifier. (b) Decoder.

Fig. 15. NRZ transceiver.

## VI. EXPERIMENTAL RESULTS

All the transceivers have been fabricated in 90-nm CMOS technology and tested in chip-on-board assemblies. High-speed I/Os are co-designed with pads and routing traces to achieve 50-$\Omega$ termination precisely. Fig. 16(a) depicts the photos of the chips with their dimensions listed below. The testing setup is illustrated in Fig. 16(b), and the photo of a testing board (40-cm Rogers) is shown in Fig. 16(c). Three important points are specified to demonstrate the waveforms: position A (transmitter's output), B (far end), and C (receiver's output). Fig. 17 shows the measured frequency response of the channels and the corresponding pulse response at 20 Gb/s. The duobinary, PAM4, and NRZ transceivers consume 195 mW, 408 mW, and 126 mW from supplies of 1.5 V, 1.8 V, and 1.5 V, respectively.<sup>4</sup> Unless otherwise specified, the following measurements are obtained with a pseudo-random bit sequence (PRBS) of $2^{31}-1$. We discuss the measured results below.

<sup>4</sup>To achieve better performance, the PAM4 transceiver requires a higher supply because of the four levels.

*Duobinary:* Fig. 18 depicts the transmitter's output (position A) with minimum ($\approx 0$ dB) and maximum ($\approx 9.5$ dB) boost at 20 Gb/s and compares them with simulations. The optimized duobinary waveforms at position B for different channels are shown in Fig. 19. The recovered data at the receiver's output (position C) with the longest traces are shown in Fig. 20, suggesting jitter of 3.41 ps,rms/29.11 ps,pp (Rogers) and 4.34 ps,rms/24.22 ps,pp (FR4). Fig. 21 plots the BER as a function of channel length for different media.

*PAM4:* Fig. 22 shows the far-end (position B) waveforms for different channels, and Fig. 23 depicts the receiver's output (position C). Note that the finite clock skew in the receiver causes pulsewidth distortion at the output of the MUX, resulting in eye diagrams with dual transition traces as shown in Fig. 23. Since the MUX is used only for testing here, it will not be an issue in a real design. The BER performance is summarized in Fig. 24.

*NRZ:* The same testing procedure has been applied to the NRZ transceiver as well. The waveforms at positions B and C are plotted in Figs. 25 and 26, respectively. Again, Fig. 27 depicts the BER performance.



Fig. 16. (a) Chip micrographs and dimensions. (b) Testing setup. (c) Photo of the 40-cm testing board.



Fig. 17. (a) Measured  $\rm S_{21}$  for different channels, and (b) their pulse responses at 20 Gb/s.



Fig. 18. Boosting performance of duobinary Tx measured at position A. (a) Minimum (0 dB). (b) Maximum (9.5 dB). (Data rate = 20 Gb/s, vertical scale: 50 mV/div, horizontal scale: 10 ps/div.)



Fig. 19. Far-end (position B) waveforms with duobinary signals for (a) 15-cm Rogers, (b) 3-cm FR4, and (c) 10-cm FR4 channels. (Data rate = 20 Gb/s, vertical scale: 50 mV/div, horizontal scale: 10 ps/div.)



Fig. 20. Eye diagram of recovered data for duobinary transceiver. (Data rate = 20 Gb/s, vertical scale: 100 mV/div, horizontal scale: 10 ps/div.)



Fig. 21. BER measurements for duobinary transceiver in (a) Rogers, and (b) FR4 board.

Fig. 28 presents the spectra of different data formats at position B. As expected, the duobinary and PAM4 signals reveal notches at half the data rate. Note that for the duobinary signal, the notch slightly deviates from 10 GHz, primarily because the physical circuits can only mimic the first lobe of the transfer function. The 9.3-Hz spacing shown in the inset corresponds to the $2^{31}-1$ PRBS at 20 Gb/s.
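The quoted line spacing follows from the periodicity of the test pattern: a PRBS that repeats every $2^{31}-1$ bits produces discrete spectral lines separated by the bit rate divided by the pattern length. A minimal check, with illustrative variable names:

```python
bit_rate_hz = 20e9                      # 20 Gb/s
prbs_order = 31
pattern_length = 2**prbs_order - 1      # bits per PRBS period

# Spectral lines of a periodic pattern are spaced by its repetition rate.
line_spacing_hz = bit_rate_hz / pattern_length
pattern_period_s = pattern_length / bit_rate_hz

print(f"line spacing: {line_spacing_hz:.2f} Hz, period: {pattern_period_s * 1e3:.1f} ms")
```

The result is consistent with the 9.3-Hz spacing observed in the inset of Fig. 28.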

In order to fairly compare the signal integrity, we operate the three transceivers with the same supply voltage of 1.5 V and examine the far-end (position B) eye opening after a 40-cm Rogers channel (Fig. 29). The results clearly show that the duobinary signal presents the largest magnitude (200 mV) and eye opening (35 mV), whereas the NRZ signal exhibits the smallest (i.e., 60-mV magnitude and 10-mV opening). However, the NRZ signal can still achieve an outstanding BER primarily due to the simple receiver structure. In other words, the NRZ data can be amplified without considering the linearity, improving the signal integrity substantially.

Fig. 22. Far-end (position B) waveforms with PAM4 signals for (a) 15-cm Rogers, (b) 5-cm FR4, and (c) 10-cm FR4 channels. (Data rate = 20 Gb/s, vertical scale = 100 mV/div, horizontal scale = 20 ps/div.)

Fig. 23. Eye diagrams of recovered data for PAM4 transceiver. (Data rate = 18 Gb/s, vertical scale = 10 mV/div, horizontal scale = 10 ps/div.)

Fig. 24. BER measurements for PAM4 transceiver in (a) Rogers, and (b) FR4 board.

Fig. 25. Far-end (position B) waveforms with NRZ signals for (a) 15-cm Rogers, (b) 5-cm FR4, and (c) 25-cm FR4 channels. (Data rate = 20 Gb/s, vertical scale = 50 mV/div, horizontal scale = 10 ps/div.)

As mentioned earlier, a regular CDR circuit can be adopted in the proposed duobinary transceiver. A lower loop bandwidth is thus expected in such a CDR in order to suppress the input data jitter. Basically, it is possible to acquire the noise profile from the recovered data with a pre-compiled pattern (e.g., 0101...) and use it in the bandwidth optimization procedure, as in other phase-locking systems [14]. Since the receiver may create deterministic jitter because of the clock-free architecture, it is desirable to codesign the receiver and CDR so as to optimize the overall performance. The PAM4 receiver, by contrast, suffers from a more complicated CDR design than the other two.

It is also instructive to compare the overall performance of the three circuits. The NRZ signal continues to play an important role in different systems owing to its plain structure and power efficiency, whereas the duobinary provides an alternative solution for long-distance, high-speed communications. The NRZ data actually achieves the best performance in terms of BER



Fig. 26. Eye diagrams of recovered data for NRZ transceiver. (Data rate = 20 Gb/s, vertical scale = 100 mV/div, horizontal scale = 10 ps/div.)



Fig. 27. BER measurements for NRZ transceiver in (a) Rogers, and (b) FR4 board.



Fig. 28. Spectra of duobinary, PAM4, and NRZ signals at 20 Gb/s.



Fig. 29. Comparison of far-end waveforms. (Data rate = 20 Gb/s, supply voltage = 1.5 V, 40-cm Rogers.)

TABLE I Performance Summary

|                                  | [4]                                  |                      | This Work                                     |                      |                                               |                      |                                               |                      |
|----------------------------------|--------------------------------------|----------------------|-----------------------------------------------|----------------------|-----------------------------------------------|----------------------|-----------------------------------------------|----------------------|
| Data Rate                        | 12 Gb/s                              |                      | 20 Gb/s                                       |                      | 20 Gb/s                                       |                      | 20 Gb/s                                       |                      |
| Data Format                      | Duobinary                            |                      | Duobinary                                     |                      | PAM4                                          |                      | NRZ                                           |                      |
| BER<br>(2 <sup>31_</sup> 1 PRBS) | N/A                                  |                      | 40-cm Rogers                                  | 10-cm FR4            | 40-cm Rogers                                  | 10-cm FR4            | 40-cm Rogers                                  | 25-cm FR4            |
|                                  |                                      |                      | < 10 <sup>-12</sup>                           | < 10 <sup>-12</sup>  | < 10 <sup>-12</sup>                           | < 10 <sup>-12</sup>  | < 10 <sup>-12</sup>                           | < 10 <sup>-12</sup>  |
| Power                            | Тх                                   | Rx                   | Тх                                            | Rx                   | Тх                                            | Rx                   | Тх                                            | Rx                   |
|                                  | 133 mW                               | 97 mW                | 120 mW                                        | 75 mW                | 150 mW                                        | 258 mW               | 100 mW                                        | 26 mW                |
| Supply Voltage                   | 1.0 V                                |                      | 1.5 V                                         |                      | 1.8 V                                         |                      | 1.5 V                                         |                      |
| Far–End                          | 73.5 mV                              |                      | 35 mV                                         |                      | 18 mV                                         |                      | 10 mV                                         |                      |
| Eye Opening                      | (12 Gb/s, 75–cm Low– $\epsilon$ PCB) |                      | ( <i>V<sub>DD</sub></i> =1.5 V, 40–cm Rogers) |                      | ( <i>V<sub>DD</sub></i> =1.5 V, 40–cm Rogers) |                      | ( <i>V<sub>DD</sub></i> =1.5 V, 40–cm Rogers) |                      |
| Area                             | Тх                                   | Rx                   | Тх                                            | Rx                   | Тх                                            | Rx                   | Тх                                            | Rx                   |
|                                  | 0.18 mm <sup>2</sup>                 | 0.06 mm <sup>2</sup> | 0.21 mm <sup>2</sup>                          | 0.11 mm <sup>2</sup> | 0.19 mm <sup>2</sup>                          | 0.24 mm <sup>2</sup> | 0.23 mm <sup>2</sup>                          | 0.04 mm <sup>2</sup> |
| Technology                       | 90–nm CMOS                           |                      | 90–nm CMOS                                    |                      |                                               |                      |                                               |                      |

and power dissipation. NRZ also holds the advantage when a CDR must be included in the receiver, although with the proposed architecture the CDR for a duobinary signal could be as simple as that for an NRZ signal. On the other hand, the PAM4 signal may need linear amplification in the receiver front end to increase the SNR if the input signal is too small, which is not a trivial task in any technology. Besides, retiming flip-flops are almost mandatory in a PAM4 receiver [13], complicating the clock recovery and increasing the power consumption. For these reasons, PAM4 becomes less attractive in modern transceiver designs. Table I compares the performance of these three transceivers with prior art.
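As a rough illustration of why a small PAM4 input may call for linear amplification, the following sketch compares the ideal level spacing of two-, three-, and four-level signals at the same peak-to-peak swing. It assumes equally spaced levels and no ISI, noise, or channel loss, so it is not a model of the measured eye openings in Table I; the function names and the 0.6-V swing are hypothetical and chosen only for illustration.

```python
import math


def ideal_eye_height(v_pp: float, levels: int) -> float:
    """Spacing between adjacent levels of an ideal, ISI-free signal.

    Assumes equally spaced levels across the full swing v_pp (volts).
    """
    if levels < 2:
        raise ValueError("a signal needs at least two levels")
    return v_pp / (levels - 1)


def eye_penalty_db(levels: int) -> float:
    """Eye-height penalty in dB relative to two-level NRZ at equal swing."""
    if levels < 2:
        raise ValueError("a signal needs at least two levels")
    return 20.0 * math.log10(levels - 1)


if __name__ == "__main__":
    v_pp = 0.6  # hypothetical swing in volts, not a value from the paper
    for name, levels in (("NRZ", 2), ("duobinary", 3), ("PAM4", 4)):
        height_mv = 1e3 * ideal_eye_height(v_pp, levels)
        print(f"{name:9s} eye = {height_mv:6.1f} mV, "
              f"penalty = {eye_penalty_db(levels):4.2f} dB")
```

Under these assumptions each PAM4 eye is one third of the NRZ eye, a penalty of about 9.5 dB, which must be weighed against the lower channel loss PAM4 sees at half the symbol rate.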

# VII. CONCLUSION

A complete comparison and design analysis of three popular signaling formats has been presented. Novel architectures and circuit techniques have been introduced in the three transceiver prototypes targeting duobinary, PAM4, and NRZ signals, and all of them achieve error-free operation over at least 40-cm Rogers and 10-cm FR4 channels at 20 Gb/s. The advantages and disadvantages of the different topologies are discussed, providing empirical information for future backplane transceiver design.

## REFERENCES

- [1] 100 Gigabit Ethernet Forum 100G Ethernet Forum NG Ethernet Forum [Online]. Available: http://ng-ethernet.com/ethernet\_forum/index.php?c=2
- [2] A. Lender, "The duobinary technique for high-speed data transmission," *IEEE Trans. Commun. Electron.*, vol. 82, pp. 214–218, May 1963.
- [3] J. H. Sinsky et al., "High-speed electrical backplane transmission using duobinary signaling," *IEEE Trans. Microw. Theory Tech.*, vol. 53, no. 1, pp. 152–160, Jan. 2005.
- [4] K. Yamaguchi et al., "12 Gb/s duobinary signaling with ×2 oversampled edge equalization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2005, pp. 70–71.
- [5] J. Sinsky et al., "10 Gb/s duobinary signaling over electrical backplanes–Experimental results and discussion," Lucent Technologies, Bell Labs [Online]. Available: http://www.ieee802.org/3/ap/public/jul04/sinsky\_01\_0704.pdf
- [6] F. Stremler, *Introduction to Communication Systems*, 3rd ed. Reading, MA: Addison-Wesley, 1990.

- [7] M. Tomlinson, "New automatic equalizer employing modulo arithmetic," *Electron. Lett.*, vol. 7, pp. 138–139, Mar. 1971.
- [8] H. Shankar, "Duobinary modulation for optical systems," Inphi Corp. [Online]. Available: http://www.inphi-copr.com/products/whitepapers/DuobinaryModulationForOpticalSystems.pdf
- [9] J. Lee, "A 75-GHz PLL in 90-nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2007, pp. 432–433.
- [10] J. Lee et al., "A 20-Gb/s duobinary transceiver in 90-nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2008, pp. 102–103.
- [11] C. Menolfi et al., "A 25 Gb/s PAM4 transmitter in 90-nm CMOS SOI," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2005, pp. 72–73.
- [12] B. Razavi, *Design of Integrated Circuits for Optical Communications*. New York: McGraw-Hill, 2002.
- [13] T. Toifl et al., "A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 954–965, Apr. 2006.
- [14] H. Tao *et al.*, "40–43-Gb/s OC-768 16:1 MUX/CMU chipset with SFI-5 compliance," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2169–2180, Dec. 2003.



**Jri Lee** (S'03-M'04) received the B.Sc. degree in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, R.O.C., in 1995, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Los Angeles (UCLA), both in 2003.

After two years of military service (1995–1997), he was with Academia Sinica, Taipei, from 1997 to 1998, and subsequently with Intel Corporation from 2000 to 2002. He joined National Taiwan University (NTU) in 2004, where he is currently an Associate Professor of electrical engineering. His research interests include high-speed wireless and wireline transceivers, phase-locked loops, and data converters.

Dr. Lee is currently serving on the Technical Program Committees of the International Solid-State Circuits Conference (ISSCC), Symposium on VLSI Circuits, and Asian Solid-State Circuits Conference (A-SSCC). He received the Beatrice Winner Award for Editorial Excellence at the 2007 ISSCC, the Takuo Sugano Award for Outstanding Far-East Paper at the 2008 ISSCC, and the NTU Outstanding Teaching Award in 2007 and 2008.

**Ming-Shuan Chen** was born in Taipei, Taiwan, R.O.C., in 1984. He received the B.S. degree in electrical engineering from National Tsing-Hua University, Hsinchu, Taiwan, in 2006, and the M.S. degree in electronics engineering from National Taiwan University, Taipei, Taiwan, in 2008.

His research interests focus on mixed-signal integrated circuit design for high-speed communication systems.



**Huai-De Wang** was born in Taipei, Taiwan, R.O.C., in 1984. He received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2006. He is currently pursuing the Ph.D. degree in the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei.

His research interests are phase-locked loops and high-speed transceivers for wireline communication.