## 21.5 A 20Gb/s Full-Rate Linear CDR Circuit with Automatic Frequency Acquisition

Jri Lee, Ke-Chung Wu

National Taiwan University, Taipei, Taiwan

A linear CDR circuit [1] is easy to model and exhibits minimal phase-adjustment activity under locked condition. However, linear PDs face a speed limitation at around 10Gb/s, primarily because of the required pulsewidth comparison and the finite flip-flop *CK*-to-*Q* delay. Parallelism could relax the stringent speed requirement, but it introduces other issues such as clock skew and jitter. Referenceless frequency acquisition schemes such as the Pottbacker FD [2] and similar approaches [3,4] require quadrature clocks, potentially leading to higher phase noise as well. This paper presents the design and analysis of a 20Gb/s full-rate CDR circuit in 90nm CMOS, which eliminates these issues by using an alternative linear PD structure and a referenceless FD that switches off automatically.

The presented CDR architecture is shown in Fig. 21.5.1. It incorporates a full-rate VCO, a mixer-based linear PD, an automatic FD, and the corresponding V-to-I converters. The input data passes through a chain of delay cells, providing a total delay (from $V_A$ to $V_E$) approximately equal to 25ps. An XOR gate examines this fixed phase difference, creating a pulse nominally 25ps wide on every data transition. This pulse sequence acts as the reference for phase detection and is subsequently mixed with the clock from the VCO. When a data edge is present, the mixer produces an output pulse whose width is proportional to the phase difference between the XOR output and the clock. This result can be used for phase alignment. Figure 21.5.2(a) illustrates the waveforms of important nodes under locked condition. During long runs, the mixer generates a periodic signal that is in phase with the clock. This signal has a zero average if the duty cycle of the clock is 50%. In other words, the mixer provides an average output proportional to the phase error between the two inputs. A V-to-I converter translates this voltage into a current and injects it into the loop filter. As a result, the center tap $V_C$ aligns with the clock, so the retiming flip-flop samples at the optimal point on the falling clock edges.
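The averaging argument above can be checked with a small behavioral sketch. It assumes an idealized ±1 square clock and a rectangular XOR pulse, and simply integrates the clock over the pulse window. The function names, the midpoint-rule integration, and the sign convention are illustrative assumptions, not taken from the design.

```python
T_CK = 50e-12       # clock period of a 20GHz full-rate clock
W_NOMINAL = 25e-12  # nominal XOR pulse width quoted in the text


def clock(t):
    """Idealized square clock: +1 in the first half period, -1 in the second."""
    return 1.0 if (t % T_CK) < T_CK / 2 else -1.0


def pd_net_area(dt, width=W_NOMINAL, steps=10_000):
    """Integrate the clock over an XOR pulse whose center is offset by dt
    from the falling clock edge at T_CK/2 (midpoint rule).

    The result stands in for the averaged mixer output per data transition.
    """
    start = T_CK / 2 + dt - width / 2
    h = width / steps
    return h * sum(clock(start + (k + 0.5) * h) for k in range(steps))


# Sweep the timing offset: the net area is linear in dt within +/- width/2.
for dt_ps in (-10, -5, 0, 5, 10):
    area = pd_net_area(dt_ps * 1e-12)
    print(f"offset {dt_ps:+3d} ps -> net area {area * 1e12:+.2f} ps")
```

In this model the net area is zero when the falling clock edge sits at the pulse center and grows linearly with the offset, which is the linear PD characteristic described above. The zero crossing does not move if `width` is changed, since only the pulse center matters.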

To achieve a truly tristate PD, $\overline{CK}$ is applied to $(V/I)_{PD}$, as depicted in Fig. 21.5.2(b), where two differential pairs steer two identical current sources. Since the clock and the mixer outputs are in phase, this arrangement cancels the periodic disturbance for consecutive bits. As illustrated in Fig. 21.5.2(a), $I_{P1}$, the output current of $(V/I)_{PD}$, is zero during long runs. Note that the positive and negative pulses of $I_{P1}$ on data transitions are equivalent to a 20GHz periodic phase modulation, which is rejected by the limited loop bandwidth of the CDR (~15MHz). The average output current as a function of phase error is also depicted in Fig. 21.5.2(c), presenting a PD gain [together with $(V/I)_{PD}$] of 300µA/rad. The bandwidth comparison of the presented PD and the Hogge PD is also shown in Fig. 21.5.1. The former achieves a flat operation range of 180° from DC to 35Gb/s, whereas the latter drops dramatically after 1Gb/s. It is noteworthy that under locked condition, the clock edges always align with the center of the generated pulses, whether or not the delay from $V_A$ to $V_E$ $(\Delta T_{A \rightarrow E})$ is exactly 25ps. Moreover, $V_C$ always coincides with the clock, keeping an optimal phase for data retiming. The simulated sampling-point deviation over $\pm 10\%$ V<sub>DD</sub> and 70°C is less than 57mUI.
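For orientation, the figures quoted above can be converted into time units. The short calculation below assumes that one unit interval equals one clock period (full-rate operation), so that 2π rad of phase error corresponds to 50ps; the variable names are illustrative.

```python
import math

BIT_RATE = 20e9                  # b/s, full-rate operation
UI = 1.0 / BIT_RATE              # one unit interval in seconds (50ps)

deviation_ui = 57e-3             # simulated sampling-point deviation bound, in UI
deviation_s = deviation_ui * UI  # same bound expressed in seconds

pd_gain_per_rad = 300e-6         # A/rad, PD gain including (V/I)_PD
# Assumption: 2*pi rad of phase error spans exactly one UI.
pd_gain_per_s = pd_gain_per_rad * 2.0 * math.pi / UI

print(f"sampling-point deviation: {deviation_s * 1e12:.2f} ps")
print(f"PD gain: {pd_gain_per_s * 1e-6 * 1e12 * 1e-12:.1f} uA/ps")
```

Under that assumption, 57mUI corresponds to 2.85ps of sampling-point deviation, and 300µA/rad corresponds to roughly 37.7µA per picosecond of timing error.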

The nominally 12.5ps delay from $V_{\rm B}$ to $V_{\rm D}$ facilitates frequency difference detection. The presented FD is shown in Fig. 21.5.3(a). Here, the clock is sampled by $V_{\rm B}$ and $V_{\rm D}$, producing two outputs $Q_1$ and $Q_2$, respectively. $Q_1$ is further sampled by $Q_2$ through another flip-flop (*FF*<sub>3</sub>), and the polarity of the frequency error, $Q_3$, is thereby obtained. The up/down signal is subsequently applied to a second V-to-I converter, (V/I)<sub>FD</sub>, to inject a current accordingly. A simple modification realizes automatic frequency acquisition. Recall that once phase lock is accomplished, $Q_1$ and $Q_2$ stay low and high, respectively. Following the design in [5], $\overline{Q_2}$ (the complement of $Q_2$) is applied to (V/I)<sub>FD</sub> so that it switches off automatically when frequency acquisition is completed. This spontaneous operation saves the considerable power and area otherwise needed for a lock detector, logic controller, and other auxiliary circuits. The (V/I)<sub>FD</sub> carries a pumping current twice as large as that in (V/I)<sub>PD</sub> to ensure that the FD loop dominates during frequency acquisition. Figure 21.5.3(b) shows the capture range of the presented FD for a $2^7$-1 input data sequence.
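The sampling scheme described above can be illustrated with a behavioral sketch. It makes several simplifying assumptions that are not from the paper: an ideal square clock, a data transition on every bit (random data would have fewer), ideal flip-flops, and an arbitrary clock polarity. The names `clock_level` and `fd_polarity` are hypothetical.

```python
import math

BIT_RATE = 20e9        # nominal data rate in b/s
T_BIT = 1.0 / BIT_RATE
TAU_BD = 12.5e-12      # nominal V_B -> V_D delay quoted in the text


def clock_level(f_ck, t):
    """Ideal square clock: True during the positive half cycle (assumed polarity)."""
    return math.sin(2.0 * math.pi * f_ck * t) > 0.0


def fd_polarity(f_ck, n_bits=20_000):
    """Return the Q3 samples produced while the clock runs at f_ck.

    Q1 and Q2 are the clock sampled at the V_B and V_D edges; Q3 is Q1
    sampled on each rising edge of Q2, mimicking the third flip-flop.
    """
    q3_samples = []
    prev_q2 = None
    for n in range(n_bits):
        t_b = n * T_BIT                       # V_B edge (one per bit, assumed)
        q1 = clock_level(f_ck, t_b)
        q2 = clock_level(f_ck, t_b + TAU_BD)  # V_D edge, 12.5ps later
        if prev_q2 is not None and (not prev_q2) and q2:
            q3_samples.append(q1)             # rising edge of Q2 clocks FF3
        prev_q2 = q2
    return q3_samples


too_fast = fd_polarity(BIT_RATE * 1.001)  # clock 0.1% above the data rate
too_slow = fd_polarity(BIT_RATE * 0.999)  # clock 0.1% below the data rate
print("clock fast -> Q3 samples:", set(too_fast))
print("clock slow -> Q3 samples:", set(too_slow))
```

In this model $Q_1$ and $Q_2$ are beat notes about a quarter cycle apart, because 12.5ps is a quarter of the 50ps clock period, so the level of $Q_1$ at each rising edge of $Q_2$ depends only on the sign of the frequency error. Which level maps to "up" and which to "down" depends on the clock polarity convention assumed here.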

While the 20GHz LC-tank VCO design is straightforward, the clock distribution is relatively challenging because it needs to drive a total capacitance of 120fF including the routing. Techniques such as inductive peaking and pure inductive loads suffer from large parasitic capacitance, incomplete swing, and uncontrollable Q. In this work, an underdamped peaking buffer, as shown in Fig. 21.5.4, is introduced. Converting the series L-R network to its parallel equivalent gives the output swing response, which starts at a lower value of $IR_{\rm s}$ at DC and exhibits a gentle peak of $IR_{\rm p}$ at $1/\sqrt{LC}$. Since the physical resistor $R_{\rm s}$ is fully under control, the peaking and bandwidth become well behaved. The underdamped buffer also presents less phase shift, relaxing the alignment issue in the retiming flip-flop. To further increase the bandwidth, one can cascade two staggered stages with different peaking frequencies. In this design, a moderate peaking of 2.7dB is used, resulting in an overall -3dB bandwidth of 24.6GHz. Note that the two-stage topology also provides good isolation for the VCO, protecting it from being disturbed by the sampling flip-flop and the frequency detector. A reverse isolation $(S_{12})$ of -74dB is observed in simulation. The sampling flip-flop, mixer, XOR gate, and delay cell are implemented as CML topologies with regular inductive peaking to increase the operation speed.
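The load response described above can be sketched numerically as a series L-R branch in parallel with the load capacitance. Only the 120fF figure comes from the text; treating it as the load of a single stage, placing the resonance at 20GHz, and the value of `R_S` are all assumptions made for illustration.

```python
import math


def load_impedance(freq_hz, r_s, l_h, c_f):
    """Impedance of a series L-R branch in parallel with a capacitor."""
    w = 2.0 * math.pi * freq_hz
    z_branch = complex(r_s, w * l_h)  # R_s + jwL
    y_cap = complex(0.0, w * c_f)     # jwC
    return z_branch / (1.0 + z_branch * y_cap)


C_LOAD = 120e-15  # total load capacitance quoted in the text
F0 = 20e9         # assumed resonance frequency, 1/(2*pi*sqrt(LC))
L_PEAK = 1.0 / ((2.0 * math.pi * F0) ** 2 * C_LOAD)  # hypothetical inductance
R_S = 60.0        # hypothetical series resistance in ohms

z_dc = abs(load_impedance(0.0, R_S, L_PEAK, C_LOAD))  # equals R_S at DC
z_f0 = abs(load_impedance(F0, R_S, L_PEAK, C_LOAD))   # magnitude at resonance
peaking_db = 20.0 * math.log10(z_f0 / z_dc)

print(f"L = {L_PEAK * 1e12:.0f} pH, |Z(0)| = {z_dc:.1f} ohm, "
      f"|Z(f0)| = {z_f0:.1f} ohm, peaking = {peaking_db:.2f} dB")
```

For this network the magnitude at $1/\sqrt{LC}$ relative to DC is $Q\sqrt{1+Q^2}$ with $Q = \omega_0 L/R_{\rm s}$, so the amount of peaking is set directly by the choice of $R_{\rm s}$, consistent with the point that a physical resistor keeps the peaking under control.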

The CDR circuit is designed and fabricated in a 90nm CMOS technology. The operation range is 950Mb/s without any external adjustment. Error-free operation (BER<10<sup>-12</sup>) for a 2<sup>31</sup>-1 PRBS input is achieved for supply voltages from 1.3 to 1.7V (nominally 1.5V). The circuit consumes a total power of 154mW from a 1.5V supply, of which 65mW is dissipated in the PD, 66mW in the VCO and clock buffers, and 23mW in the FD. The phase noise plots of the free-running VCO, the phase-locked VCO, and the input data are shown in Fig. 21.5.5. The output phase noise under locked condition measures -105dBc/Hz at 1MHz offset. Jitter generation of 475fs<sub>rms</sub> is obtained by integrating the phase noise from 100Hz to 1GHz offset frequencies, which roughly matches the time-domain measurement: the rms and peak-to-peak jitter of the recovered clock are 407fs and 3.00ps, respectively. Figure 21.5.6(a) depicts the waveforms of the recovered data and clock in response to a PRBS of 2<sup>31</sup>-1, suggesting data jitter of 1.24ps<sub>rms</sub> and 5.3ps<sub>pp</sub>. The jitter tolerance test is depicted in Fig. 21.5.6(b). The measured jitter tolerance exceeds the extrapolated GR-253 mask by at least 0.43UI<sub>PP</sub> for all jitter frequencies. Figure 21.5.7 shows the die micrograph (0.97×0.88mm<sup>2</sup>), the test setup, and the performance summary.

## Acknowledgment:

The authors thank MediaTek, NSC, and TSMC for support.

## References:

[1] C.R. Hogge, "A Self-Correcting Clock Recovery Circuit," *IEEE J. Lightwave Tech.*, vol. 3, no. 6, pp. 1312-1314, Dec., 1985.

[2] A. Pottbacker, U. Langmann, and H.-U. Schreiber, "A Si Bipolar Phase and Frequency Detector for Clock Extraction up to 8Gb/s," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1747-1751, Dec., 1992.

[3] S. Anand and B. Razavi, "A 2.75Gb/s CMOS Clock Recovery Circuit with Broad Capture Range," *ISSCC Dig. Tech. Papers*, pp. 214-215, Feb., 2001.

[4] J. Savoj and B. Razavi, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Binary Phase/Frequency Detector," *IEEE J. Solid-State Circuits*, vol. 38, no. 1, pp. 13-21, Jan., 2003.

[5] Jri Lee, "High-Speed Circuit Designs for Transmitters in Broadband Data Links," *IEEE J. Solid-State Circuits*, vol. 41, no. 5, pp. 1004-1015, May, 2006.

[6] J. Takasoh, T. Yoshimura, H. Kondoh, and N. Higashisaka, "A 12.5Gbps Half-rate CMOS CDR Circuit For 10Gbps Network Applications," *IEEE Symp. VLSI Circuits*, pp. 268-271, June, 2004.

[7] C. Kromer, G. Sialm, C. Menolfi, et al., "A 25-Gb/s CDR in 90-nm CMOS for High-Density Interconnects," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2921-2929, Dec., 2006.

[8] Y. Ohtomo, K. Nishimura, and M. Nogawa, "A 12.5-Gb/s Parallel Phase Detection Clock and Data Recovery Circuit in 0.13-µm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 7, pp. 2052-2057, Sep., 2006.



## **ISSCC 2009 PAPER CONTINUATIONS**

