# Low-Power Clock-Deskew Buffer for High-Speed Digital Circuits

Shen-Iuan Liu, Jiunn-Hwa Lee, and Hen-Wai Tsao

Abstract—An IC containing four clock-deskew buffers based on delay-locked-loop technology has been fabricated in a 0.6-$\mu$m single-poly double-metal CMOS process. The core chip area is $0.9 \times 0.9\ \text{mm}^2$. The maximum operating frequency is 80 MHz, and the total power dissipation of the four deskew buffers is 59 mW from a 3-V supply. The maximum clock skew after deskewing is less than 300 ps, and the peak-to-peak clock jitter is less than 170 ps. The deskew range is 0.5-3.8 ns.

Index Terms - Clock, delay-locked loops, skew.

### I. INTRODUCTION

WITH THE rapid advances in technology, synchronous data transfer between boards at rates up to 100 MHz may be required [1]. Clock-skew reduction will become more important in future designs that feature large die sizes and higher system clock frequencies. Reducing the clock skew not only allows a higher system clock frequency but also avoids system malfunction [2]. The major sources of clock skew are the system clock distribution, the propagation delay of the clock chip, and the clock traces on the board. These propagation delays depend on process, voltage, temperature, and loading (PVTL), which complicates skew compensation. Several clock-deskew buffers [1]–[7] have been presented to circumvent these problems. They follow two major approaches: analog and digital. The analog approach is primarily based on analog phase-locked loops (PLL's) and delay-locked loops (DLL's), and several analog PLL/DLL's [3]–[7] have been developed. The CY7B991 [7] requires only a single terminated transmission line [6] for each load, which minimizes the clock-distribution difficulty while allowing maximum system clock speed and flexibility. However, it requires the precise value of the clock skew to be known, which is difficult to obtain because of PVTL effects. Configurations using two matched transmission lines for each load have been developed [1]–[6] to reduce the clock skew adaptively. Alternatively, various digital DLL's [1], [2] have been developed. The advantages of the analog approaches are low power and small chip area, whereas the digital approaches provide more robustness against power-supply noise and PVTL effects. In this paper, the analog DLL approach is adopted.

The power dissipated in transmitting data between high-speed digital systems is quite large when conventional buffers are used [8]. Methods such as dynamic termination [8] and impulse transmission [9] have been presented to reduce the power dissipated in the interconnections. In this paper, a low-power CMOS clock-deskew buffer with impulse transmission is presented.

Manuscript received October 10, 1997; revised October 1, 1998.

The authors are with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617 R.O.C. (e-mail: lsi@cc.ee.ntu.edu.tw).

Publisher Item Identifier S 0018-9200(99)02436-1.

### II. CIRCUIT DESCRIPTION

The block diagram of the proposed clock-deskew buffer is shown in Fig. 1. It consists of a precharge-type phase frequency detector [10], a charge-pump circuit with an on-chip capacitor, two voltage-controlled delay lines (VCDL's) [3], a reference phase delay circuit, an impulse clock driver [9], a clock-level restorer, and a two-to-one multiplexer. Assuming that the two VCDL's are matched, as are the two transmission lines, the loop delay $T_{\rm loop}$ is defined as the time the external clock takes to propagate through the two VCDL's, the two transmission lines, the clock driver and multiplexer, and the clock restorer before arriving at the phase detector. It is given by

$$T_{\text{loop}} = 2T_{\text{delay}} + T_{\text{pd1}} + 2T_{\text{line}} + T_{\text{pd2}} \tag{1}$$

where $T_{\rm delay}$ is the delay of the clock through one VCDL, $T_{\rm pd1}$ is the delay of the impulse clock driver and the two-to-one multiplexer, $T_{\rm line}$ is the delay of one external transmission line, and $T_{\rm pd2}$ is the delay of the clock-level restorer. In the steady state, the phase detector and charge-pump circuit adjust the VCDL delay $T_{\rm delay}$ until the loop delay equals the reference delay $T_{\rm ref\_delay}$, i.e.,

$$T_{\text{ref\_delay}} = T_{\text{loop}}. \tag{2}$$

The time for the external clock to propagate to input terminal B of the remote chip is $T_{\rm delay} + T_{\rm pd1} + T_{\rm line}$. Adding $T_{\rm pd1}$ to and subtracting $T_{\rm pd2}$ from each side of (1), replacing $T_{\rm loop}$ by $T_{\rm ref\_delay}$ according to (2), and dividing by two gives

$$T_{\text{delay}} + T_{\text{pd1}} + T_{\text{line}} = \frac{T_{\text{ref\_delay}} + T_{\text{pd1}} - T_{\text{pd2}}}{2}. \tag{3}$$

The right-hand side of (3) does not depend on $T_{\rm line}$. Hence, for transmission lines of different lengths (i.e., different $T_{\rm line}$), each loop adjusts its VCDL delay $T_{\rm delay}$ so that the clock arrives at every remote chip with the same delay, which reduces the clock skew.
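To make (1)-(3) concrete, the following Python sketch solves (3) for the locked VCDL delay of two loops driving lines of different lengths and checks that both clocks reach their remote chips at the same time. All numbers are hypothetical: $T_{\rm ref\_delay}$, $T_{\rm pd1}$, and $T_{\rm pd2}$ are illustrative, and the 0.052-ns/cm line delay is only an assumption chosen to be roughly consistent with the 2.6-ns skew between the 60- and 10-cm lines in Table I.

```python
# Hypothetical steady-state model of one deskew loop, following (1)-(3).
# All delays are in nanoseconds; every constant below is an illustrative assumption.
T_REF_DELAY = 12.0         # reference delay
T_PD1 = 1.0                # impulse clock driver + 2:1 multiplexer
T_PD2 = 0.5                # clock-level restorer
LINE_DELAY_PER_CM = 0.052  # board-trace delay, ns/cm


def vcdl_delay(t_line: float) -> float:
    """VCDL delay at lock, obtained by solving (3) for T_delay."""
    return (T_REF_DELAY + T_PD1 - T_PD2) / 2 - T_PD1 - t_line


def loop_delay(t_delay: float, t_line: float) -> float:
    """Loop delay from (1): two VCDL passes, driver/mux, two line passes, restorer."""
    return 2 * t_delay + T_PD1 + 2 * t_line + T_PD2


def arrival_time(t_delay: float, t_line: float) -> float:
    """Time for the external clock to reach input B of the remote chip."""
    return t_delay + T_PD1 + t_line


if __name__ == "__main__":
    for length_cm in (60, 10):
        t_line = LINE_DELAY_PER_CM * length_cm
        t_delay = vcdl_delay(t_line)
        print(f"{length_cm} cm: T_line = {t_line:.2f} ns, T_delay = {t_delay:.2f} ns, "
              f"T_loop = {loop_delay(t_delay, t_line):.2f} ns, "
              f"arrival = {arrival_time(t_delay, t_line):.2f} ns")
```

Both lengths lock with $T_{\rm loop}$ = 12.00 ns and an arrival time of 6.25 ns; only $T_{\rm delay}$ differs, by the 2.60-ns difference in $T_{\rm line}$. In hardware, lock is possible only while the required $T_{\rm delay}$ stays inside the VCDL tuning range, which is one reason the deskew range is bounded.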

The individual circuits are described next. The VCDL consists of eight delay cells [3], as shown in Fig. 2. The reference delay circuit is composed of 12 RC delay cells with $V_{\rm Ctl}$ connected to the positive supply voltage. A precharge-type phase frequency detector [10] is adopted; its basic circuit is shown in Fig. 3(a). If the in1 signal leads the in2 signal, the circuit produces an output pulse that represents their phase difference; otherwise, its output stays low. Two such circuits form the complete precharge-type phase frequency detector, which is shown with its timing diagram in Fig. 3(b).

Fig. 1. The block diagram of the clock-deskew buffer.

Fig. 2. The voltage-controlled delay line.

Fig. 3. The precharge-type phase frequency detector.

The dead band of this precharge-type phase frequency detector is small enough to be negligible in this application. The UP and DW signals control the current-type charge-pump circuit shown in Fig. 4. Fig. 4 also shows the structure of the on-chip charge-pump capacitor $C_{\rm CP}$, which provides a larger capacitance in a limited chip area.
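As a rough illustration of how UP/DW pulses move the VCDL control voltage, the Python sketch below treats the charge pump as an ideal current source of $I_{\rm CP}$ = 210 $\mu$A charging or discharging $C_{\rm CP}$ = 6 pF (the values given in the stability discussion) for the duration of each pulse. It assumes that UP raises and DW lowers the control voltage, uses a hypothetical 100-ps pulse, and ignores leakage, current mismatch, and the detector's dead band.

```python
# Idealized charge-pump model: an UP (DW) pulse of width t sources (sinks)
# I_CP for t seconds into C_CP, so the control voltage steps by I_CP * t / C_CP.
# Assumption: UP raises and DW lowers V_ctl; all non-idealities are ignored.
I_CP = 210e-6   # charge/discharge current, A
C_CP = 6e-12    # on-chip loop capacitor, F


def control_voltage_step(up_width: float, dw_width: float,
                         i_cp: float = I_CP, c_cp: float = C_CP) -> float:
    """Net change of the control voltage (V) for one phase comparison."""
    return i_cp * (up_width - dw_width) / c_cp


if __name__ == "__main__":
    # A hypothetical 100-ps UP pulse with no DW pulse.
    dv = control_voltage_step(100e-12, 0.0)
    print(f"delta V_ctl = {dv * 1e3:.2f} mV")
```

With these values a 100-ps UP pulse moves the control voltage by 3.50 mV; equal UP and DW widths leave it unchanged, which is the locked condition in this idealized model.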

To reduce the power dissipation in the transmission lines, the impulse transmission technique [9] is adopted. The impulse clock driver and its timing diagram are shown in Fig. 5(a) and (b), respectively. When the input signal IN changes from low to high, the transition at node N1 reaches node N2 only after a delay $t_1$. During this interval, node N3 is pulled low temporarily and then pulled high again, so an impulse appears at the output.

Fig. 4. The current-type charge pump and its capacitor.

Fig. 5. (a) The impulse clock driver, (b) its timing diagram, and (c) the comparison of the power dissipation between the conventional clock driver and the impulse clock driver.

Fig. 5(c) compares the power dissipation of the impulse clock driver with that of a conventional clock driver for a 15-pF capacitive load. Impulse transmission saves 25% of the power at 100 MHz.
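For a first-order sense of the power scale, a conventional driver that swings a capacitive load rail to rail dissipates roughly $C V_{DD}^2 f$. The Python sketch below evaluates that textbook estimate for the 15-pF load at 100 MHz, assuming a 3-V swing (the chip supply), and then scales it by the 25% savings reported above. The resulting impulse-driver figure is only a scaled estimate, not a value read from Fig. 5(c).

```python
# First-order dynamic power of a rail-to-rail CMOS driver: P = C * V^2 * f.
# Assumptions: 3-V swing (the chip supply), purely capacitive 15-pF load,
# no short-circuit or static current. REPORTED_SAVINGS is the 25% from the text.
C_LOAD = 15e-12
V_DD = 3.0
F_CLK = 100e6
REPORTED_SAVINGS = 0.25


def cv2f_power(c_load: float, v_swing: float, f_clk: float) -> float:
    """Dynamic switching power in watts."""
    return c_load * v_swing ** 2 * f_clk


if __name__ == "__main__":
    p_conv = cv2f_power(C_LOAD, V_DD, F_CLK)
    p_impulse_est = p_conv * (1.0 - REPORTED_SAVINGS)
    print(f"conventional (CV^2f estimate): {p_conv * 1e3:.1f} mW")
    print(f"impulse, scaled by the reported savings: {p_impulse_est * 1e3:.1f} mW")
```

The estimate gives about 13.5 mW for the conventional driver; the real values in Fig. 5(c) also include short-circuit current and driver-internal switching, which this formula omits.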

The clock-level restorer, which converts the impulse signal back into a CMOS-level signal, is a Schmitt trigger [11]. Unlike the conventional six-transistor CMOS Schmitt trigger, its hysteresis is determined by the threshold-voltage difference between CMOS inverters, which makes it less sensitive to process and supply-voltage variations [11]. The two-to-one multiplexer in Fig. 1 is included for testing. When Select = 0, the clock is connected directly to the clock driver, bypassing the deskew buffer; when Select = 1, the clock passes through the VCDL to the clock driver, so the deskew buffer is active.
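Behaviorally, a restorer with hysteresis acts as a latch: it switches high only when the input rises above an upper threshold, switches low only when the input falls below a lower threshold, and otherwise holds its state. The Python sketch below models that behavior on a sampled waveform. It is not the transistor-level circuit of [11]; the thresholds, the baseline level, and the impulse shape (a positive excursion at a rising clock edge and a negative one at a falling edge) are hypothetical.

```python
from typing import Iterable, List


def schmitt_restore(samples: Iterable[float], v_high: float, v_low: float,
                    initial: int = 0) -> List[int]:
    """Hysteresis comparator: output 1 once the input exceeds v_high,
    0 once it drops below v_low, otherwise keep the previous level."""
    if v_low >= v_high:
        raise ValueError("v_low must be below v_high")
    state = initial
    out = []
    for v in samples:
        if v > v_high:
            state = 1
        elif v < v_low:
            state = 0
        out.append(state)  # between the thresholds the state is held
    return out


if __name__ == "__main__":
    # Hypothetical received waveform (V): 1.5-V baseline with a +0.4-V impulse
    # at a rising clock edge and a -0.4-V impulse at a falling edge.
    wave = [1.5, 1.5, 1.9, 1.6, 1.5, 1.5, 1.5, 1.1, 1.4, 1.5, 1.5]
    print(schmitt_restore(wave, v_high=1.7, v_low=1.3))
```

This prints `[0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]`: the restored clock stays high from the positive impulse until the negative one, even though the line itself returns to its baseline in between.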

The stability of the clock-deskew buffer can be analyzed using the block diagram of Fig. 6 [5]. The phase $\theta_R$ is the phase shift experienced by the external clock as it passes through the reference delay, and $\theta_{\rm FB}$ is the phase shift experienced by the external clock as it passes through the VCDL's, multiplexer, clock driver, transmission lines, and clock restorer. The phase detector can be modeled as an adder, and the charge-pump circuit as a constant gain block $K_{\rm CP}$ and an integrator.

Fig. 6. Equivalent circuit for the clock-deskew buffer.

Fig. 7. Photograph of the four clock-deskew buffers.

The phase error $\theta_e$ can be expressed as

$$\theta_e = \theta_R - \theta_{\rm FB}.\tag{4}$$

The gain of the charge-pump circuit can be expressed as

$$K_{\rm CP} = \frac{I_{\rm CP}}{C_{\rm CP}\omega_{\rm clk}} \tag{5}$$

where $\omega_{\rm clk}$ is the (angular) clock frequency, $I_{\rm CP}$ is the charge-pump current, and $C_{\rm CP}$ is its load capacitor. The VCDL can be modeled with a gain $K_D$ (rad/V) and a $z^{-m}$ block. The $z^{-m}$ block accounts for the time that elapses between the generation of an UP or DW signal and the resulting clock phase shift, which appears only after the loop delay $T_{\rm loop}$ [5]. Therefore, the phase transfer function can be expressed as

$$H(z) = \frac{\theta_{\rm FB}(z)}{\theta_R(z)} = \frac{K}{z^m - z^{m-1} + K} \tag{6}$$

where  $K = K_{\rm CP} K_D$ , m is the integer part of  $[1 + T_{\rm loop}/T_{\rm clk}]$ , and  $T_{\rm clk}$  is the clock period.
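Cross-multiplying (6) and multiplying through by $z^{-m}$ gives the difference equation $\theta_{\rm FB}[n] = \theta_{\rm FB}[n-1] + K\,\theta_e[n-m]$, i.e., an integrator whose correction takes effect $m$ cycles late. The Python sketch below iterates this recurrence for a unit step in $\theta_R$. It models (6) only; the step input, the cycle count, and the two gains (a small illustrative $K$ = 0.105 and $K$ = 0.8, which exceeds the $m$ = 3 limit stated below) are chosen for illustration.

```python
from typing import List


def step_response(k: float, m: int, n_cycles: int = 200) -> List[float]:
    """Iterate theta_FB[n] = theta_FB[n-1] + K * theta_e[n-m] for a unit step in
    theta_R applied at n = 0; both phases are taken as zero before the step."""
    if m < 1:
        raise ValueError("m must be at least 1")
    theta_r = 1.0
    fb = [0.0] * n_cycles
    for n in range(1, n_cycles):
        j = n - m
        # theta_e[j] = theta_R[j] - theta_FB[j], which is zero before the step.
        err = theta_r - fb[j] if j >= 0 else 0.0
        fb[n] = fb[n - 1] + k * err
    return fb


if __name__ == "__main__":
    for k, m in ((0.105, 3), (0.8, 3)):
        resp = step_response(k, m)
        print(f"K = {k}, m = {m}: last samples {[round(x, 3) for x in resp[-3:]]}")
```

With $K$ = 0.105 the feedback phase settles to the reference phase; with $K$ = 0.8 and $m$ = 3 it oscillates with growing amplitude, showing why the loop latency tightens the gain limit.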

For $m = 1$ and $m = 2$, the stability constraint is $K = K_D I_{\rm CP}/(C_{\rm CP}\omega_{\rm clk}) < 1$; for $m = 3$, $K < 0.625$ is required. The charging/discharging current of the charge-pump circuit is 210 $\mu$A, and the maximum VCDL gain $K_D$ is $2\pi \cdot (3\ {\rm ns/V})/T_{\rm clk}$. Since $\omega_{\rm clk} = 2\pi/T_{\rm clk}$, the clock period cancels and $K = (3\ {\rm ns/V}) \cdot I_{\rm CP}/C_{\rm CP}$, so the minimum $C_{\rm CP}$ is 0.63 pF for $m = 1$ and 2 and 1 pF for $m = 3$. A larger $C_{\rm CP}$ gives a smaller loop bandwidth and better jitter performance; based on simulation results, $C_{\rm CP}$ was chosen to be 6 pF. Four deskew buffers were realized on the chip. However, the scheme relies on all the loops containing the same number of clock cycles, which will become an important constraint when operating at higher frequencies.
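The numbers in the preceding paragraph can be reproduced with a short Python sketch. It assumes $\omega_{\rm clk} = 2\pi/T_{\rm clk}$ and reads the 3-ns figure in $K_D$ as a delay sensitivity of 3 ns/V. The poles of (6) are checked with the Schur-Cohn step-down recursion, a standard test for whether all roots of a polynomial lie inside the unit circle; the helper names are hypothetical.

```python
import math
from typing import List

I_CP = 210e-6        # charge-pump current, A
DELAY_GAIN = 3e-9    # assumed VCDL delay sensitivity, s/V (the "3 ns" in K_D)


def loop_gain(i_cp: float, c_cp: float, t_clk: float,
              delay_gain: float = DELAY_GAIN) -> float:
    """K = K_CP * K_D with K_CP from (5) and K_D = 2*pi*delay_gain/T_clk (rad/V)."""
    w_clk = 2 * math.pi / t_clk
    k_cp = i_cp / (c_cp * w_clk)
    k_d = 2 * math.pi * delay_gain / t_clk
    return k_cp * k_d


def loop_order(t_loop: float, t_clk: float) -> int:
    """m = integer part of (1 + T_loop / T_clk)."""
    return int(1 + t_loop / t_clk)


def char_poly(m: int, k: float) -> List[float]:
    """Coefficients of z^m - z^(m-1) + K, highest power first."""
    if m == 1:
        return [1.0, k - 1.0]
    return [1.0, -1.0] + [0.0] * (m - 2) + [k]


def schur_stable(coeffs: List[float]) -> bool:
    """True if every root of the real polynomial lies strictly inside |z| = 1."""
    a = [float(c) for c in coeffs]
    while len(a) > 1:
        refl = a[-1] / a[0]          # reflection coefficient
        if abs(refl) >= 1.0:
            return False
        rev = a[::-1]
        # p(z) - refl * reversed p(z) has a zero constant term; dividing by z
        # gives the next lower-degree polynomial of the recursion.
        a = [x - refl * y for x, y in zip(a, rev)][:-1]
    return True


if __name__ == "__main__":
    for k_max, label in ((1.0, "m = 1, 2"), (0.625, "m = 3")):
        c_min = DELAY_GAIN * I_CP / k_max
        print(f"{label}: minimum C_CP = {c_min * 1e12:.2f} pF")
    k = loop_gain(I_CP, 6e-12, 1 / 80e6)
    print(f"K with C_CP = 6 pF: {k:.3f}")
    for m in (1, 2, 3):
        print(f"m = {m}: stable = {schur_stable(char_poly(m, k))}")
```

The sketch prints minimum capacitances of 0.63 pF and 1.01 pF (the text rounds the latter to 1 pF), gives $K \approx 0.105$ for $C_{\rm CP}$ = 6 pF, and reports all three cases stable. Sweeping $K$ in `schur_stable(char_poly(3, K))` places the $m = 3$ boundary at $(\sqrt{5}-1)/2 \approx 0.618$, slightly below the quoted 0.625, which corresponds to a minimum $C_{\rm CP}$ of about 1.02 pF.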

TABLE I
MEASURED RESULTS FOR VARIOUS TRANSMISSION
LINES AT A CLOCK FREQUENCY OF 66 MHz

| L1 (cm) | L2 (cm) | Skew, S = 0 (no deskew) | Skew, S = 1 (deskewed) |
|---------|---------|-------------------------|------------------------|
| 60      | 10      | 2.6 ns                  | 280 ps                 |
| 60      | 20      | 2.15 ns                 | 250 ps                 |
| 60      | 30      | 1.7 ns                  | 210 ps                 |
| 60      | 40      | 1.16 ns                 | <100 ps                |
| 60      | 50      | 0.6 ns                  | <100 ps                |

TABLE II
MEASURED RESULTS FOR TWO DIFFERENT TRANSMISSION
LINES (ONE 60 cm AND THE OTHER 10 cm)

| Clock frequency (MHz) | Skew, S = 0 (no deskew) | Skew, S = 1 (deskewed) | Deskewed clock jitter (peak-to-peak) |
|-----------------------|-------------------------|------------------------|--------------------------------------|
| 10                    | 2.6 ns                  | <100 ps                | 120 ps                               |
| 20                    | 2.6 ns                  | <100 ps                | 130 ps                               |
| 30                    | 2.6 ns                  | <100 ps                | 130 ps                               |
| 40                    | 2.6 ns                  | 200 ps                 | 150 ps                               |
| 50                    | 2.6 ns                  | 240 ps                 | 150 ps                               |
| 60                    | 2.6 ns                  | 220 ps                 | 150 ps                               |
| 70                    | 2.6 ns                  | 200 ps                 | 150 ps                               |
| 80                    | 2.6 ns                  | 260 ps                 | 170 ps                               |

TABLE III
SUMMARY OF THE PROPOSED CLOCK-DESKEW BUFFER

| Parameter                      | Value           |
|--------------------------------|-----------------|
| Supply voltage                 | 3 V             |
| Power                          | 59 mW @ 80 MHz  |
| Clock frequency                | 10-80 MHz       |
| Deskew range                   | 0.5-3.8 ns      |
| Max. clock skew after deskew   | 300 ps          |
| Max. clock jitter after deskew | 170 ps (p-p)    |
| Technology                     | 0.6-μm CMOS     |
| Chip area (core)               | 0.9 × 0.9 mm²   |
| Transistor count               | 848             |

### III. EXPERIMENTAL RESULTS

The proposed deskew buffer has been fabricated in a 0.6-$\mu$m single-poly double-metal (SPDM) CMOS process. Fig. 7 shows a photograph of the IC, which contains four deskew buffers. Transmission lines of five different lengths have been tested. Fig. 8 shows a typical impulse signal and the corresponding recovered signal at 50 MHz. Table I compares the outputs of two deskew buffers driving transmission lines of different lengths at an operating frequency of 66 MHz, and Table II lists the same comparison for the 60- and 10-cm lines at different operating frequencies. Fig. 9 shows the output signals with the two transmission lines (one 60 cm and the other 10 cm) before and after deskewing at a clock frequency of 80 MHz. The peak-to-peak jitter of the output signals after the deskew buffer is also listed in Table II. Finally, the measured characteristics of the proposed deskew buffer are summarized in Table III.

Fig. 8. Typical impulse signal (a) before (500 mV/div) and (b) after (100 mV/div) the clock-level restorer at a clock frequency of 50 MHz. The horizontal scale is 5 ns/div.

Fig. 9. The output signals (a) before and (b) after the deskew buffer with two transmission lines (one 60 cm and the other 10 cm) at a clock frequency of 80 MHz. The horizontal scale is 5 ns/div, and the vertical scale is 100 mV/div.

### IV. CONCLUSIONS

A set of clock-deskew buffers using impulse transmission has been presented and fabricated in a 0.6-$\mu$m CMOS technology. Experimental results have verified the feasibility of the proposed clock-deskew buffer.

### REFERENCES

- [1] R. B. Watson, Jr., and R. B. Iknaian, "Clock buffer chip with multiple target automatic skew compensation," *IEEE J. Solid-State Circuits*, vol. 30, pp. 1267–1276, Nov. 1995.
- [2] Y. Okajima, M. Taguchi, M. Yanagawa, K. Nishimura, and O. Hamada, "Digital delay locked loop and design technique for high-speed synchronous interface," *IEICE Trans. Electron.*, vol. E79-C, pp. 798–807, June 1996.
- [3] M. G. Johnson and M. E. Hudson, "A variable delay line PLL for CPU-coprocessor synchronization," *IEEE J. Solid-State Circuits*, vol. 23, pp. 1218–1223, Oct. 1988.
- [4] H. Sutoh, K. Yamakoshi, and M. Ino, "Circuit techniques for skew-free clock distribution," in *IEEE Custom Integrated Circuits Conf.*, 1995, pp. 163–166.
- [5] D. E. Brueske and S. H. K. Embabi, "A dynamic clock synchronization technique for large systems," *IEEE Trans. Comp., Packag., Manufact. Technol. B*, vol. 17, pp. 350–361, Aug. 1994.
- [6] T. Knight and H. M. Wu, "A method for skew-free distribution of digital signals using matched variable delay lines," in *IEEE Dig. Tech. Papers Symp. VLSI Circuits*, 1993, pp. 19–20.
- [7] "CY7B991," Cypress Semiconductor Corp., San Jose, CA, 1995.
- [8] T. Kawahara, M. Horiguchi, J. Etoh, K. Kimura, and M. Aoki, "Low-power chip interconnection by dynamic termination," *IEEE J. Solid-State Circuits*, vol. 30, pp. 1030–1034, Sept. 1995.
- [9] M. Nogawa, Y. Ohtomo, and M. Ino, "A low-power and high-speed impulse-transmission CMOS interface circuit," *IEICE Trans. Electron.*, vol. E78-C, pp. 1722–1736, Dec. 1995.
- [10] H. Notani, H. Kondoh, and Y. Matsuda, "A 622-MHz CMOS phase-locked loop with precharge-type phase frequency detector," in *Proc. IEEE Symp. VLSI Circuits*, June 1994, pp. 129–130.
- [11] D. Kim, J. Kih, and W. Kim, "A new waveform-reshaping circuit: An alternative approach to Schmitt trigger," *IEEE J. Solid-State Circuits*, vol. 28, pp. 162–164, Feb. 1993.