## 9.5 An 87GHz QPSK Transceiver with Costas-Loop Carrier Recovery in 65nm CMOS

Shih-Jou Huang, Yu-Ching Yeh, Huaide Wang, Pang-Ning Chen, Jri Lee

National Taiwan University, Taipei, Taiwan

Modern high-speed wireless data links, such as 60GHz RF and point-to-point communications, have stimulated research on Gb/s transceivers for the V-band (50 to 75GHz) and W-band (75 to 110GHz). Conventional approaches in SiGe or III-V compound technologies consume too much power, occupy too much area, and are difficult to integrate. In CMOS realizations, baseband processors and high-speed digitizers further increase design complexity and power consumption. This paper introduces a fully integrated CMOS QPSK transceiver with baseband-less carrier- and data-recovery circuitry, which significantly reduces power consumption while achieving data rates up to 3.5Gb/s.

The QPSK transmitter is shown in Fig. 9.5.1. To provide a prototype for both the 81-to-86GHz and 94GHz bands, we place the carrier at 87GHz. An integer-*N* frequency synthesizer provides the 77.3GHz LO and 9.7GHz IF signals, forming an 8/9-1/9 up-conversion architecture. It contains a VCO running at 77.3GHz, $\div 8$ and $\div 16$ circuits, a type-IV PFD, a V-to-I converter, and an on-chip 2nd-order loop filter. The same frequency arrangement is applied to the RX, so the frequency plan itself introduces no offset between the two. A QPSK modulator generates 4 phases (0°, 90°, 180°, and 270°) of the 9.7GHz IF signal and sends them to the up-conversion mixer, creating the 87GHz RF signal. A mm-Wave balun converts the differential RF signal into single-ended form to drive the PA. With the input capacitance of $M_1$ and $M_2$ absorbed by the mixer's resonant network, the balun achieves a conversion loss of 1.4dB for a transformer coupling factor of 0.6. The PA output is fed into a matched microstrip line, which connects to the waveguide adapter.
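A quick arithmetic check of the 8/9-1/9 plan, written as a minimal Python sketch: it assumes only the ratios stated above (LO at 8/9 of the carrier, IF obtained by dividing the LO by 8), and the function name `frequency_plan` is illustrative rather than taken from the design.

```python
def frequency_plan(f_rf_hz):
    """Split a carrier so that LO = 8/9 * RF and IF = LO / 8 = RF / 9."""
    f_lo = f_rf_hz * 8 / 9   # VCO output, used as the mm-Wave LO
    f_if = f_lo / 8          # divide-by-8 output, used as the IF
    return f_lo, f_if


f_lo, f_if = frequency_plan(87e9)
print(f"LO = {f_lo / 1e9:.2f} GHz, IF = {f_if / 1e9:.2f} GHz, "
      f"LO + IF = {(f_lo + f_if) / 1e9:.2f} GHz")
```

The rounded values of 77.3GHz and 9.7GHz quoted in the text correspond to 77.33GHz and 9.67GHz. Because the IF is always the VCO frequency divided by 8, any shift of the VCO moves LO and IF together while preserving the 8:1 ratio.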

To perform coherent demodulation, the LO clock in the RX must be synchronized with the incoming carrier in both phase and frequency. Unlike typical baseband approaches that require high-speed ADCs (several GS/s) and signal processors (>1GHz), we realize carrier recovery in the analog domain with a Costas loop. As depicted in Fig. 9.5.2, the receiver is composed of an LNA, a down-conversion mixer, an IF amplifier, a 77.3GHz clock source (VCO) and its $\div 8$ outputs (9.7GHz *I/Q* signals), and a phase detector (PD) and frequency detector (FD) for carrier recovery. After LNA amplification, the incoming RF signal is down-converted to IF by the VCO clock. Let the signal at node *A* be

$$V_{\rm A}(t) = D_{\rm I}(t)\cos(\omega_{\rm IF}t + \Delta\phi) + D_{\rm Q}(t)\sin(\omega_{\rm IF}t + \Delta\phi),$$

where $D_{\rm I}$ and $D_{\rm Q}$ denote the embedded data, $\omega_{\rm IF}$ is the $LO_2$ frequency ($=2\pi \times 9.7$GHz), and $\Delta \phi$ is the phase error between the IF signal and $LO_2$. After mixing and limiting, the two outputs $V_{\rm D}(t)$ and $V_{\rm E}(t)$ become $D_{\rm I}(t)$ and $D_{\rm Q}(t)$, respectively, provided that $-45^\circ < \Delta \phi < 45^\circ$. $V_{\rm C}$ ($V_{\rm B}$) and $V_{\rm D}$ ($V_{\rm E}$) are then mixed together, creating the final output $V_{\rm F}$, which is proportional to $\sin(\Delta \phi)$. As a result, the output current $I_{\rm CP1}$ of (V/I)<sub>1</sub> has a sinusoidal characteristic, as shown in the upper-left corner of Fig. 9.5.2. Denoting the pumping current at $\Delta \phi = \pi/4$ as $I_{\rm P}$, the peak of the sinusoid is $I_{\rm P}/\sin(\pi/4)$, so the gain of the PD together with (V/I)<sub>1</sub> is $\sqrt{2}I_{\rm P}$ per radian. The approximately linear behavior near the origin lets the loop operate as a linear PLL with two correlated phase-adjusting mechanisms, i.e., the $LO_1$ loop and the $LO_2$ loop. Upon lock, the two demodulated data streams appear at nodes *D* and *E*, as expected.
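To illustrate why $V_{\rm F}$ tracks $\sin(\Delta\phi)$ regardless of the data, the following Python sketch models a decision-directed QPSK Costas detector for a single symbol. It is an idealized behavioral model, not the authors' circuit: the LO amplitudes of 2, the averaging over one IF period in place of a low-pass filter, the sign convention of the error, and the names `costas_pd` and `pd_gain` are all assumptions.

```python
import math


def costas_pd(d_i, d_q, dphi, n=64):
    """Behavioral decision-directed QPSK Costas phase detector (one symbol).

    V_A = D_I*cos(wt + dphi) + D_Q*sin(wt + dphi) is mixed with 2cos(wt) and
    2sin(wt); averaging over one full IF period stands in for low-pass filtering.
    """
    i_bb = q_bb = 0.0
    for k in range(n):
        wt = 2 * math.pi * k / n
        v_a = d_i * math.cos(wt + dphi) + d_q * math.sin(wt + dphi)
        i_bb += 2 * math.cos(wt) * v_a / n   # -> D_I*cos(dphi) + D_Q*sin(dphi)
        q_bb += 2 * math.sin(wt) * v_a / n   # -> -D_I*sin(dphi) + D_Q*cos(dphi)
    # Limiters act as hard decisions on each arm (correct for |dphi| < 45 deg)
    d_i_hat = 1 if i_bb > 0 else -1
    d_q_hat = 1 if q_bb > 0 else -1
    # Multiply each decision by the opposite arm; the data terms cancel
    error = d_i_hat * q_bb - d_q_hat * i_bb
    return d_i_hat, d_q_hat, error


def pd_gain(i_p):
    """Slope at the origin of I_pk*sin(dphi), given that I_pk*sin(pi/4) = i_p."""
    i_pk = i_p / math.sin(math.pi / 4)
    return i_pk  # equals sqrt(2) * i_p, in amperes per radian


for deg in (-30, 0, 30):
    print(deg, [round(costas_pd(a, b, math.radians(deg))[2], 4)
                for a in (1, -1) for b in (1, -1)])
```

With these conventions the error evaluates to $-2\sin(\Delta\phi)$ for all four symbols as long as $|\Delta\phi| < 45^\circ$, which is the data-independent sinusoidal characteristic the loop relies on. Beyond ±45° the decisions rotate to a neighboring symbol and the characteristic repeats every 90°.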

In addition to phase alignment, the VCO frequency must be corrected at power-up. One option is to form another loop that pushes the VCO frequency into the lock range with the help of an external crystal. A much better approach is to extend the Costas loop one step further. As shown in Fig. 9.5.2, taking the cross- and auto-correlation of signals $V_{\rm B}$ and $V_{\rm C}$ twice yields an output containing frequency-error information. Denoting the frequency error between the $LO_2$ and IF signals as $\Delta\omega$, we follow the same calculation procedure and obtain the signals at nodes *G* and *H*:

where $A_V$ represents the swing magnitude designed in our CML blocks. These two signals are separated by 90°. Whether $V_G$ leads or lags $V_H$ depends on the polarity of $\Delta \omega$, which can be easily detected by sampling one signal with the other [1]. Accordingly, the two limiters sharpen the sinusoids into square waves and send them to an edge-triggered flip-flop. The detected error polarity is then applied to $(V/I)_2$, which provides a pumping current $I_{CP2}$ ($=5I_{CP1}$) to adjust the VCO control voltage accordingly. To minimize disturbance on the control line, the FD loop must be disabled once the loop is locked. Since $V_G$ stays low when $\Delta \omega \approx 0$, we apply $V_K$ ($V_G$ after limiting) to (V/I)<sub>2</sub> to shut it down automatically once frequency lock is achieved. As in [1], this automatic shut-off mechanism saves significant power and area.
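The polarity detection and automatic shut-off can be sketched behaviorally in Python. Because the expressions for $V_G$ and $V_H$ are not reproduced in this text, the sketch assumes hypothetical limited waveforms $V_G \propto -\cos(\Delta\omega t)$ and $V_H \propto \sin(\Delta\omega t)$, chosen only so that they are 90° apart and $V_G$ stays low at $\Delta\omega = 0$; the function `fd_output` and its parameters are likewise illustrative.

```python
import math


def fd_output(dw, t_end=1.0, steps=20_000):
    """Hypothetical FD model returning (polarity, duty).

    polarity: DFF output, +1 or -1 (0 if the DFF was never clocked).
    duty: fraction of time V_K is high, i.e. (V/I)2 is enabled.
    """
    q = None         # DFF state: last sampled level of limited V_H
    prev_k = None
    enabled = 0
    for k in range(steps + 1):
        t = t_end * k / steps
        v_k = -math.cos(dw * t) > 0.0   # limited V_G (assumed -cos form)
        v_h = math.sin(dw * t) > 0.0    # limited V_H (assumed sin form)
        if prev_k is False and v_k:     # rising edge of V_K clocks the DFF
            q = v_h
        prev_k = v_k
        enabled += v_k                  # V_K gates the charge pump
    polarity = 0 if q is None else (1 if q else -1)
    return polarity, enabled / (steps + 1)


for dw in (2 * math.pi * 10, -2 * math.pi * 10, 0.0):
    print(dw, fd_output(dw))
```

In this model the flip-flop output carries the sign of $\Delta\omega$, while $V_K$ enables the pump only part of the time; at $\Delta\omega = 0$ no edge arrives and the pump stays off, mirroring the shut-off behavior described above.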

The QPSK modulator is illustrated in Fig. 9.5.3(a). Here, a DMUX parallelizes the input data and a mapping logic rearranges the sequences. After retiming, the two outputs control the IF 2-to-1 selector and the clock-synthesis mixer: the former picks the proper phase (0° or 90°), and the latter sets the polarity, adding 180° to obtain 180° and 270°. To reach such high frequencies, the VCO and dividers employ mm-Wave techniques [Fig. 9.5.3(b)]. The VCO is a cross-coupled *LC* oscillator with thick-oxide varactors, followed by two stages of tuned amplifiers ($M_3$-$M_5$) as buffers. A direct injection-locked frequency divider serves as the 1st divider stage, where the injection signal is ac-coupled to the gate of the switch $M_7$ [2]. Simulation shows that its lock range is approximately 4GHz when $V_b = 0.8$V.
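The modulator control logic can be sketched as follows. The text does not specify the bit-to-phase assignment, so the Gray mapping `GRAY_PHASE` below is an assumption, as are the helper names; the sketch only shows how one control bit selecting 0°/90° and another adding 180° together cover all four phases.

```python
# Hypothetical Gray mapping: phases 90 deg apart differ in exactly one bit.
GRAY_PHASE = {(0, 0): 0, (0, 1): 90, (1, 1): 180, (1, 0): 270}


def demux(bits):
    """1:2 DMUX: even-indexed bits form stream A, odd-indexed bits stream B."""
    return bits[0::2], bits[1::2]


def modulator_controls(a, b):
    """Return (select_90, invert, phase) for one symbol."""
    phase = GRAY_PHASE[(a, b)]
    select_90 = phase in (90, 270)   # 2-to-1 selector: 0 deg or 90 deg IF
    invert = phase >= 180            # polarity control: adds 180 deg
    return select_90, invert, phase


def modulate(bits):
    stream_a, stream_b = demux(bits)
    return [modulator_controls(a, b) for a, b in zip(stream_a, stream_b)]


print(modulate([0, 0, 0, 1, 1, 1, 1, 0]))
```

Each output phase equals 90° times the selector bit plus 180° times the polarity bit, so the two retimed control signals are sufficient to produce all four IF phases.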

The interconnection between chip and antenna is of great importance. As shown in Fig. 9.5.4(a), we realize the coplanar-strip-to-waveguide interface with a transition fabricated on a single-layer dielectric substrate [3]. With the chip flipped onto the microstrip line, the RF signal is coupled through the substrate to the matching element, which is tightly attached to the waveguide (the entrance of the horn). The measured insertion loss of the transition is depicted in Fig. 9.5.4(b); the loss stays below 5.3dB from 81 to 86GHz. Figure 9.5.4(c) shows a photo of the assembly. The LNA, PA, and RF mixer designs are based on our previous work at 60 and 77GHz [2,4]. The 3-stage LNA achieves a gain of 18.5dB, and the 5-stage PA achieves 13dB gain and 6dBm $P_{1dB}$.

The transmitter and receiver are fabricated in 65nm CMOS technology. The TX and RX consume 212mW and 166mW, respectively, from a 1.2V supply. Figure 9.5.5(a) shows the TX output spectrum under 650Mb/s QPSK modulation with a PRBS length of $2^{31}-1$, revealing a sinc-shaped spectrum centered at the carrier frequency (87GHz). The TX exhibits a phase noise of -85.8dBc/Hz at 1MHz offset. The spectrum of the carrier recovered in the RX is shown in Fig. 9.5.5(b), revealing a phase noise of -77.7dBc/Hz at 1MHz offset and -88dBc/Hz at 10MHz offset. Figure 9.5.6(a) shows the recovered (and demuxed) data in the RX. With $D_{in} = 2.5$Gb/s, the recovered data jitter measures 34ps rms and 200ps peak-to-peak. In this test, both the TX and the RX are connected to 24dBi horn antennas placed 1m apart. The BER as a function of data rate is also investigated; Fig. 9.5.6(b) depicts the results for different data patterns. A BER below $10^{-11}$ is obtained for a 3.5Gb/s $2^7-1$ PRBS input. Figure 9.5.7 shows the die micrograph; the die occupies 1.32×1mm<sup>2</sup>. We estimate that the maximum link distance for $D_{in} = 2.5$Gb/s is approximately 2km if 48dBi dish antennas are used. Note that a conventional TRX without carrier recovery [5] suffers from synchronization difficulty and cannot operate independently. A table summarizing the performance of this work is also included in Fig. 9.5.7.
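The 2km estimate can be put in perspective with the Friis equation in decibel form. This Python sketch compares the idealized received power of the 1m horn setup with a hypothetical 2km dish link at 87GHz; the transmit power `P_TX` is a placeholder that cancels in the comparison, and the model ignores transition losses (common to both cases) as well as atmospheric and rain attenuation, which are not negligible at W-band over kilometers.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss, 20*log10(4*pi*d/lambda), in dB."""
    wavelength = C / freq_hz
    return 20 * math.log10(4 * math.pi * distance_m / wavelength)


def received_power_dbm(p_tx_dbm, g_tx_dbi, g_rx_dbi, distance_m, freq_hz):
    """Friis equation in dB form; ignores transition and atmospheric losses."""
    return p_tx_dbm + g_tx_dbi + g_rx_dbi - fspl_db(distance_m, freq_hz)


F = 87e9
P_TX = 0.0  # hypothetical placeholder; cancels in the difference below
bench = received_power_dbm(P_TX, 24, 24, 1.0, F)      # 24dBi horns, 1 m
field = received_power_dbm(P_TX, 48, 48, 2000.0, F)   # 48dBi dishes, 2 km
print(f"FSPL at 1 m: {fspl_db(1.0, F):.1f} dB")
print(f"Received-power change, 1 m horns -> 2 km dishes: {field - bench:.1f} dB")
```

Under these assumptions, the extra 48dB of antenna gain does not fully offset the roughly 66dB of additional path loss, so a 2km link at the same data rate would require about 18dB of margin in the 1m setup, before any atmospheric loss.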

## Acknowledgment:

The authors thank the TSMC University Shuttle Program for chip fabrication.

## References:

[1] J. Lee et al., "A 75-GHz Phase-Locked Loop in 90-nm CMOS Technique," *IEEE J. Solid-State Circuits*, vol. 43, pp. 1414-1426, June 2008.

[2] Y. Li et al., "A Fully-Integrated 77GHz FMCW Radar System in 65nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 216-217, Feb. 2010.

[3] H. Iizuka et al., "Millimeter-Wave Microstrip Line to Waveguide Transition Fabricated on a Single Layer Dielectric Substrate," *IEICE Trans. Commun.*, pp. 1169-1177, June 2002.

[4] J. Lee et al., "A Low-Power Fully Integrated 60GHz Transceiver System with OOK Modulation and On-Board Antenna Assembly," *ISSCC Dig. Tech. Papers*, pp. 316-317, Feb. 2009.

[5] C. Marcu et al., "A 90 nm CMOS Low-Power 60 GHz Transceiver with Integrated Baseband Circuitry," *IEEE J. Solid-State Circuits*, vol. 44, pp. 3434-3447, Dec. 2009.




## **ISSCC 2011 PAPER CONTINUATIONS**

