# W-Band BPSK and QPSK Transceivers With Costas-Loop Carrier Recovery in 65-nm CMOS Technology

Shih-Jou Huang, Yu-Ching Yeh, Huaide Wang, Pang-Ning Chen, and Jri Lee, Member, IEEE

Abstract—This paper presents two fully integrated binary phase-shift keying (BPSK) and quadrature phase-shift keying (QPSK) transceivers operating at *W*-band [carrier frequency = 84 GHz (BPSK), and 87 GHz (QPSK)]. Including RF front-end, Costas-loop-based carrier and data recovery, and antenna assembly technique, the BPSK transceiver prototype achieves a 2.5-Gb/s data link with BER <  $10^{-9}$  while consuming 202 mW (Tx) and 125 mW (Rx) from a 1.2-V supply. The QPSK TRx achieves a 2.5-Gb/s data link with BER <  $10^{-11}$  while consuming 212 mW (Tx) and 166 mW (Rx) from a 1.2-V supply. Both cases are measured with link distance of 1 m and antenna gain of 24 dBi.

*Index Terms*—Binary phase-shift keying (BPSK), Costas loop, low-noise amplifier (LNA), power amplifier (PA), quadrature phase-shift keying (QPSK), transceiver (TRx), W-band.

## I. INTRODUCTION

In 2003, the U.S. Federal Communications Commission (FCC) leased the millimeter-wave spectrum in the 71–76, 81–86, and 92–95 GHz frequency bands for commercial usage [1]. Current solid-state device technologies such as gallium nitride (GaN)-based HEMTs have achieved significant advances in transmit output power at the W-band (75–110 GHz), which serves as an enabling capability for high-data-rate communication. Recent advances in solid-state device frequency performance will similarly enable significant performance gains in low-noise amplifiers (LNAs) and W-band receivers. W-band developments have also spurred the development of solid-state power amplifier technology for Navy all-weather radar, surveillance, reconnaissance, electronic attack, communications, and asymmetric warfare systems.

Being the highest frequency bands ever leased for commercial exploitation, these new territories evoked quite a few applications. For example, a wireless point-to-point link can be established as a substitute or backup for underground optical fibers, as illustrated in Fig. 1(a). A metro network service can be developed among densely situated buildings in urban areas. W-band also provides a possible wireless backhaul solution for the ever-growing bandwidth-intensive applications such as Worldwide Interoperability for Microwave Access (WiMAX) and other systems. The co-existence of 70- and 80-GHz bands allows a 5-GHz full-duplex transmission bandwidth, which can be used to transmit high-speed data ( $\geq 1$  Gb/s) even with a very simple modulation scheme, e.g., on-off keying (OOK) [2], [3]. If more spectrally efficient modulations are applied, full-duplex data rates of more than 10 Gb/s can be reached [4]. On the other hand, unlike the 60-GHz band, whose transmission distance may be degraded by oxygen absorption, wireless communication at these frequencies can reach a distance of a few kilometers. Signal loss at these high frequencies through the atmosphere is quite insignificant, i.e., usually less than several dB/km except in extreme conditions. The table in Fig. 1(b) summarizes signal loss under different weather situations. The pencil-beam radiation requires high-gain antennas on both sides, which in turn relaxes the power amplifier (PA) design, i.e., output power can be as low as a few dBm. Such a feature justifies a CMOS realization.

Manuscript received April 08, 2011; revised August 02, 2011; accepted August 02, 2011. Date of publication October 06, 2011; date of current version November 23, 2011. This paper was approved by Guest Editor Yorgos Palaskas.

The authors are with the Electrical Engineering Department, National Taiwan University, Taipei, Taiwan (e-mail: f97943038@ntu.edu.tw; butyl0210@hotmail.com; b91901125@ntu.edu.tw; howwhatwho@hotmail.com; jrilee@cc.ee.ntu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2011.2166469

However, there are lots of difficulties in building up such a high-end system. First of all, propagation loss is of great concern. It can be proven that at 80 GHz, the isotropic loss at 1 km is greater than 130 dB, suggesting the need for high-gain antennas (usually in dish-type) and accurate alignment. A link budget estimating the situation for 2 km is shown in Fig. 1(c). With 40-dBi antenna gain and 10-dB noise figure for a low-noise amplifier and mixer, we need a PA output around 5 dBm. It is also desirable to minimize the receiver's noise figure as much as possible so as to leave some margin for possible interconnection loss. In such a straight point-to-point link, simplifying the hardware complexity (and thus the cost) is of great interest. It is unnecessary to accommodate complicated modulation or costly beamforming techniques in baseband, since more typical communication issues such as channel interference or fading do not exist here. Using conventional wireless transceiver architecture requires at least 6-bit 5-GSample/s analog-to-digital converters (ADCs) for 2.5-Gb/s data rate, and all of the digital circuits in baseband are operated at 5 Gb/s (unless with further parallelizing). In this design, we obviate the use of baseband and interface circuitry. We synchronize transmitter and receiver by means of a Costas loop [5], performing coherent demodulation in analog domain. As will be presented in the following sections, this novel architecture significantly reduces power consumption, design complexity, and cost.
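The isotropic-loss figure quoted above can be checked with the standard free-space path loss formula. The following is a minimal sketch, assuming clear air (no weather loss) and assuming that the 40-dBi antenna gain applies at each end of the link; the function names are illustrative and do not come from the paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def fspl_db(freq_hz: float, dist_m: float) -> float:
    """Free-space (isotropic) path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * dist_m * freq_hz / C)

def received_dbm(p_tx_dbm: float, g_tx_dbi: float, g_rx_dbi: float,
                 freq_hz: float, dist_m: float, extra_loss_db: float = 0.0) -> float:
    """Friis link budget in dB form; extra_loss_db lumps weather and interconnect loss."""
    return p_tx_dbm + g_tx_dbi + g_rx_dbi - fspl_db(freq_hz, dist_m) - extra_loss_db

loss_1km = fspl_db(80e9, 1_000.0)
# Assumption: 5-dBm PA output and 40 dBi at both Tx and Rx, clear air, 2-km link.
p_rx_2km = received_dbm(5.0, 40.0, 40.0, 80e9, 2_000.0)

print(f"FSPL at 80 GHz over 1 km: {loss_1km:.1f} dB")
print(f"Received power over 2 km: {p_rx_2km:.1f} dBm")
```

Under these assumptions the 1-km loss comes out slightly above 130 dB, in line with the text. Weather and interconnection losses would be added through `extra_loss_db`.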

In contrast to other analog BPSK/QPSK modulator/demodulators, our designs present several advantages. For example, a 10-Gb/s QPSK modulator/demodulator has been proposed in [6]. However, the direct QPSK modulator in [6] requires a phase shifter to generate I/Q signals. The demodulator also needs delay detection to obviate carrier recovery circuitry, which in turn requires a reference signal to ensure one-symbol delay time. In wireless systems, the finite frequency offset between Tx and Rx makes this approach very difficult to realize. The power consumption (850 mW/650 mW) and chip area  $(2 \times 2 \text{ mm}^2)$  in [6] also make it less competitive. Here, we have two prototypes, both verified in 65-nm CMOS technology: one implements binary phase-shift keying (BPSK) and the other quadrature phase-shift keying (QPSK). Although not as widely used in today's wireless communications, the Costas loop actually provides an efficient solution for frequency offset issues between transmitter and receiver. Both cases achieve low-error communications (bit error rate (BER)  $< 10^{-9}$ ) while consuming total power (Tx+Rx) of no more than 327 and 378 mW, respectively.

Fig. 1. (a) Illustration of wireless point-to-point link between high-rise buildings. (b) Mm-wave signal (71–86 GHz) loss for different weather conditions (dB/km). (c) Link budget estimated for 2-km link (40-dBi antenna gain included).

This paper is organized as follows. Section II illustrates the carrier recovery technique and data demodulation for BPSK and QPSK transceivers. Section III presents the transceiver architectures, and Section IV describes the building blocks. Section V analyzes Costas loop behavior, and Section VI covers the chip-antenna assembly techniques. Complete measurement results are illustrated in Section VII, and consideration for future work is discussed in Section VIII. Finally, Section IX concludes this work.

## II. CARRIER RECOVERY AND DEMODULATION

## A. BPSK

In ideal phase-shift keying systems, there is no spectral line at the carrier frequency. However, actual wireless systems always contain mismatches, leading to a finite carrier spectral line. Theoretically, it is possible to distill the carrier frequency by placing a high-Q filter in the Rx front-end. This method is not feasible in practice because (1) the radio frequency (RF) signal captured from the antenna is usually too small to be processed even after amplification by the LNA; (2) it is almost impossible to realize a high-Q (>50) filter on chip; and (3) even if the carrier frequency can be obtained, the phase relationship between the RF and local-oscillation (LO) signals is still unknown. As a result, we have to look for other nonlinear solutions, and one of the best candidates is the Costas loop.

A Costas loop basically mixes the quadrature LO signals with the incoming RF, and further blends the two results to obtain the phase relationship. It provides reasonable performance at lower frequency bands [7]. At millimeter-wave bands, however, the original Costas loop design is quite difficult to apply, because generating very high-speed quadrature signals is not at all trivial. Meanwhile, clock tuning, loop filter design, and layout arrangement become challenging as well. The limited capture range requires a frequency acquisition loop to bring the frequency of the voltage-controlled oscillator (VCO) to the desired value before conducting phase locking. To the best of the authors' knowledge, these issues have never been discussed. We here propose our novel Costas loop topology in this section.

In order to optimize a coherent demodulation, the LO signal must be in-phase with the RF signal. Following the idea of a mixer-based linear clock and data recovery (CDR) circuit [8], we can obtain the phase relationship by using an additional mixer. Fig. 2 illustrates this idea. Denoting input data, operation frequency, phase error, and mixer conversion gain as  $D_{\rm in}(t)$, $\omega$,  $\Delta\theta$, and A, respectively, we get two results  $AD_{\rm in}(t)\cos(\Delta\theta)$  and  $AD_{\rm in}(t)\sin(\Delta\theta)$  at the outputs of the two down-conversion mixers, given that the second-order harmonics are fully suppressed by the low-pass RC filters. After further multiplication, we have an output that is proportional to  $\sin(2\Delta\theta)$ . With enough gain, if a feedback loop is placed around the VCO (and/or the RF input), the phase error  $\Delta \theta$  will be locked at the origin. This leads to a maximum signal-to-noise ratio (SNR) for the demodulation. Since this approach is purely linear and is operated in near-dc mode, the well-defined linear phase-locked loop (PLL) model can be directly applied. As a result, the output data is obtained at node P upon phase locking. This phase-locking behavior is very much like a linear CDR circuit. Owing to the relatively low interference in a pencil-beam radiation system, a stable phase detection can be achieved.
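The data-independent nature of this phase detector follows from $D_{\rm in}^2(t) = 1$ for BPSK data. A minimal numerical sketch, assuming ideal multipliers and perfect low-pass filtering (the function name and default gain are illustrative only):

```python
import math

def bpsk_pd_output(d_in: int, dtheta: float, gain: float = 1.0) -> float:
    """Product of the two low-pass-filtered down-conversion mixer outputs.

    I arm: A*D*cos(dtheta); Q arm: A*D*sin(dtheta).
    Because D is +1 or -1, D**2 = 1 and the data drops out of the product.
    """
    i_arm = gain * d_in * math.cos(dtheta)
    q_arm = gain * d_in * math.sin(dtheta)
    return i_arm * q_arm  # equals (A**2 / 2) * sin(2*dtheta)

for deg in (-30, -10, 0, 10, 30):
    th = math.radians(deg)
    out_plus = bpsk_pd_output(+1, th)
    out_minus = bpsk_pd_output(-1, th)
    print(f"dtheta = {deg:+4d} deg: PD output = {out_plus:+.4f} (D=+1), {out_minus:+.4f} (D=-1)")
```

Both data polarities give the same output, which is zero at $\Delta\theta = 0$ and odd-symmetric around it, so the loop sees a usable error signal regardless of the transmitted bit.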



Fig. 2. Phase detection for BPSK.



Fig. 3. Frequency detection for BPSK (scaling factor not included).

But how do we synchronize the frequency in the first place? Here, we resort to auto-correlation to acquire the frequency information. Suppose the frequency error between RF and LO is defined as  $\Delta \omega$ . As shown in Fig. 3, we multiply the two temporary outputs  $D_{\rm in}(t)\cos(\Delta\omega t)$  and  $D_{\rm in}(t)\sin(\Delta\omega t)$  by themselves, arriving at an output proportional to  $\cos(2\Delta\omega t)$ . Here, for simplicity we neglect scaling factors such as mixer gain. This result, together with  $\sin(2\Delta\omega t)$  that we just obtained from the phase detector (PD), reveals frequency information, that is, these two signals (or equivalently their squared version  $V_{\rm E}$  and  $V_{\rm F}$ ) are separated by 90°. Whether  $V_{\rm E}$  is leading or lagging  $V_{\rm F}$  depends on the polarity of  $\Delta \omega$ . In other words, we can detect the sign of frequency error by sampling one signal with the other through a D-flip-flop. The output subsequently drives another V-to-I converter, which pumps proportional current into the loop filter. Note that we do need limiters here to sharpen the transition of  $\cos(2\Delta\omega t)$  and  $\sin(2\Delta\omega t)$ , otherwise the D-flip-flop may not function properly as the beat frequency approaches zero. Meanwhile, as the phase is locked,  $V_{\rm E}$  stays high forever. We thus take  $V_{\rm E}$  to automatically turn off the frequency acquisition loop upon phase lock [9]. This function obviates complex lock detection circuitry and saves significant power.
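The sign detection described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions: ideal limiters, the limited $\sin(2\Delta\omega t)$ used as the flip-flop clock and the limited $\cos(2\Delta\omega t)$ as its D input (the text only says one signal samples the other), and an arbitrary initial phase. All names are hypothetical.

```python
import math

def limiter(x: float) -> int:
    """Ideal limiter: logic 1 above the mid-line threshold, logic 0 otherwise."""
    return 1 if x >= 0.0 else 0

def dff_samples(delta_f_hz: float, t_stop: float = 2e-6, dt: float = 1e-10) -> list:
    """Sample the limited cos(2*dw*t) on rising edges of the limited sin(2*dw*t)."""
    phi0 = 0.3  # arbitrary initial phase (assumption), avoids an edge exactly at t = 0
    samples = []
    prev_clk = limiter(math.sin(phi0))
    n_steps = int(round(t_stop / dt))
    for n in range(1, n_steps + 1):
        phase = 2.0 * (2.0 * math.pi * delta_f_hz) * (n * dt) + phi0
        clk = limiter(math.sin(phase))   # acts as the D-flip-flop clock
        data = limiter(math.cos(phase))  # acts as the D-flip-flop data input
        if clk == 1 and prev_clk == 0:   # rising clock edge
            samples.append(data)
        prev_clk = clk
    return samples

pos = dff_samples(+5e6)  # LO below RF by 5 MHz (sign convention is an assumption)
neg = dff_samples(-5e6)
print("positive offset samples:", set(pos))
print("negative offset samples:", set(neg))
```

In this model the flip-flop output is constant for a given polarity of $\Delta\omega$ and flips when the polarity flips, which is the property the frequency acquisition loop relies on.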

## B. QPSK

The carrier recovery approach for BPSK transceivers can be modified to accommodate QPSK systems. As shown in Fig. 4, we add limiters and cross correlation to resolve the phase relationship embedded in the quadrature input. Suppose  $V_{\rm RF}(t) = D_{\rm I}(t)\cos(\omega t + \Delta\theta) + D_{\rm Q}(t)\sin(\omega t + \Delta\theta)$ , where  $D_{\rm I}(t)$  and  $D_{\rm Q}(t)$  denote the input data, and LO signals  $V_{\rm LO,I} = \cos(\omega t)$  and  $V_{\rm LO,Q} = \sin(\omega t)$ . We therefore have two tentative outputs  $V_{\rm D}$  and  $V_{\rm E}$ , respectively, after mixing and limiting. These two signals are further mixed with each other, creating the final output  $V_{\rm F}$  which is proportional to  $\sin(\Delta\theta)$ , that is,

$$V_{\rm B}(t) = D_{\rm I}(t)\cos(\Delta\theta) + D_{\rm Q}(t)\sin(\Delta\theta) \tag{1}$$

$$V_{\rm C}(t) = -D_{\rm I}(t)\sin(\Delta\theta) + D_{\rm Q}(t)\cos(\Delta\theta) \tag{2}$$

for  $-45^{\circ} < \Delta\theta < 45^{\circ}$ . In such a phase region, the limiters would force  $\cos(\Delta\theta)$  to approach unity and  $\sin(\Delta\theta)$  to approach zero.<sup>1</sup> In other words,  $V_{\rm D}(t) \cong D_{\rm I}(t)$ ,  $V_{\rm E}(t) \cong D_{\rm Q}(t)$ , and  $V_{\rm F} \cong 2\sin(\Delta\theta)$ . Note that for simplicity here we omit mixer gain. A V-to-I converter is placed following the phase detector to adjust the VCO's frequency (and its phase) accordingly. Once again, we obtain a sinusoidal input-output characteristic (Fig. 4), and the approximately linear behavior in the vicinity of origin makes the loop a linear PLL.
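The approximation $V_{\rm F} \cong 2\sin(\Delta\theta)$ can be verified numerically from (1) and (2). The sketch below assumes ideal limiters and assumes that the cross correlation is formed as $V_{\rm E}V_{\rm B} - V_{\rm D}V_{\rm C}$; this particular combination and its sign are chosen here to reproduce the stated result and are not taken from Fig. 4.

```python
import math
from itertools import product

def sign(x: float) -> int:
    """Ideal limiter with bipolar output (+1 / -1)."""
    return 1 if x >= 0.0 else -1

def qpsk_pd_output(d_i: int, d_q: int, dtheta: float) -> float:
    """Phase-detector output for one QPSK symbol (mixer gain omitted)."""
    v_b = d_i * math.cos(dtheta) + d_q * math.sin(dtheta)   # Eq. (1)
    v_c = -d_i * math.sin(dtheta) + d_q * math.cos(dtheta)  # Eq. (2)
    v_d = sign(v_b)  # approximately D_I for |dtheta| < 45 deg
    v_e = sign(v_c)  # approximately D_Q for |dtheta| < 45 deg
    return v_e * v_b - v_d * v_c  # assumed cross-correlation form

for deg in (-40, -20, 0, 20, 40):
    th = math.radians(deg)
    outs = [qpsk_pd_output(di, dq, th) for di, dq in product((1, -1), repeat=2)]
    print(f"dtheta = {deg:+3d} deg: V_F = {outs[0]:+.4f}, 2*sin = {2 * math.sin(th):+.4f}")
```

Within $\pm 45^{\circ}$ all four symbols give the same output, so the detector characteristic does not depend on the transmitted data.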

<sup>1</sup>A limiter "digitizes" the input signal. Voltage higher than the middle-line threshold is considered logic 1 (otherwise logic 0).

Fig. 4. Phase detection for QPSK.

Fig. 5. Frequency detection for QPSK.

The frequency-locking mechanism for QPSK is also shown in Fig. 5. Again, by means of examining the cross and autocorrelation of  $V_{\rm B}$  and  $V_{\rm C}$  twice, we obtain an output containing the frequency error information. By the same token, if the signal magnitude is given by  $A_{\rm V}$ , we have

$$V_{\rm G}(t) = -A_{\rm V}\cos(4\Delta\omega t) \tag{3}$$

$$V_{\rm H}(t) = A_{\rm V} \sin(4\Delta\omega t). \tag{4}$$

These expressions follow directly from the signal flow illustrated in Fig. 5. The frequency error is obtained by sampling the two quadrature signals with each other. The final frequency detection result  $V_{\rm FD}$  drives another V-to-I converter to tune the frequency. Note that  $V_{\rm K}$  can be used for automatic shut-off function since it stays low when the phase locking is achieved.

## III. TRANSCEIVER ARCHITECTURE

Although theoretically attainable, the realization of such Costas loops involves many physical issues. We implement two transceivers and present the two cases separately.

## A. BPSK

Fig. 6(a) depicts the BPSK transmitter, which includes a full-rate frequency synthesizer providing the 84-GHz carrier frequency. A Gilbert-cell-based modulator [10] with rail-to-rail data swing is responsible for 180° phase modulation, and a power amplifier delivers the RF output to the antenna. Such a full-rate direct modulation saves significant power and design complexity. The 84-GHz VCO and first divider design will be illustrated in Section IV-C. To further optimize power and performance, the second to fourth stages in the divider chain are made of current-mode logic (CML) static topology [11], whereas the last three stages are realized as true single-phase clock (TSPC) structures [12]. A typical type-IV phase and frequency detector (PFD) is adopted in this prototype for simplicity. A second-order loop filter is implemented on chip.

Fig. 6. BPSK transceiver architecture. (a) Tx. (b) Rx.



Fig. 7. QPSK transceiver architecture. (a) Tx. (b) Rx. (c) Output data under I/Q mismatches.

The BPSK receiver is also illustrated in Fig. 6(b). To facilitate quadrature LO signals, we use heterodyne architecture in the receiver. The RF signal captured from the antenna gets amplified by an LNA with gain of 18.5 dB and mixed with  $LO_1$  (74.7 GHz) for down conversion. After amplification, the intermediate frequency (IF) signal enters the PD for phase locking. The  $LO_2$  frequency is chosen to be one eighth that of the  $LO_1$, that is, the VCO operates at 74.7 GHz, leaving the  $\div 8$  clocks for  $LO_2$. Note that the two phase-locking loops through  $LO_1$  and  $LO_2$  are actually constructive, i.e., they approach phase locking in the same direction. Frequency acquisition at power-up is also included through the frequency detection loop. The frequency detector (FD) and PD loops employ two independent V-to-I converters, where  $(V/I)_2$  provides 4 times higher current than  $(V/I)_1$  so that the FD loop dominates in frequency acquisition mode. The V-to-I converter design follows that in [13].

## B. QPSK

The QPSK transmitter is shown in Fig. 7(a). In this prototype, the carrier frequency (87 GHz) slightly deviates from the desired value (84 GHz), due to an inaccurate device model, which does no harm to the proof of concept.<sup>2</sup> Following the 1:8 frequency ratio scenario, we have an integer-N frequency synthesizer providing 77.3-GHz LO frequency and 9.7-GHz IF signals. The synthesizer contains a fundamental frequency VCO, a ÷8 circuit followed by a ÷16 one, a type-IV PFD, a charge pump, and an on-chip second-order loop filter. The QPSK modulator generates quadrature phases of the 9.7-GHz IF clock and sends them to the up-conversion mixer to create the 87-GHz RF signal. Since the horn antenna is in single-ended format, we place an on-chip balun to convert differential RF inputs into single-ended mode. With the input capacitance of  $M_1$  and  $M_2$  absorbed into the mixer's resonance network, the balun achieves conversion loss of only 1.4 dB. The simulated output return loss of the balun buffer is shown in Fig. 7(a) as well. Maximum  $S_{22} < -20$  dB reveals good output matching to the PA. An on-chip mm-wave PA delivers the RF signal to the antenna for radiation. Building block details will also be illustrated in Section IV. A matched microstrip line is laid out on board to convert transmission line to waveguide, which will be shown in Section VI.
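The LO and IF frequencies quoted for both prototypes follow from the 1:8 ratio: if the second LO is one eighth of the first and the two conversions add up to the carrier, then $f_{LO1} = \tfrac{8}{9} f_{RF}$. A short sketch of this arithmetic (the helper name is illustrative, and up/down conversion on the sum frequency is assumed):

```python
def frequency_plan(f_rf_ghz: float, ratio: int = 8):
    """Solve LO1 + LO1/ratio = RF for LO1, and return (LO1, LO2) in GHz."""
    f_lo1 = f_rf_ghz * ratio / (ratio + 1)
    f_lo2 = f_lo1 / ratio
    return f_lo1, f_lo2

bpsk_lo1, bpsk_lo2 = frequency_plan(84.0)
qpsk_lo1, qpsk_lo2 = frequency_plan(87.0)

print(f"BPSK: LO1 = {bpsk_lo1:.2f} GHz, LO2 = {bpsk_lo2:.2f} GHz")
print(f"QPSK: LO1 = {qpsk_lo1:.2f} GHz, LO2/IF = {qpsk_lo2:.2f} GHz")
```

The results round to the 74.7-GHz (BPSK Rx) and 77.3-GHz/9.7-GHz (QPSK) values given in the text.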

The QPSK receiver follows the same frequency arrangement as the transmitter. As illustrated in Fig. 7(b), the receiver is composed of an LNA, a down-conversion mixer, an IF amplifier, a 77.3-GHz VCO, and the  $\div 8$  circuits. Again, the two constructive loops (through  $LO_1$  and  $LO_2$) help each other to lock the phase. Once the phase is locked, the two demultiplexed data can be obtained at nodes D and E. The PD and FD are responsible for phase and frequency detection as described in the previous section. In circuit realization, the finite I/Q mismatch would cause jitter in the output data. As shown in Fig. 7(c), output data jitter increases from 55 ps (perfectly balanced I/Q) to 60 ps (5% mismatch) and 70 ps (10% mismatch).

<sup>2</sup>Actually, the QPSK architecture proposed here provides natural frequency alignment between Tx and Rx.

Fig. 8. Simulation of locking behavior for (a) BPSK and (b) QPSK TRxs.

Fig. 9. (a) LNA. (b) IF amplifier. (c) Down-conversion mixer. (d) BPSK modulator.

In addition to higher data rate, another important advantage of QPSK structure is that the transmitter's VCO design can be reused in the receiver, eliminating possible frequency offset between Tx and Rx.<sup>3</sup> Again, the automatic FD shut-down function is preserved, and  $(V/I)_2$  carries 4 times larger current than the  $(V/I)_1$  does to ensure a good frequency acquisition procedure. All blocks are implemented on chip including the loop filter. To verify the locking behavior, we plot the simulated control voltage around start-up points for both BPSK and QPSK transceivers in Fig. 8. Phase locking can be achieved within approximately 3.2  $\mu$ s and 150 ns, respectively, if the loop filters have no charge initially (i.e.,  $V_{ctrl} = 0$  at t = 0).

## IV. BUILDING BLOCKS

Here, we introduce circuit details of building blocks and their design considerations.

## A. LNA

Fig. 9(a) shows the LNA design. Modified from [14], three gain stages are cascaded with conjugate matching in between to achieve power gain of 18.5 dB. Since the skin depth at 87 GHz (0.22  $\mu$ m) is larger than the thickness of metal 1 (0.18  $\mu$ m), we shunt two bottom layers (M1 and M2) as a ground plane to reduce leakage.  $M_1$  and  $M_2$  are laid out with shared junction to minimize the parasitic capacitance [15].
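The skin-depth value quoted above is consistent with the usual good-conductor formula $\delta = \sqrt{\rho / (\pi f \mu_0)}$. A quick check, assuming the bulk resistivity of copper (the actual metal properties of the process are not given in the text):

```python
import math

MU0 = 4.0e-7 * math.pi  # permeability of free space, H/m
RHO_CU = 1.68e-8        # bulk copper resistivity in ohm*m (assumption)

def skin_depth_m(freq_hz: float, resistivity: float = RHO_CU) -> float:
    """Skin depth of a good non-magnetic conductor."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU0))

delta_um = skin_depth_m(87e9) * 1e6
print(f"Skin depth at 87 GHz: {delta_um:.3f} um")
```

With these assumptions the result is close to the 0.22 $\mu$m stated in the text, which exceeds the 0.18-$\mu$m metal-1 thickness and motivates shunting M1 and M2.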

<sup>&</sup>lt;sup>3</sup>Of course, external issues such as temperature difference between Tx and Rx are not considered here.



Fig. 10. (a) VCO and first frequency divider. (b) Power amplifier. (c) QPSK modulator.

## B. IF Amplifier, Mixer, and BPSK Modulator

The IF amplifier utilizes five stages with ac-coupling and self-biasing to suppress offset, and under-damped loading is used to increase the gain while saving power [Fig. 9(b)] [16]. The first and fourth stages incorporate local feedback resistor  $R_f$  to fulfill self-biasing input, whereas the second, third, and fifth stages employ regular differential pair structure with inductive peaking. Overall power consumption for the five stages is equal to 25 mW. The down-conversion mixer is realized as shown in Fig. 9(c) [17], where the gain stage  $(M_1 - M_2)$  and switching stages  $(M_3 - M_8)$  are separated with ac-coupling in between. The single-ended RF signal is sent to one common-source node P, whose parasitic capacitance is absorbed into the matching network. This topology is well suited to advanced CMOS technology, where the supply voltage is as low as 1.0–1.2 V. The mixer gain is 4.5 dB. The BPSK modulator is illustrated in Fig. 9(d), where a Gilbert-cell-based structure [10] is used but with rail-to-rail data switching and no tail current. Only a single end is taken out as RF output since the PA is single-ended as well. Consuming 16 mW of power, this modulator presents saturated power of 4 dBm. Other possible realizations can be found in [18]. Simulated output waveform under data transition is also shown in Fig. 9(d).

## C. VCO and First Divider

The VCO and the first frequency divider are shown in Fig. 10(a). The VCO incorporates a cross-coupled LC oscillator with thick-oxide (5.6 nm) varactors, which is followed by two tuned amplifiers as buffers. The 84- [BPSK Tx, Fig. 6(a)], 74.7- [BPSK Rx, Fig. 6(b)], and 77.3-GHz (QPSK TRx, Fig. 7) VCOs achieve tuning range of 0.9, 1.0, and 1.0 GHz, respectively. Note that the Tx and the Rx VCOs in the QPSK system are made in identical sizes due to our frequency arrangement. The first frequency divider is realized as a direct injection-locked topology, where the injection signal is ac-coupled to the gate of the switch  $(M_7)$ . Dummy loadings are carefully added to balance the parasitics. The minimum divider lock range in the two TRx prototypes here is approximately equal to 3.8 GHz with  $V_b = 0.8$  V. The lock range and self-oscillation frequency of the divider with respect to  $V_b$  are also plotted in Fig. 10(a). Simulation shows that locking can be ensured over PVT variation by setting  $V_b = 0.8$  V.

## D. Power Amplifier and QPSK Modulator

The power amplifier design [Fig. 10(b)] is also modified from that in [14]. To ensure stability, a local bypass network is placed at each of the five gain stages, where  $R_1$,  $C_2$  are used to lower both the impedance and quality factor Q of the network. The PA is designed to have power gain of 14.8 dB while consuming 129 mW from a 1.2-V supply.

Fig. 11. (a) 2:1 selector for QPSK. (b) Up-conversion mixer.

Fig. 12. (a) Costas loop behavior model. (b) Input-output characteristic for both cases.

The QPSK modulator is shown in Fig. 10(c). A demultiplexer parallelizes the input data into two half-rate data streams and a mapping logic rearranges the data sequences. After retiming, two outputs are created, controlling the IF 2:1 selector and the multiplier. As a result, 9.7-GHz clocks with four possible phases are produced. Note that the multiplier is placed after the selector to suppress the I/Q mismatch, because the distance of separate I/Q paths can be minimized by doing so.

## E. 2:1 Selector, Multiplier, and Up-Conversion Mixer

The 2:1 selector and multiplier are shown in Fig. 11(a). They are modified from standard Gilbert cells with tail currents taken off, i.e., making data swing rail-to-rail to speed up the operation. Again, to accommodate low supply voltage, the selections through  $D_{in1}$  and  $D_{in2}$  are separated into two stages. Switches  $M_{13} - M_{20}$  are added to block the clock feedthrough, otherwise the unwanted signal would penetrate into the output port.

Fig. 11(b) illustrates the up-conversion mixer. It has a similar structure but with inductive loading, which resonates at 87 GHz. With class-AB biasing [19] in the lower stage and ac-coupled inputs, the mixer achieves a gain of -0.4 dB, and a 3-dB bandwidth of 10 GHz while consuming 9 mW of power. The following balun buffer reveals 5-dB gain, and its 3-dB bandwidth is simulated to be 17 GHz.

## V. COSTAS LOOP MODELING

The Costas loop behavior needs further study in our particular system to ensure stability. As depicted in Fig. 12(a), with one feedback going to RF mixer and the other to PD, we derive phase relationship as

$$\theta_{\rm ref} = \theta_{\rm in} - \theta_{\rm out} \tag{5}$$

$$\theta_{\text{out}} = K_{\text{PD}} K_{\text{I}} \left( \theta_{\text{ref}} - \frac{\theta_{\text{out}}}{8} \right) \cdot F(s) \cdot \frac{K_{\text{VCO}}}{s} \tag{6}$$

where  $K_{\text{PD}}$ ,  $K_{\text{I}}$ , and  $K_{\text{VCO}}$  denote the gain of PD, V-to-I converter, and VCO, respectively. It follows that

$$\frac{\theta_{\text{out}}}{\theta_{\text{in}}} = \frac{K_{\text{PD}}K_{\text{I}}K_{\text{VCO}} \cdot F(s)}{s + \frac{9}{8} \cdot K_{\text{PD}}K_{\text{I}}K_{\text{VCO}} \cdot F(s)} \tag{7}$$

in which

$$F(s) = \frac{1 + sC_2R_2}{s(sC_1C_2R_2 + C_1 + C_2)}. \tag{8}$$
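The step from (5) and (6) to (7) can be written out explicitly. As a sketch, let $G(s) = K_{\rm PD}K_{\rm I}K_{\rm VCO}F(s)/s$ denote the forward gain (a shorthand introduced here, not a symbol from the paper), and read the bracketed term in (6) as $\theta_{\rm ref} - \theta_{\rm out}/8$:

$$\begin{aligned}
\theta_{\rm out} &= G(s)\left(\theta_{\rm in} - \theta_{\rm out} - \frac{\theta_{\rm out}}{8}\right) = G(s)\left(\theta_{\rm in} - \frac{9}{8}\,\theta_{\rm out}\right) \\
\theta_{\rm out}\left(1 + \frac{9}{8}\,G(s)\right) &= G(s)\,\theta_{\rm in} \\
\frac{\theta_{\rm out}}{\theta_{\rm in}} &= \frac{G(s)}{1 + \frac{9}{8}\,G(s)} = \frac{K_{\rm PD}K_{\rm I}K_{\rm VCO}\,F(s)}{s + \frac{9}{8}\,K_{\rm PD}K_{\rm I}K_{\rm VCO}\,F(s)}
\end{aligned}$$

The factor 9/8 is the sum of the unity feedback through the RF mixer and the 1/8 feedback through the PD.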

As expected, the dual PLL is equivalent to a single PLL with a divide ratio of 9/8. In our designs, we have  $K_{\rm PD} \cdot K_{\rm I} = 190$  $\mu$A/rad and  $K_{\rm VCO} = 1$  GHz/V for both the QPSK and BPSK transceivers. The simulated magnitude as a function of phase change rate is shown in Fig. 12(b), suggesting a bandwidth of 3 MHz for the BPSK TRx and 80 MHz for the QPSK TRx. With the coefficients used in this design, all of the poles are located in the left half of the *s*-plane, implying system stability. The phase margins for BPSK and QPSK are both 75°. Transistor-level simulation ensures sufficient phase margin over PVT variations.

Fig. 13. (a) Waveguide adapter and (b) its insertion loss, (c) assembly photograph, and (d) layout and EM simulation.

Fig. 14. (a) Die photographs and power consumption. (b) Testing setup.
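The pole-location claim for the loop model in (7) and (8) can be checked with the Routh-Hurwitz criterion. Substituting (8) into the denominator of (7) gives a cubic characteristic polynomial; the sketch below tests it for one set of loop-filter values. The values of $C_1$, $C_2$, and $R_2$ are hypothetical placeholders, since the paper does not list them, and only the quoted $K_{\rm PD}K_{\rm I}$ and $K_{\rm VCO}$ come from the text.

```python
import math

def char_poly(k_loop: float, c1: float, c2: float, r2: float):
    """Coefficients (a3, a2, a1, a0) of
    s^2 * (s*C1*C2*R2 + C1 + C2) + (9/8) * K * (1 + s*C2*R2) = 0."""
    return (c1 * c2 * r2, c1 + c2, 1.125 * k_loop * c2 * r2, 1.125 * k_loop)

def routh_stable_cubic(a3: float, a2: float, a1: float, a0: float) -> bool:
    """Routh-Hurwitz test for a cubic: all coefficients positive and a2*a1 > a3*a0."""
    return min(a3, a2, a1, a0) > 0.0 and a2 * a1 > a3 * a0

# K_PD*K_I = 190 uA/rad and K_VCO = 1 GHz/V (converted to rad/s/V), from the text.
K_LOOP = 190e-6 * 2.0 * math.pi * 1e9

# Hypothetical loop-filter values, for illustration only.
coeffs = char_poly(K_LOOP, c1=10e-12, c2=100e-12, r2=1e3)
stable = routh_stable_cubic(*coeffs)
print("All poles in the left half-plane:", stable)
```

In this idealized third-order model the Routh condition reduces to $C_1 + C_2 > C_1$, so it holds for any positive component values. Phase margin, by contrast, depends on the actual values and on parasitic poles that the model does not capture.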

## VI. CHIP-ANTENNA ASSEMBLY

The interconnection between chip and antenna is of great importance as well. Conventional structures such as probe-fed type [20], slot-coupled type [21], and quasi-Yagi antenna type [22] are usually too costly and too complex for system integration. As shown in Fig. 13(a), we realize the coplanar-strip-to-waveguide transition on a single-layer dielectric substrate [23]. With the chip flipped onto the microstrip line, the RF signal is coupled to the matching element through the substrate, which is tightly connected to the waveguide (i.e., the entrance of the horn). In other words, the signal power is transferred by matched coupling. Note that no expensive or bulky matching network is required here since the  $TM_{01}$  resonance of the matching element can efficiently convert the quasi-TEM mode (in the microstrip line) to the  $TE_{10}$  mode (in the waveguide). The measured insertion loss of the transition is depicted in Fig. 13(b), where the maximum loss from 81–86 GHz is less than 5.3 dB. Fig. 13(c) shows a photograph of the assembly. The layout of the waveguide transition and its EM simulation result are illustrated in Fig. 13(d). Since the minimum allowable distance between vias on board is 150  $\mu$m, we can hardly create a very nice resonance cavity at 84 GHz. As a result, a low-cost transition with moderate performance has been created. Actually, a total loss of 4–5 dB is introduced here. The horn antennas used here each have 24-dBi gain.

Fig. 15. (a) LNA and (b) PA measurement.

## VII. EXPERIMENTAL RESULTS

The BPSK and QPSK transceivers are both fabricated in 65-nm CMOS technology. The photographs of the dies are shown in Fig. 14(a) with the chip sizes listed below. Individual blocks such as LNA and PA are measured independently. For the BPSK TRx, the transmitter consumes 202 mW and the receiver 125 mW. The QPSK transmitter dissipates 212 mW and the receiver 166 mW. All circuits are operated with a 1.2-V supply. Note that measurements such as recovered data eye diagrams are conducted by a link through the air, where the Tx and Rx are separated by 1 m. The testing setup is also shown in Fig. 14(b). Different data rates and data patterns have been tested through a coaxial cable. Individual blocks are fully verified as well. We summarize the results in this section.

Fig. 16. (a) Spectrum of 84-GHz carrier in BPSK Tx. (b) Detailed phase noise plot of (a). (c) Spectrum of BPSK Tx output under data transmission (data rate = 650 Mb/s,  $2^{31} - 1 \text{ PRBS}$ ). (d) Tx output spectrum for data rate = 2.5 Gb/s ( $2^{31} - 1 \text{ PRBS}$ ).

## A. Millimeter-Wave Components

The LNA performance is presented in Fig. 15(a). The small-signal measurement shows a maximum gain of 18.5 dB and a minimum input return loss of 27 dB. The -10-dB bandwidths for  $S_{11}$  and  $S_{22}$  are 10 and 11 GHz, respectively. The noise figure in the band of interest is less than 6.9 dB. The input 1-dB compression point ( $P_{1 \text{ dB}}$ ) is -22 dBm, and the input third-order intercept point ( $IIP_3$ ) is -12.5 dBm.

The PA's measurement result is shown in Fig. 15(b). It shows 14.8-dB gain and 10-dB output return loss at 87 GHz. Large-signal results reveal output  $P_{1 \text{ dB}} = 6.5 \text{ dBm}$, saturation power  $(P_{\text{sat}}) = 10.2 \text{ dBm}$, and maximum power added efficiency  $(PAE_{\text{max}}) = 8\%$.
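As a rough consistency check on the quoted efficiency, power-added efficiency is $(P_{\rm out} - P_{\rm in})/P_{\rm dc}$. The sketch below makes two loud assumptions: the dc power equals the 129-mW design figure from Section IV-D, and the input power at saturation is estimated from the small-signal gain, which is optimistic because the gain compresses near $P_{\rm sat}$.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def pae_percent(p_out_dbm: float, gain_db: float, p_dc_mw: float) -> float:
    """Power-added efficiency in percent, with P_in inferred from P_out and gain."""
    p_out_mw = dbm_to_mw(p_out_dbm)
    p_in_mw = dbm_to_mw(p_out_dbm - gain_db)
    return 100.0 * (p_out_mw - p_in_mw) / p_dc_mw

# P_sat = 10.2 dBm and 14.8-dB gain from the measurements; 129 mW dc is an assumption.
pae = pae_percent(10.2, 14.8, 129.0)
print(f"Estimated PAE at saturation: {pae:.1f} %")
```

The estimate lands near the reported 8%, which suggests the quoted numbers are mutually consistent under these assumptions.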

## B. BPSK TRx

For BPSK chips, the 84-GHz carrier spectrum in continuous mode is plotted in Fig. 16(a), implying phase noise of -89.1 dBc/Hz at 1-MHz and -103 dBc/Hz at 10-MHz offset.

The phase noise plot of the 662-MHz reference is also demonstrated, revealing 43-dB difference between the two (due to frequency ratio). A precise phase noise plot [Fig. 16(b)] reveals an integrated rms jitter of 248 fs (integrated from 100 Hz to 1 GHz). Under BPSK modulation, the spectrum becomes a sinc function around the carrier frequency. Fig. 16(c) illustrates the spectrum with 650-Mb/s,  $2^{31} - 1$  pseudo random binary sequence (PRBS) data input. The case for higher data rate (2.5 Gb/s) is also shown in Fig. 16(d). On the receive side, the recovered clock spectrum is demonstrated in Fig. 17(a), revealing phase noise of -91.0 dBc/Hz at 10-MHz offset. Here, through the 1-m air link, the transmitted data is a PRBS of length  $2^{31} - 1$  at 2.5 Gb/s. The recovered data waveform is shown in Fig. 17(b). The data jitter measures 13.74 ps,rms and 100 ps,pp, respectively. Due to the lack of a high-speed real-time oscilloscope and wideband down-conversion mixer, we were unable to measure the constellation diagram. Note that a CDR can be placed behind the receiver to further clean up the jitter.
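The phase noise gap between carrier and reference can be compared with the ideal multiplication penalty of $20\log_{10}(f_{\rm out}/f_{\rm ref})$, and the rms jitter can be converted to rms phase at the carrier. A small sketch of both calculations, using the nominal 84-GHz carrier (the function names are illustrative):

```python
import math

def pn_scaling_db(f_out_hz: float, f_ref_hz: float) -> float:
    """Ideal in-band phase noise increase when multiplying f_ref up to f_out."""
    return 20.0 * math.log10(f_out_hz / f_ref_hz)

def jitter_to_phase_deg(jitter_s: float, f_carrier_hz: float) -> float:
    """RMS phase error (degrees) corresponding to an rms timing jitter."""
    return math.degrees(2.0 * math.pi * f_carrier_hz * jitter_s)

delta_db = pn_scaling_db(84e9, 662e6)
phase_deg = jitter_to_phase_deg(248e-15, 84e9)

print(f"Ideal scaling from 662 MHz to 84 GHz: {delta_db:.1f} dB")
print(f"248 fs rms at 84 GHz: {phase_deg:.1f} degrees rms")
```

The ideal scaling is about 42 dB, close to the measured 43-dB difference.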

## C. QPSK TRx

In the 87-GHz QPSK TRx, the carrier (at a fixed frequency) shows a spectrum with phase noise of -85.8 and -106 dBc/Hz at 1-MHz and 10-MHz offsets, respectively [Fig. 18(a)]. The modulated spectrum (with 650-Mb/s,  $2^{31} - 1$  PRBS data input) is also demonstrated in Fig. 18(b). Note that the bandwidth of the main lobe in QPSK is only half of that in BPSK with



Fig. 17. (a) Spectrum of recovered clock in BPSK Rx. (b) Waveform of recovered data (data rate = 2.5 Gb/s,  $2^{31} - 1 \text{ PRBS}$ , vertical scale: 100 mV/div, horizontal scale: 100 ps/div).



Fig. 18. (a) Spectrum of 87-GHz carrier in QPSK Tx. (b) Spectrum of QPSK Tx output under data transmission (data rate = 650 Mb/s,  $2^{31} - 1 \text{ PRBS}$ ). (c) Tx output spectrum for data rate =  $2.5 \text{ Gb/s} (2^{31} - 1 \text{ PRBS})$ .

the same data rate, as expected. The case for higher data rate (2.5 Gb/s) is also shown in Fig. 18(c). Due to the high loss (>40 dB) of the external harmonic mixer (Agilent 11970W), side lobes are somewhat buried in the noise floor. The recovered clock spectrum is shown in Fig. 19(a), suggesting a phase noise of -93.8 dBc/Hz at 5-MHz offset. The two recovered data

streams are depicted in Fig. 19(b). At a data rate of 2.5 Gb/s, the data jitter measures 37 ps,rms and 182 ps,pp. Here, the air link distance is 1 m and the antenna gain is 24 dBi. We have also conducted a BER test (through a coaxial cable link between Tx and Rx), as shown in Fig. 20. For BPSK, the BER stays below  $10^{-9}$  up to 2.5 Gb/s for a PRBS of length



Fig. 19. (a) Spectrum of recovered clock in QPSK Rx. (b) Waveform of recovered data (data rate = 2.5 Gb/s,  $2^7 - 1$  PRBS, vertical scale: 100 mV/div, horizontal scale: 200 ps/div).

 $2^{31} - 1$ . For QPSK, on the other hand, the BER stays below  $10^{-11}$  up to 2.5 Gb/s for a PRBS of length  $2^7 - 1$ .<sup>4</sup> Note that a Costas loop actually behaves as a clock and data recovery circuit. Sequences with longer runs of consecutive identical bits lead to carrier phase wander and therefore higher BER. Table I summarizes the overall performance and compares this work with other recently published mm-wave transceivers [18], [24], [25]. Our circuits require no baseband circuitry, achieving a data rate of at least 2.5 Gb/s while consuming only 327 and 378 mW for the BPSK TRx and QPSK TRx, respectively. The carrier frequency offset (from the desired 84 GHz to 87 GHz) slightly degrades the PA gain and thus the output power. A better Tx output power can be expected in a future design if the peaking frequencies of all resonance networks are shifted to exactly 84 GHz.
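To put the Tx output power and antenna gain from Table I in context, the following sketch estimates the received power over the 1-m air link with the Friis free-space formula. It is only an idealized estimate: it assumes far-field propagation and ignores antenna-assembly, transition, and pointing losses, none of which are quantified here. The function names are ours.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB (Friis, isotropic reference)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)

def received_power_dbm(p_tx_dbm, g_tx_dbi, g_rx_dbi, distance_m, freq_hz):
    """Idealized received power: Tx power plus both antenna gains minus path loss."""
    return p_tx_dbm + g_tx_dbi + g_rx_dbi - fspl_db(distance_m, freq_hz)

# Tx output power (-1 dBm BPSK, -7 dBm QPSK) and 24-dBi antennas from Table I.
bpsk_rx_dbm = received_power_dbm(-1.0, 24.0, 24.0, 1.0, 84e9)
qpsk_rx_dbm = received_power_dbm(-7.0, 24.0, 24.0, 1.0, 87e9)
print(f"BPSK: {bpsk_rx_dbm:.1f} dBm, QPSK: {qpsk_rx_dbm:.1f} dBm")
```

Under these assumptions the path loss is about 71 dB, giving roughly -24 dBm (BPSK) and -30 dBm (QPSK) at the receiver input.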

## VIII. FUTURE WORK

These prototypes demonstrate fundamental W-band operation and baseband-less carrier recovery. However, some aspects of this work can be further improved. For example, the basic BPSK/QPSK modulations are vulnerable to noise or interference that may cause "false" phase locking; that is, there are two and four possible phase locking points (0°/180° and 0°/90°/180°/270°) for the BPSK and QPSK receivers, respectively.
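The origin of these multiple locking points can be illustrated numerically. The sketch below uses textbook Costas phase detectors (an I·Q product for BPSK and sgn(I)·Q − sgn(Q)·I for QPSK); these idealized forms and the function names are our assumptions and need not match the circuit implementation in this work. The error signal repeats every 180° for BPSK and every 90° for QPSK, so the loop cannot distinguish those phase offsets.

```python
import cmath
import math

def bpsk_costas_error(theta: float, bit: int) -> float:
    """I*Q product detector for one BPSK symbol (bit = +1 or -1), offset theta in rad."""
    r = bit * cmath.exp(1j * theta)
    return r.real * r.imag

def qpsk_costas_error(theta: float, a: int, b: int) -> float:
    """sgn(I)*Q - sgn(Q)*I detector for one QPSK symbol a + jb, offset theta in rad."""
    r = complex(a, b) * cmath.exp(1j * theta)
    return (math.copysign(1.0, r.real) * r.imag
            - math.copysign(1.0, r.imag) * r.real)

theta = 0.1  # small residual phase error in radians
# Same error at theta and theta + 180 degrees, regardless of the data bit.
bpsk_pair = (bpsk_costas_error(theta, +1),
             bpsk_costas_error(theta + math.pi, -1))
# Same error at theta + k * 90 degrees for k = 0..3.
qpsk_set = [qpsk_costas_error(theta + k * math.pi / 2, 1, -1) for k in range(4)]
print(bpsk_pair, qpsk_set)
```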



Fig. 20. BER for (a) BPSK and (b) QPSK TRx (through a coaxial cable link between Tx and Rx).

As SNR decreases, the outputs of the I and Q channels may be altered during demodulation. This phase ambiguity can be resolved by adding a differential coding mechanism to the TRx. Instead of using the bit patterns to set the carrier phase, differential PSK schemes (e.g., DBPSK or DQPSK) use the phase change (rather than the phase itself) to distinguish the ONEs from the ZEROs. Since this scheme depends only on the difference between successive phases, the data patterns can be recovered correctly on the receiver side. An additional wideband CDR at the Rx output may be necessary for decoding and further data jitter clean-up. Note

<sup>4</sup>The loop capacitor of the QPSK TRx is realized on chip, minimizing unwanted noise coupling. This may explain why the QPSK TRx achieves better BER.

|                                   | [18]                           | [24]                                                       | [25]                                                        | This Work (BPSK)                                             | This Work (QPSK)                                             |
|-----------------------------------|--------------------------------|------------------------------------------------------------|-------------------------------------------------------------|--------------------------------------------------------------|--------------------------------------------------------------|
| Modulation                        | BPSK                           | QPSK                                                       | QPSK                                                        | BPSK                                                         | QPSK                                                         |
| Carrier Recovery                  | NO                             | NO                                                         | NO                                                          | YES                                                          | YES                                                          |
| Clock Gen.                        | External                       | On-Chip                                                    | External                                                    | On-Chip                                                      | On-Chip                                                      |
| Operating Freq.                   | 55~64GHz                       | 59.6~64GHz                                                 | 70~80GHz                                                    | 83.8~85.0GHz                                                 | 86.5~87.5GHz                                                 |
| Tx<br>Phase Noise                 | N/A                            | -112dBc/Hz<br>@10MHz Offset                                | N/A                                                         | -103dBc/Hz<br>@10MHz Offset                                  | -106dBc/Hz<br>@10MHz Offset                                  |
| Tx Output Power                   | 2.4dBm                         | 10dBm                                                      | 9dBm                                                        | –1dBm                                                        | –7dBm                                                        |
| PA Power Gain                     | N/A                            | 14dB                                                       | N/A                                                         | 14.8dB                                                       | 14.8dB                                                       |
| Rx Noise Figure                   | 5.6dB                          | N/A                                                        | 7dB                                                         | 6.9dB**                                                      | 6.9dB**                                                      |
| Rx Conv. Gain                     | 14.7dB                         | N/A                                                        | 50dB                                                        | 19dB**                                                       | 19dB**                                                       |
| Antenna Gain                      | 25dBi×2<br>Max. Dist.=2m       | 25dBi×2<br>Max. Dist.=1m                                   | N/A<br>(Cable)                                              | 24dBi×2<br>Max. Dist.=1m                                     | 24dBi×2<br>Max. Dist.=1m                                     |
| Max. Data Rate                    | 6Gb/s                          | 4Gb/s,<br>2<sup>7</sup>–1 PRBS,<br>BER<10<sup>–11</sup>    | 18Gb/s,<br>2<sup>7</sup>–1 PRBS,<br>BER<10<sup>–11</sup>    | 2.5Gb/s,<br>2<sup>31</sup>–1 PRBS,<br>BER<10<sup>–9</sup>    | 2.5Gb/s,<br>2<sup>7</sup>–1 PRBS,<br>BER<10<sup>–11</sup>    |
| Need Baseband/<br>Interface Ckt.? | YES                            | YES                                                        | YES                                                         | NO                                                           | NO                                                           |
| Power<br>Consumption              | 374mW*                         | Tx:169.5mW*<br>Rx:138.3mW*                                 | 1.2W*                                                       | Tx:202mW<br>Rx:125mW                                         | Tx:212mW<br>Rx:166mW                                         |
| Chip Area                         | 1.3×0.8mm<sup>2</sup>          | 2.5×2.8mm<sup>2</sup>                                      | 1.9×1.1mm<sup>2</sup>                                       | Tx:0.6×0.8mm<sup>2</sup><br>Rx:0.8×1.0mm<sup>2</sup>         | Tx:0.6×1.0mm<sup>2</sup><br>Rx:0.8×1.0mm<sup>2</sup>         |
| Technology                        | 65-nm CMOS                     | 90-nm CMOS                                                 | 130-nm<br>SiGe BiCMOS                                       | 65-nm CMOS                                                   | 65-nm CMOS                                                   |

TABLE I PERFORMANCE SUMMARY

that, in our testing, false locking does not occur because the SNR is quite high (distance = 1 m), but future redesign for long-distance application must take this issue into consideration.
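The way differential coding removes the 180° ambiguity can be seen in a minimal DBPSK-style sketch. This is a bit-level illustration under our own assumptions (hard decisions, no noise, hypothetical function names), not a description of a circuit in this work. A receiver locked 180° off inverts every received bit, yet the decoded differences are unchanged after the first reference bit.

```python
def diff_encode(bits, initial=0):
    """Encode each 1 as a phase change: out[k] = out[k-1] XOR bits[k]."""
    encoded, prev = [], initial
    for b in bits:
        prev ^= b
        encoded.append(prev)
    return encoded

def diff_decode(received, initial=0):
    """Recover data from successive differences: bits[k] = rx[k] XOR rx[k-1]."""
    decoded, prev = [], initial
    for r in received:
        decoded.append(r ^ prev)
        prev = r
    return decoded

data = [1, 0, 1, 1, 0, 0, 1, 0]
tx = diff_encode(data)

rx_correct = tx                    # carrier locked at 0 degrees
rx_inverted = [b ^ 1 for b in tx]  # carrier locked at 180 degrees: all bits flipped

out_correct = diff_decode(rx_correct)
out_inverted = diff_decode(rx_inverted)

# Only the first bit depends on the unknown initial reference.
print(out_correct == data, out_inverted[1:] == data[1:])  # True True
```

In practice a known leading symbol would absorb the first-bit uncertainty. DQPSK follows the same idea with phase differences taken modulo four quadrants.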

## IX. CONCLUSION

This work presents an analog carrier recovery technique applied to BPSK and QPSK transceivers operating at 80–90 GHz. Achieving very simple structures and very low power consumption, these chips provide promising potential for next generation wireless point-to-point data links.

#### ACKNOWLEDGMENT

The authors thank the TSMC University Shuttle Program for chip fabrication.

#### REFERENCES

- [1] "Allocations and service rules for the 71–76 GHz, 81–86 GHz, and 92–95 GHz bands," FCC Notice of Proposed Rule Making 02-180, Jun. 2002.
- [2] J. Lee *et al.*, "A low-power low-cost fully-integrated 60-GHz transceiver system with OOK modulation and on-board antenna assembly," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 264–275, Feb. 2010.
- [3] F. Lin et al., "A low power 60 GHz OOK transceiver system in 90 nm CMOS with innovative on-chip AMC antenna," in Proc. IEEE Asian Solid-State Circuits Conf., 2009, pp. 349–354.
- [4] J. Wells, "Multigigabit wireless connectivity at 70, 80 and 90 GHz," *RF Design Mag.*, vol. 29, pp. 50–59, May 2006.
- [5] J. P. Costas *et al.*, "Synchronous communications," *Proc. IRE*, vol. 44, pp. 1713–1718, Dec. 1956.
- [6] H. Takahashi *et al.*, "10-Gbit/s QPSK modulator and demodulator for a 120-GHz-band wireless link," in *IEEE IMS Dig.*, May 2010, pp. 632–635.
- [7] C. Chien *et al.*, "A single-chip 12.7 Mchips/s digital IF BPSK direct sequence spread-spectrum transceiver in 1.2 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 29, no. 12, pp. 1614–1623, Dec. 1994.

- [8] J. Lee *et al.*, "A 20-Gb/s full-rate linear clock and data recovery circuit with automatic frequency acquisition," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 264–275, Dec. 2009.
- [9] J. Lee *et al.*, "A 75-GHz phase-locked loop in 90-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 43, no. 6, pp. 1414–1426, Jun. 2008.
- [10] B. Gilbert, "A precise four-quadrant multiplier with subnanosecond response," *IEEE J. Solid-State Circuits*, vol. SC-3, no. 4, pp. 365–373, Dec. 1968.
- [11] B. Razavi *et al.*, "A 13.4-GHz CMOS frequency divider," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 1994, pp. 224–225.
- [12] J. Yuan et al., "High-speed CMOS circuit technique," IEEE J. Solid-State Circuits, vol. 24, no., pp. 62–70, Feb. 1989.
- [13] J. Lee, "A 20-Gb/s adaptive equalizer in 0.13-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2058–2066, Sep. 2006.
- [14] J. Lee et al., "A fully-integrated 77-GHz FMCW radar transceiver in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2746–2756, Dec. 2010.
- [15] B. Heydari et al., "A 60-GHz 90-nm CMOS cascode amplifier with interstage matching," in Proc. Eur. Microw. Integr. Circuit Conf., Oct. 2007, pp. 88–91.
- [16] H. Wang et al., "A 60-GHz FSK transceiver with automatically-calibrated demodulator in 90-nm CMOS," in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2010, pp. 95–96.
- [17] B. Razavi, "A millimeter-wave CMOS heterodyne receiver with on-chip LO and divider," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 477–485, Feb. 2008.
- [18] A. Tomkins *et al.*, "A zero-IF 60 GHz 65 nm CMOS transceiver with direct BPSK modulation demonstrating up to 6 Gb/s data rates over a 2 m wireless link," *IEEE J. Solid-State Circuits*, vol. 44, no. 8, pp. 2085–2099, Aug. 2009.
- [19] J. Lee *et al.*, "A 40-Gb/s clock and data recovery circuit in 0.18-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2181–2190, Dec. 2003.
- [20] Y. Shih *et al.*, "Waveguide-to-microstrip transitions for millimeter-wave applications," in *IEEE MTT-S Int. Microw. Symp. Dig.*, May 1988, vol. 1, pp. 473–475.
- [21] Y. Shih *et al.*, "Microstrip to waveguide transition compatible with MM-wave integrated circuits," *IEEE Trans. Microw. Theory Tech.*, vol. 42, no. 9, pp. 1842–1843, Sep. 1994.

- [22] N. Kaneda *et al.*, "A broadband microstrip-to-waveguide transition using quasi-yagi antenna," *IEEE Trans. Microw. Theory Tech.*, vol. 47, no. 12, pp. 2562–2567, Dec. 1999.
- [23] H. Iizuka et al., "Millimeter-wave microstrip line to waveguide transition fabricated on a single layer dielectric substrate," *IEICE Trans. Commun.*, pp. 1169–1177, Jun. 2002.
- [24] C. Marcu *et al.*, "A 90 nm CMOS low-power 60 GHz transceiver with integrated baseband circuitry," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3434–3447, Dec. 2009.
- [25] I. Sarkas et al., "An 18-Gb/s, direct QPSK modulation, SiGe BiCMOS transceiver for last mile links in the 70–80 GHz band," *IEEE J. Solid-State Circuits*, vol. 45, no. 10, pp. 1968–1980, Oct. 2010.



**Huaide Wang** received the B.Sc., M.S., and Ph.D. degrees from National Taiwan University (NTU), Taipei, Taiwan, in 2006 and 2010, respectively, all in electrical engineering.

He joined MediaTek Inc., Hsinchu, Taiwan, in 2010, where he is currently a Senior Engineer. His research interests include phase-locked loops, high-speed SerDes, and backplane transceivers.



**Pang-Ning Chen** was born in Taipei, Taiwan, in 1986. He received the B.S. degree in electrical engineering from National Chiao Tung University, Hsinchu, Taiwan, in 2008. He is currently working toward the Ph.D. degree at the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan.

His research interests focus on millimeter-wave wireless transceivers.



**Shih-Jou Huang** was born in Tainan, Taiwan, in 1986. He received the B.S. degree in electrical engineering from National Tsing-Hua University, Hsinchu, Taiwan, in 2008. He is currently working toward the Ph.D. degree at the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan.

His research interests focus on millimeter-wave wireless transceivers.



**Jri Lee** (S'03–M'04) received the B.Sc. degree from National Taiwan University (NTU), Taipei, Taiwan, in 1995, and the M.S. and Ph.D. degrees from the University of California, Los Angeles (UCLA), both in 2003, all in electrical engineering.

After two years of military service (1995–1997), he was with Academia Sinica, Taipei, Taiwan, from 1997 to 1998, and subsequently with Intel Corporation from 2000 to 2002. He joined National Taiwan University (NTU) in 2004, where he is currently a Professor of electrical engineering. His current research interests include high-speed wireless and wireline transceivers, phase-locked loops and applications, and mm-wave circuits.

Prof. Lee received the Beatrice Winner Award for Editorial Excellence at the 2007 ISSCC, the Takuo Sugano Award for Outstanding Far-East Paper at the 2008 ISSCC, the Best Technical Paper Award from the Y. Z. Hsu Memorial Foundation in 2008, the T. Y. Wu Memorial Award from the National Science Council (NSC), Taiwan, in 2008, the Young Scientist Research Award from Academia Sinica in 2009, and the Outstanding Young Electrical Engineer Award in 2009. He also received the NTU Outstanding Teaching Award in 2007, 2008, and 2009. He served as a Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS in 2008. He is currently a Distinguished Lecturer of the IEEE Solid-State Circuits Society (SSCS). He has served on the Technical Program Committees of the International Solid-State Circuits Conference (ISSCC, 2007–2010), the Symposium on VLSI Circuits (2008–present), and the Asian Solid-State Circuits Conference (A-SSCC, 2005–present).



**Yu-Ching Yeh** was born in Nantou, Taiwan, in 1987. She received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2009 and 2011, respectively. She is currently working toward the Ph.D. degree at the University of California, Berkeley.

Her research interests focus on millimeter-wave wireless transceivers.