Study of Subharmonically Injection-Locked PLLs

Jri Lee, Member, IEEE, and Huaide Wang

Abstract—A complete analysis on subharmonically injection-locked PLLs develops fundamental theory for subharmonic locking phenomenon. It explains the noise shaping phenomenon, locking range and behavior, PVT tolerance, and pseudo locking issue. All of the analyses are verified by real chip measurements. Two 20-GHz PLLs based on the proposed theory are designed and fabricated in 90-nm CMOS technology to demonstrate the superiority and robustness of this technique. The first chip aims at low-noise/low-power/high-divide-ratio design, achieving 149-fs rms jitter (integrated from 100 Hz to 1 GHz) while consuming 38 mW from a 1.3-V supply. The second prototype shoots for the lowest noise performance, presenting 85-fs rms jitter (the same integration interval) with a power dissipation of 105 mW. The jitter generation (from 50 kHz to 80 MHz) measures 48 fs, which is at least twice as small as that of any other known circuits.

Index Terms—Injection locking, phase-locked loop (PLL), phase noise, rms jitter, subharmonic locking.

I. INTRODUCTION

HIGH-SPEED, low-noise clocks prove essential in many applications such as communication, data conversion, and instrumental electronics. Over the years, phase-locked loops (PLLs) have served as key components in different systems, and they have evolved from simple feedback loops to sophisticated architectures (e.g., integer-N, fractional-N, and all-digital). In some applications such as optical transceivers and Ethernet systems, PLLs usually need to provide high-speed clocks with low noise (jitter) at low power, making the integer-N structure a great candidate due to its simplicity. For a typical PLL, it is well known that the input noise (including the noise from the reference and the phase and frequency detector) and the VCO noise are shaped by a low-pass and a high-pass transfer function, respectively, when they appear at the output. Generally speaking, an optimal noise performance can be obtained by properly selecting the loop bandwidth. For example, if the input noise is assumed flat (which is not exactly true in reality), the optimal bandwidth of the loop can be chosen as the intersection of the VCO phase noise and $N^2$ times the input noise [1].

The above approach, however, suffers from an intrinsic limitation. As the VCO frequency increases, its noise begins to dominate and becomes more difficult to suppress. To quantify this issue, let us consider two similar PLLs (with different VCOs inside), running at two frequencies $f_{1}$ and $f_{2}$, respectively. Assuming $f_{2}/f_{1} = N$ and identical quality factor $Q$ for the resonators,$^{1}$ we recognize that the two VCO phase noise lines are vertically separated by $20 \log_{10} N$ (dB) [2]. As shown in Fig. 1, we also assume the two loop bandwidths are $\omega_{BW1}$ and $\omega_{BW2}$, and the corresponding VCO phase noise levels at these points are $L_1$ and $L_2$, respectively. If the VCO phase noise is the only noise to be considered, the PLL output spectrum can be readily obtained by multiplying the VCO phase noise by the high-pass transfer function $\left| j\omega/(j\omega + \omega_{BW}) \right|^{2}$. That is, the output phase noise remains flat at $L_1$ ($L_2$) up to $\omega_{BW1}$ ($\omega_{BW2}$), and rolls off at a rate of $-20$ dB/dec beyond the loop bandwidth. On the other hand, the rms jitter is given by integrating the phase noise [3], [4]

$$J_{rms}^{2} = \left( \frac{1}{2\pi f_{c}} \right)^{2} \cdot \int_{0}^{\infty} S_{\phi}(f)\, df = \left( \frac{1}{2\pi f_{c}} \right)^{2} \cdot 2 \int_{0}^{\infty} 10^{L(f)/10}\, df \tag{1}$$

which can also be normalized to one clock period

$$J_{rms,\text{norm}}^{2} = 2 \int_{0}^{\infty} 10^{L(f)/10}\, df \quad (\text{rad}^{2}). \tag{2}$$
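As a rough numerical illustration of the jitter integral, the sketch below integrates a sampled phase noise profile with the trapezoidal rule. It assumes $L(f)$ is the single-sideband phase noise in dBc/Hz, converts it with $10^{L(f)/10}$, and applies a factor of 2 for the two sidebands; the function name, the finite integration range, and the flat $-120$ dBc/Hz profile are hypothetical and chosen only for illustration.

```python
import math


def rms_jitter_seconds(offsets_hz, l_dbc_hz, f_carrier_hz):
    """Integrate single-sideband phase noise L(f) into rms jitter in seconds."""
    # Convert each dBc/Hz sample to a linear power ratio per hertz.
    linear = [10 ** (level / 10) for level in l_dbc_hz]
    # Trapezoidal integration over the finite offset range that was supplied.
    area = 0.0
    for i in range(1, len(offsets_hz)):
        df = offsets_hz[i] - offsets_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * df
    phase_variance = 2.0 * area  # rad^2; the factor 2 counts both sidebands
    return math.sqrt(phase_variance) / (2 * math.pi * f_carrier_hz)


# Hypothetical profile: flat -120 dBc/Hz from 0 to 1 MHz on a 20-GHz carrier.
jitter = rms_jitter_seconds([0.0, 1e6], [-120.0, -120.0], 20e9)
print(f"rms jitter = {jitter * 1e15:.2f} fs")
```

Dropping the division by $2\pi f_c$ gives the normalized (radian) form instead.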

Now, if the two PLLs in Fig. 1 are designed to present the same jitter performance (i.e., identical normalized jitter$^{2}$), we must have

$$10^{L_1/10} \cdot \omega_{BW1} = 10^{L_2/10} \cdot \omega_{BW2}. \tag{3}$$

Here, only the in-band noise (the shaded area) is considered for simplicity. We also assume the loop damping factors are so high that the transfer curve can be modeled as a first-order function. Since $L_1 = L_2 + 20 \log_{10}(\omega_{BW2}/\omega_{BW1}) - 20 \log_{10} N$, we obtain

$$\frac{\omega_{BW2}}{\omega_{BW1}} = N^2 \tag{4}$$

from (3). That is, if we migrate from one standard to another that operates at a frequency $N$ times higher, the loop bandwidth needs to be raised by a factor of $N^2$ in order to maintain the same VCO noise contribution. This requirement is difficult to meet because 1) some standards mandate the loop bandwidth; 2) even with no such restriction, the loop bandwidth still needs to be kept below approximately one twentieth of the reference frequency in order to ensure stability [5]; and 3) a high loop bandwidth allows more noise from the phase and frequency detector (PFD) and the charge pump (CP) to reach the output. As a result, at high frequencies it becomes increasingly difficult to reduce the noise (or equivalently, jitter) solely by adjusting the loop bandwidth.

$^{1}$Here we assume the two oscillators consume equal power. In fact, circuit optimization may lead to different optimal powers for different VCO frequencies. The tank $Q$ would vary as well.

$^{2}$It is indeed the case for OC-48 and OC-192. Both standards define the jitter generation as 0.01 UI.
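The $N^2$ bandwidth scaling can be checked numerically. The following minimal sketch assumes the first-order model used above: a VCO noise line falling at $-20$ dB/dec, raised by $20 \log_{10} N$ for the faster oscillator, and an in-band contribution equal to the flat level times the bandwidth. The reference noise level, the bandwidth, and all function names are hypothetical.

```python
import math


def vco_noise_db(l_ref_db, f_ref_hz, f_offset_hz, n=1.0):
    """VCO phase noise on a -20 dB/dec line, shifted up by 20*log10(n)."""
    return (l_ref_db
            - 20 * math.log10(f_offset_hz / f_ref_hz)
            + 20 * math.log10(n))


def inband_noise_area(level_db, f_bw_hz):
    """Flat in-band level times loop bandwidth (normalized jitter squared, up to a constant)."""
    return 10 ** (level_db / 10) * f_bw_hz


# Hypothetical VCO1: -100 dBc/Hz at 1-MHz offset, loop bandwidth 1 MHz.
L_REF_DB, F_REF_HZ = -100.0, 1e6
f_bw1 = 1e6
n = 4                      # frequency ratio f2/f1
f_bw2 = n ** 2 * f_bw1     # bandwidth required by the N^2 rule

area1 = inband_noise_area(vco_noise_db(L_REF_DB, F_REF_HZ, f_bw1), f_bw1)
area2 = inband_noise_area(vco_noise_db(L_REF_DB, F_REF_HZ, f_bw2, n), f_bw2)
print(area1, area2)  # the two in-band contributions should agree
```

Choosing any bandwidth ratio other than $N^2$ in this model leaves the two areas unequal, which is the point of the derivation.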

An alternative approach to suppress the jitter is to incorporate a multiplying delay-locked loop (MDLL) [6], [7]. It replaces every $N$th edge of the clock (usually generated by a ring oscillator) with a clean reference edge, resetting the accumulated VCO phase error periodically. While looking attractive, this technique still suffers from deterministic jitter due to the finite mismatch between the regular and the corrected cycles. The operation speed is also limited, since it may require delicate timing control and sophisticated digital calibration to improve the performance. The output frequency of recently published MDLLs spreads from several hundred megahertz to several gigahertz [8]–[10].

Our study demonstrates that the injection locking technique tends to provide an excellent solution to the foregoing difficulties. Indeed, the VCO phase noise can be dramatically reduced by injection locking [11], [12], since the low-noise source would periodically correct the VCO zero crossings. Note that this technique can hardly be applied to a simple VCO without a companion phase-locked loop (which ensures the frequency accuracy), since the locking may fail due to the narrow lock range and PVT variations [12], [13]. Meanwhile, the fundamental injection does not fit in with general-purpose applications. We need to operate the injection locking in subrate since the purpose of a PLL is to generate a high-frequency output from a low-frequency reference. The subharmonic injection locking has been proposed for years, and recently a few related works have been presented [6], [14]. Unfortunately, a complete analysis along with physical verification of important properties is still missing.

In this paper, we present the study of subharmonically injection-locked PLLs and validate our predictions with three circuits. The properties of subharmonic locking on a PLL will be thoroughly examined by means of the testing chip from our previous work of [13]. Based on injection-locking technique and originally designed as a burst-mode CDR, this chip is perfectly suitable for such a study because it can also be operated as a subharmonically injection-locked PLL. With the locking behavior fully understood, we have designed and fabricated two additional 20-GHz clock generators in 90-nm CMOS technology. Targeting both low jitter and low power, chip A realizes the subharmonic injection locking in two steps with specially designed building blocks. Multiplying the 1-GHz reference to the 20-GHz output, this cascade architecture achieves phase noise of $-113$ dBc/Hz at 1-MHz offset and 149-fs rms jitter while consuming only 38 mW from a 1.3-V supply. Designed to achieve the lowest jitter, chip B multiplies the 2.5-GHz reference by a factor of 8 to obtain the 20-GHz output. It incorporates single-step subharmonic injection in both reference edges, providing phase noise of $-123$ dBc/Hz at 1-MHz offset and 85-fs rms jitter with 105-mW power consumption. To the best of the authors’ knowledge, the measured jitters of these two prototypes are better than that of any other PLLs ever published with similar operation frequency.

This paper is organized as follows. Section II describes the subharmonic locking phenomenon, providing theoretical analysis and physical validation of the properties. Sections III and IV present the designs of the two 20-GHz clock generators, including the circuit structures and building blocks. Section V summarizes the results of the two PLLs.

II. SUBHARMONICALLY INJECTION-LOCKED PLLS

Injection locking technique has been widely used in the design of quadrature oscillators and dividers. As a modification, subharmonic injection can be used to suppress the phase noise of PLLs. Unfortunately, scientists and engineers thus far have neither paid enough attention to it nor recognized its powerful potential. We analyze the properties of subharmonically injection-locked PLLs and demonstrate the operation in this section.

To verify our analysis, we reuse our previous design (an injection-locked burst-mode CDR) in [13] as a testing vehicle. For convenience, we redraw the circuit in Fig. 2(a). It consists of a 20-GHz PLL, which employs a divider chain with a total modulus of 64. In addition to the normal phase locking, the VCO can also be injection-locked to the edges of an independent source $CK_{\text{inj}}$. Rather than injecting random data, here we apply a subrate signal as $CK_{\text{inj}}$ with different frequencies ($\omega_{\text{inj}} = \omega_{\text{out}}/N$) to investigate the properties of injection-locked PLLs. A constant delay of 25 ps and an XOR gate are employed to generate pulses ($V_{\text{inj}}$) on occurrence of $CK_{\text{inj}}$ transitions, leading to a double-edge injection periodically appearing every $N/2$ cycles [Fig. 2(b)]. The design details of the blocks can be found in [13], and all the following results are obtained from measurement.

A. Noise-Shaping Phenomenon

Let us first consider a typical phase-locked loop as shown in Fig. 3(a). It is well-known that the in-band phase noise of it ($L_{\text{PLL}}$) is shaped from the free-running line of the VCO to a relatively flat response at moderate offset frequencies, and the turning point is roughly given by the loop bandwidth $\omega_{\text{BW}}$. If the oscillator is under fundamental injection locking, it can be shown [12], [15] that the phase noise within the lock range $\omega_{L}$ will be suppressed to that of the injection signal. It is thus deducible that for a subharmonic locking with a frequency ratio $N$, the phase noise inside the lock range $\omega_{L}$ would be constrained to $L_{\text{inj}} + 20 \log_{10} N$, where $L_{\text{inj}}$ denotes the phase noise of the subrate injection signal $CK_{\text{inj}}$. Fig. 3(b) illustrates this phenomenon. Certainly such a noise reduction only occurs when $N$ is an integer. Since usually the lock range of an LC-tank VCO is not only small but sensitive to PVT variations, we must provide a proper control voltage such that the VCO natural frequency can always track the desired multiple of the injection frequency $\omega_{\text{inj}}$. This task is accomplished by combining the injection locking technique with a PLL, as shown in Fig. 3(c) and (d). Here, we have two situations: if $\omega_{L} > \omega_{\text{BW}}$, the whole in-band noise is drawn down to $L_{\text{inj}} + 20 \log_{10} N$ (dB), leading to a significant jitter reduction [Fig. 3(c)]. With the help of the PLL, the noise suppression can always be maintained around the optimal position.\(^3\) If $\omega_{L} < \omega_{\text{BW}}$, on the contrary, the noise shaping becomes less effective because the turning point $\omega_{\text{BW}}$ is not covered within the range of suppression [Fig. 3(d)]. It is intuitive that the spectrum degenerates to that of an ordinary PLL.

\(^3\)It is tempting to say that the VCO natural frequency is exactly a multiple of $\omega_{\text{inj}}$. However, as will be shown in the following subsections, this statement is not entirely true.
cases with single-edge injection, we may further restrict the frequency ratio. As will be shown in Section III, cascading can be applied to solve this issue. For large \( N \) (e.g., \( N = 128 \)), the output phase noise degenerates to \( \mathcal{L}_{\text{PLL}} \), as expected, because the injection appears so sparse that the noise profile is barely affected.

To further quantify the accuracy of our analysis, we depict the error between the measured and calculated [from (5)] rms jitters in Fig. 6. Here, the integration interval is from 100 Hz to 1 GHz. The prediction of (5) presents sufficient accuracy with maximum error of 8%. As a comparison, the measured rms jitter under subharmonic locking is about 360 fs,\(^5\) whereas that without subharmonic injection (integration of \( \mathcal{L}_{\text{PLL}} \) over the same range) yields an rms jitter of 575 fs. The injection locking technique reduces the jitter by at least 37% for \( N \leq 16 \).

B. Lock Range

The lock range affects the noise shaping of an injection-locked PLL significantly. It is worth noting that the lock range \( \omega_L \) degrades as \( N \) increases. Actually, if we define the oscillation and injection currents of the LC-tank VCO as \( I_{\text{osc}} \) and \( I_{\text{inj}} \) as shown in Fig. 7, the lock range of fundamental (full-rate) injection is given by [11], [12]

\[
\omega_L = \frac{\omega_{\text{out}}}{2Q} \cdot \frac{I_{\text{inj}}}{I_{\text{osc}}} \cdot \frac{1}{\sqrt{1 - \frac{I_{\text{inj}}^2}{I_{\text{osc}}^2}}} \tag{6}
\]

where \( Q \) represents the quality factor of the tank. Note that both \( I_{\text{osc}} \) and \( I_{\text{inj}} \) come from averaging of large signals.

In subharmonic injection, \( I_{\text{inj}} \) needs to be replaced by \( I_{\text{inj}}^{\text{eff}} / N \) if the injection occurs once every \( N \) cycles, because the effective current becomes \( 1/N \) in magnitude. The lock range therefore becomes

\[
\omega_L = \frac{\omega_{\text{out}}}{2Q} \cdot \frac{I_{\text{inj}}^{\text{eff}}}{I_{\text{osc}}} \cdot \frac{1}{N} \cdot \frac{1}{\sqrt{1 - \frac{(I_{\text{inj}}^{\text{eff}})^2}{N^2 I_{\text{osc}}^2}}} \approx \frac{\omega_{\text{out}}}{2Q} \cdot \frac{I_{\text{inj}}^{\text{eff}}}{I_{\text{osc}}} \cdot \frac{1}{N} \tag{7}
\]
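The two lock-range expressions can be evaluated with one small function. The sketch below assumes the term under the square root is the squared ratio of the (effective) injection current to the oscillation current, and that subharmonic injection simply divides the injection current by $N$. The function name, the 20-GHz output, $Q = 10$, and the current ratio of 0.5 are illustrative assumptions rather than design values.

```python
import math


def lock_range(omega_out, q, i_inj, i_osc, n=1):
    """Single-sided lock range of an LC oscillator under injection.

    n = 1 is the fundamental (full-rate) case; n > 1 scales the injection
    current by 1/n to model injection once every n cycles.
    """
    ratio = i_inj / (n * i_osc)
    if not 0 <= ratio < 1:
        raise ValueError("effective injection ratio must lie in [0, 1)")
    return omega_out / (2 * q) * ratio / math.sqrt(1 - ratio ** 2)


# Hypothetical numbers: 20-GHz output, tank Q of 10, I_inj/I_osc = 0.5.
omega_out = 2 * math.pi * 20e9
for n in (1, 2, 4, 8):
    f_lock = lock_range(omega_out, 10, 0.5, 1.0, n) / (2 * math.pi)
    print(f"N = {n}: lock range = {f_lock / 1e6:.1f} MHz")
```

For large $N$ the square-root term approaches unity, so the range falls off essentially as $1/N$, which matches the approximation on the right-hand side of the subharmonic expression.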

The above analysis only deals with the case of a standalone VCO. To obtain the actual lock range of the circuit in Fig. 2(a), we short \( V_{\text{ctrl}} \) to ground through \( S_1 \) and capture the lock range as a function of \( N \). Note that the effective injection current here becomes \( 2I_{\text{inj}}^{\text{eff}} / N \) because of the double-edge injection. In this testing vehicle, however, the lock range is expected to be much smaller for the following reasons. First, the reference PLL is always on during testing, so VCO2 may pull VCO1 significantly through substrate coupling because they are located close to each other. Second, the internal noise of the circuit and possible temperature drift would affect the locking behavior as well. Fig. 8 plots the calculated and the measured results. Using the in situ evaluation method [16], the quality factor $Q$ here is estimated to be 10. The measured lock range is clearly 3–5 times smaller than the prediction from the over-simplified model. Nonetheless, we analyze the phase noise shaping based on the measured lock range.

\(^5\)This value is higher than the measured results of chip A and chip B, because the injection source (MP1803A) we use for this testing chip is noisier.

\[ \mathcal{L}(\omega) = \begin{cases} L_{\text{inj}}(\omega) + 20\log_{10} N, & \text{for } \omega \leq \omega_L \text{ (Region I)} \\ \dfrac{L_{\text{PLL}}(\omega_{\text{inj}}) - L_{\text{inj}}(\omega_L) - 20\log_{10} N}{\log_{10}(\omega_{\text{inj}}/\omega_L)} \cdot \log_{10}(\omega/\omega_L) + L_{\text{inj}}(\omega_L) + 20\log_{10} N, & \text{for } \omega_L \leq \omega \leq \omega_{\text{inj}} \text{ (Region II)} \\ L_{\text{PLL}}(\omega), & \text{for } \omega \geq \omega_{\text{inj}} \text{ (Region III)} \end{cases} \tag{5} \]
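The three-region noise model of (5) lends itself to a short piecewise function. The sketch below is one possible reading, assuming Region II is a straight line (in dB versus log frequency) joining the Region I floor at the lock range to the ordinary PLL noise at the injection frequency. The function name, the two noise profiles, and all numerical values are hypothetical placeholders, not measured data.

```python
import math


def shaped_noise_db(f, f_lock, f_inj, n, l_inj_db, l_pll_db):
    """Piecewise phase noise model; l_inj_db and l_pll_db are callables in dBc/Hz."""
    def floor(x):
        # Injection-source noise referred to the output (multiplied by n).
        return l_inj_db(x) + 20 * math.log10(n)

    if f <= f_lock:          # Region I: noise follows the injection source
        return floor(f)
    if f >= f_inj:           # Region III: ordinary PLL noise
        return l_pll_db(f)
    # Region II: linear interpolation on a log-frequency axis (assumption).
    start, end = floor(f_lock), l_pll_db(f_inj)
    frac = math.log10(f / f_lock) / math.log10(f_inj / f_lock)
    return start + (end - start) * frac


def inj_profile(f):
    """Hypothetical injection source: flat -150 dBc/Hz."""
    return -150.0


def pll_profile(f):
    """Hypothetical PLL: flat -95 dBc/Hz in band, -20 dB/dec beyond 2 MHz."""
    return -95.0 if f <= 2e6 else -95.0 - 20 * math.log10(f / 2e6)


for f in (1e5, 1e7, 5e7, 1e9):
    print(f"{f:.0e} Hz: {shaped_noise_db(f, 1e7, 3.125e8, 64, inj_profile, pll_profile):.1f} dBc/Hz")
```

Feeding the resulting samples into a jitter integration over 100 Hz to 1 GHz would mirror the comparison between calculated and measured rms jitter described in the text.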

C. Tolerance to PVT Variations

As demonstrated in the above analysis, subharmonically locked PLLs achieve similar in-band phase noise performance as long as $\omega_{L} > \omega_{BW}$. It implies that a very stable clock generator can be achieved, given that a clean reference clock is available. Fig. 9 shows the output spectra under different conditions with and without the subharmonic locking. Here, we change the supply voltage to create different loop bandwidths for the reference PLL in Fig. 2(a). It can be seen that even with a ratio of 8, the noise shaping presents almost identical results for the different cases. That is, the PLL can be designed in a more relaxed way since it can tolerate a much wider range of variations. Note that the PVT deviation of $\Delta T_2$ has negligible impact on the overall performance due to the injection locking mechanism.

The injection locking technique also rejects the supply noise, if the locking can be maintained throughout the perturbation. To demonstrate this property, we provide a sinusoidal disturbance of 50 mV with different frequencies onto the $V_{DD}$ of the testing circuit. Fig. 10 shows the noise suppression of two cases. The coupled supply variation has little influence on the overall output phase noise if injection locking is imposed. Measurement suggests that, for $N \leq 8$, supply noise at any frequency below 100 MHz is substantially rejected. Fig. 11 depicts the supply noise reduction owing to subharmonic injection locking under different supply noise frequencies. For different $N$, we observe $7 \sim 22$ dB suppression. Here, we conduct the test by using an arbitrary waveform generator (AFG 3252) to create the modulated supply. The available modulation frequency is limited to 100 MHz. Simulation reveals that the circuit can reject noise of much higher frequencies.

D. Locking Behavior

One issue hidden behind the beauty of the injection-locked PLLs is the pulling between the two locking forces, namely, the phase locking (from the reference PLL) and the injection locking (from the injection signal). Let us revisit the circuit in Fig. 2(a), and assume the injection clock $CK_{inj}$ comes in after the reference PLL has already reached steady lock. At this moment, the phase of $CK_{out}$ is exclusively determined by the phase of $CK_{ref}$. As an independent $CK_{inj}$ arrives, a finite phase error may exist between $CK_{inj}$ and $CK_{ref}$, i.e., $V_{inj}$ need not coincide with the already existing $CK_{out}$. In other words, the two forces “fight” each other and probably pull the output phase. Such a conflict may lead to quite a few uncertainties, and several questions arise. How much phase error can the loop tolerate after all? What happens if the injection signal is totally (180°) out of phase with the intrinsic $CK_{\text{out}}$? Does such a destructive injection still suppress the phase noise? Or does it simply destroy the loop locking?

To answer these questions, we must go back to the injection locking theories [11], [12], [17]. Surprisingly, if a finite phase error exists between the regular phase locking and the injection locking, the LC tank of the VCO would shift its resonance frequency to accommodate the non-zero phase difference, even though $\omega_{\text{out}}$ is exactly a multiple of $\omega_{\text{inj}}$. Following the analysis in [12], we redraw the equivalent half circuit of an injection-locked oscillator in Fig. 12. Indeed, for a subharmonically injection-locked PLL, the VCO core current $I_{\text{osc}}$ (in phase with $CK_{\text{out}}$) and $I_{\text{inj,eff}}$ (in phase with $V_{\text{inj}}$) can be separated by an angle $\theta$. Suppose that in the absence of injection, the VCO steadily oscillates at $\omega_{\text{out}}$. As the injection comes in, however, the resonance frequency will no longer stay at $\omega_{\text{out}}$, but shift to some point $\omega_{\text{res}}$ as illustrated in Fig. 12. From the derivation in [12], we realize that the created phase $\phi_0$ is the angle between $I_{\text{osc}}$ and $I_T$ (the total current driving the tank), and $\theta$ (the angle between $V_{\text{inj}}$ and $CK_{\text{out}}$) reaches a maximum as $I_T$ and $I_{\text{inj,eff}}$ form a right angle. That is, at steady state, an injection-locked PLL would automatically adjust the phase relationship to maintain the stability and accomplish the noise suppression. The maximum tolerable phase error is therefore given by

$$\theta_{\text{max}} = \frac{\pi}{2} + \sin^{-1}\left(\frac{I_{\text{inj,eff}}}{I_{\text{osc}}}\right). \tag{8}$$

In our testing circuit, for example, we set $N = 4$ and $I_{\text{inj,eff}} = I_{\text{osc}}/4$, obtaining $\theta_{\text{max}} = 105^\circ$. That is, the maximum tolerable range for phase offset is about $210^\circ(= \pm 105^\circ)$. This effect can be easily verified as follows. Gradually adjusting $\Delta T_3$ in Fig. 2(a), we observe the change of the output spectrum. The recorded jitter for different $\Delta T_3$ is shown in Fig. 13(a). As expected, the rms jitter stays low ($\approx 360$ fs) for approximately $210^\circ$, and goes up dramatically outside the stable region. It fully validates the prediction of (8).
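The quoted $\theta_{\text{max}}$ follows directly from the expression for the maximum tolerable phase error. A minimal sketch, with a hypothetical function name and the current ratio $I_{\text{inj,eff}}/I_{\text{osc}} = 1/4$ taken from the example above:

```python
import math


def theta_max_deg(i_inj_eff, i_osc):
    """Maximum tolerable phase error, pi/2 + asin(I_inj,eff / I_osc), in degrees."""
    ratio = i_inj_eff / i_osc
    if not 0 <= ratio <= 1:
        raise ValueError("current ratio must lie in [0, 1]")
    return 90.0 + math.degrees(math.asin(ratio))


theta = theta_max_deg(1.0, 4.0)
print(f"theta_max = {theta:.1f} deg, tolerable window = {2 * theta:.0f} deg")
```

The same function shows that a fully destructive (180°) offset is only tolerable when the ratio reaches unity.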

It is instructive to investigate the acquisition of locking. In the beginning, the phase difference between the two inputs of the PFD is very large. The reference PLL tries to neutralize this error through the normal phase locking process, regardless of the existence of the injection signal. After this “coarse” locking is achieved, the injection then conducts the “fine” phase tuning, i.e., shifting the resonance frequency of the LC tank to create a proper $\theta$. Note that the two PFD inputs are now roughly aligned, so the fine tuning would take a much longer time. It is because the phase difference for the 20-GHz $CK_{\text{out}}$ (period $\approx 50$ ps) is very small with respect to the 312.5-MHz reference period (3.2 ns) in Fig. 2(a), making the available current from the V/I converter very small. In our testing circuit, for example, the maximum pumping current coming from the V/I converter is only 0.78% (25 ps ÷ 3.2 ns) as large as its peak value. As a result, the loop presents a settling time at least 100 times longer than a regular PLL. Fig. 13(b) plots the simulated locking behavior. It clearly shows that the fine phase adjustment for injection locking draws a long tail (≈ 10 μs). Note that in many applications that require no frequency hopping, the long settling time is not a concern.
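The 0.78% figure and the "at least 100 times" slowdown can be reproduced with simple arithmetic. The sketch below assumes, as the 25 ps ÷ 3.2 ns ratio suggests, that the available pumping current scales with the largest residual timing error (half a VCO period) relative to the reference period; the function name is hypothetical.

```python
def fine_tuning_current_fraction(f_out_hz, divide_ratio):
    """Largest fine-tuning timing error relative to the reference period."""
    t_out = 1.0 / f_out_hz            # 50 ps for a 20-GHz output
    t_ref = divide_ratio / f_out_hz   # 3.2 ns for a total modulus of 64
    max_error = t_out / 2.0           # half a VCO period, i.e. 25 ps
    return max_error / t_ref


fraction = fine_tuning_current_fraction(20e9, 64)
print(f"current fraction = {fraction * 100:.2f} %, slowdown = {1 / fraction:.0f}x")
```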

The above analysis implies that a proper delay ΔT1 must be maintained over the PVT variations. One would think of placing another delay-locked loop (DLL) around ΔT1 to do so. However, such a solution is unnecessary because (1) judging from Fig. 13(a), the jitter performance is nearly constant within the tolerable range of 210°; (2) adding another DLL may induce more noise and consume more power and area, let alone the possible instability issue. To evaluate the robustness of the loop, we apply a fixed ΔT1 in Fig. 2 and measure the rms jitter under different conditions. As depicted in Fig. 13(c), for a temperature variation from −20°C to 65°C, the rms jitter deviates no more than 69 fs. Thus, a simple fixed delay (at most with manual tuning capability) is sufficient in most applications.

E. Pseudo Locking Phenomenon

What happens if the desired θ exceeds θmax? Imagine a fully destructive case as shown in Fig. 14(a), where the positive pulse Vinj aligns with the valley of CKout. In such a case, the required θ is 180°. From (8), we realize that the only possible way to sustain the loop stability is to set \( I_{\text{inj,eff}} = I_{\text{osc}} \), which is difficult to achieve in subrate injection. As a result, the loop could never find a solution to satisfy the phase relationship, and the resonance frequency of the VCO would wander back and forth across the lock range. The output frequency is therefore modulated, creating multiple tones around the carrier. Note that this is the case even though the two inputs (\( CK_{\text{ref}} \) and \( CK_{\text{inj}} \)) are perfectly lined up in frequency. Called “pseudo locking”, this state can never reach a real locking either in phase or frequency.

To further explain this phenomenon, we illustrate the circuit behavior in detail in Fig. 14(b). Suppose the resonance frequency of the tank, \( \omega_{\text{res}} \), is located at position ① initially. Attempting to correct the residual phase, the loop pushes it toward one end of the lock range (i.e., position ②) by lifting the control voltage. Since the desired θ can never be achieved, the VCO becomes out of lock momentarily at some frequency slightly higher than \( N\omega_{\text{inj}} + \omega_L \). The PFD soon accumulates enough phase error, changing the polarity of the pumping current and moving \( \omega_{\text{res}} \) to position ③. Note that the progress from ② to ③ is relatively fast: if \( \omega_L / \omega_{\text{res}} \approx 1\% \), it takes only 25 cycles of \( CK_{\text{out}} \) to create a 90° phase difference. Subsequently, the loop continues to adjust the phase by lowering \( \omega_{\text{res}} \) until it hits the other end of the lock range \( N\omega_{\text{inj}} - \omega_L \), which is position ④. Again, the VCO stays in free run temporarily and the resonance frequency goes back to position ① afterwards. The process repeats itself if the situation continues. Note that throughout the durations of ① → ② and ③ → ④, the VCO is prone to injection locking and the output frequency is very close to \( N\omega_{\text{inj}} \). Utilizing the control voltage variation, it is possible to estimate the cyclic period \( T_0 \) of the circulation. Neglecting the sharp transitions of ② → ③ and ④ → ①, we recognize that \( T_0 \) is primarily determined by the time for the loop capacitor $C$ [in Fig. 2(a)] to charge or discharge. The pumping current under pseudo locking, however, is hard to determine, because it depends on many other factors. Simulation shows that the effective current $I_p$ is about 20% to 40% of the peak current. Overall, we calculate $T_0$ as

$$T_0 \approx \frac{C}{I_p} \times \frac{\omega_L}{K_{VCO}} \times 2. \tag{9}$$

In the testing chip, we have $I_p = 30\ \mu$A, $K_{VCO} \approx 2\pi \times 1$ Grad/s/V, and $C = 120$ pF, resulting in $T_0 \approx 0.48\ \mu$s. With the periodic modulation imposed on the control voltage, the output spectrum reveals multiple tones around the desired frequency with a spacing of $1/T_0$. Fig. 14(c) shows the measured output spectrum under pseudo-locking operation. The spacing between adjacent tones is approximately 1.8 MHz, which is 13% lower than the estimate from (9). Such an error is reasonable for our over-simplified calculation. For example, the loop filter here is modeled as a big capacitor. The actual charging and discharging currents are subject to mismatch as well, because $V_{ctrl}$ experiences a large swing here. This mismatch also causes the different heights of the peaks in Fig. 14(c). Nonetheless, (9) still quantifies this issue with moderate accuracy.
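The estimate of $T_0$ and the comparison with the measured tone spacing can be reproduced as follows. The values of $C$, $I_p$, and $K_{VCO}$ are those quoted above; the lock range $\omega_L = 2\pi \times 60$ Mrad/s is not stated in this passage and is an assumption, back-calculated so that the result matches the quoted $T_0 \approx 0.48\ \mu$s. The function name is hypothetical.

```python
import math


def pseudo_lock_period(c_farad, i_p_amp, omega_l, k_vco):
    """Cyclic period T0 = 2 * (C / I_p) * (omega_L / K_VCO)."""
    return 2.0 * (c_farad / i_p_amp) * (omega_l / k_vco)


C_LOOP = 120e-12                  # loop capacitor, from the text
I_P = 30e-6                       # effective pumping current, from the text
K_VCO = 2 * math.pi * 1e9         # VCO gain in rad/s/V, from the text
OMEGA_L = 2 * math.pi * 60e6      # ASSUMPTION: back-calculated lock range

t0 = pseudo_lock_period(C_LOOP, I_P, OMEGA_L, K_VCO)
spacing_hz = 1.0 / t0
shortfall = 1.0 - 1.8e6 / spacing_hz   # measured 1.8 MHz versus the estimate

print(f"T0 = {t0 * 1e6:.2f} us, predicted spacing = {spacing_hz / 1e6:.2f} MHz")
print(f"measured spacing is {shortfall * 100:.0f}% below the estimate")
```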

With the behavior of subharmonic locking fully understood, we are ready to build such circuits. Here, two 20-GHz PLLs are presented. The first circuit (chip A) is designed to provide a high divide ratio of 20 with lowest power consumption, achieving very low phase noise by two-step locking. The second design utilizes the double-edge locking technique [13] to accomplish the $8\times$ subharmonic injection in one step, targeting the best phase noise performance. We describe the circuit details in the following sections.

III. CIRCUIT IMPLEMENTATION OF CHIP A

The analysis in Section II implies that a stable and well-behaved subharmonic locking can be achieved, given that the frequency ratio $N$ is less than 10. If we use single-edge injection to lower the power, the maximum $N$ will be cut by half because the effective injection current is reduced by the same amount. To develop general-purpose PLLs which may have much higher divide ratios, we must realize the injection locking in multiple steps. Here we propose a two-step architecture demonstrating great performance.

A. Architecture

The design of chip A is shown in Fig. 15(a), where two sub-PLLs performing $\times 5$ and $\times 4$ functions are incorporated in cascade. The 20-GHz VCO in PLL2 is subharmonically injection-locked to the 5-GHz output from PLL1, which is also injection-locked to the 1-GHz reference. Two pulse generators are responsible for creating injection signals whenever an input rising edge arrives. Fig. 15(b) illustrates the waveforms of important nodes. Since $CK_{out}$ equivalently gets realigned to a clean edge once every 4 cycles, the output phase noise is expected to follow the reference profile plus a 26-dB offset within the effective range. In order to avoid the possible deterministic jitter coming from duty-cycle distortion, we employ injection on rising edges only.\(^8\) Note that two fixed delays $\Delta T_1$ and $\Delta T_2$ are placed in front of PLL1 and PLL2, respectively, providing proper delays to achieve in-phase injections. As already demonstrated in Section II, these delays can tolerate large PVT variations, and a fixed design is more than enough here. To minimize the power consumption, all dividers except the 20-GHz one in PLL2 are realized in true single-phase clocked (TSPC) topology with the device sizes properly scaled. The loop filters are integrated on chip, and the pumping currents and VCO gains for PLL1 and PLL2 are (0.2 mA, 0.9 GHz/V) and (0.4 mA, 1 GHz/V), respectively. Note that the cascade structure can be extended to more stages to accommodate larger multiplication factors, because the power and area penalty (i.e., adding low-speed PLLs) would be relatively low. For example, if we reduce the reference frequency to 250 MHz and add one more PLL stage in front, the power consumption would increase by only 12%. The low-power blocks are introduced in the following subsection.
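The 26-dB offset is simply $20 \log_{10}$ of the overall multiplication factor, accumulated over the cascaded stages. A minimal sketch with a hypothetical helper name:

```python
import math


def multiplication_penalty_db(*stage_ratios):
    """Phase noise offset (dB) added by cascaded frequency multiplication."""
    return sum(20 * math.log10(ratio) for ratio in stage_ratios)


two_step = multiplication_penalty_db(5, 4)   # 1 GHz -> 5 GHz -> 20 GHz
one_step = multiplication_penalty_db(20)     # same overall ratio in one step
print(f"two-step: {two_step:.2f} dB, one-step: {one_step:.2f} dB")
```

Both paths give the same offset, so the two-step arrangement changes the per-stage injection ratio without changing the ideal in-band floor.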

B. Building Blocks

Pulse Generator: A simple injection can be accomplished by passing the rising (or falling) edges of a reference into the VCO directly [6]. Such a design requires a control logic or a gating circuit to ensure the injection occurs only in the vicinity of the edges. To avoid complex design at high speed, we employ a pulse generator for the injection. Similar to that in [13], it creates pulses whose width is nominally equal to half the VCO clock period. The key point here is that the pulses are produced only on occurrence of the rising edges of the reference. Generating a 25-ps pulse with low power is not trivial, since CML buffers are usually power hungry. Fig. 16(a) and (b) illustrate the proposed low-power pulse generators, delivering subharmonic injection to the 20-GHz and 5-GHz VCOs, respectively. In the high-speed approach, we combine CMOS and nMOS logics, creating an injection signal of approximately 600 mV while consuming only 1.15 mW. The device sizes are properly chosen as labeled in Fig. 16(a), where the second inverter is half as large as the first one to sharpen the transitions and to narrow down the pulsewidth to approximately 25 ps. The speed requirement for pulse generator 1 is much more relaxed, so the CMOS logic can be used thoroughly in Fig. 16(b).

VCOs, Buffers, and Dividers: The VCO design is shown in Fig. 17(a). Here, coupling pair $M_3$-$M_4$ receives the single-ended pulses at the gate of $M_1$, and injects a corresponding current into the LC tank. The device dimensions of the $M_1$-$M_2$ and $M_3$-$M_4$ pairs as well as the bias circuit $I_{b1}$, $M_5$, and $R_b$ define the injection strength. The two VCOs have the same topology but different device sizes and bias currents in order to optimize the performance. The quality factor $Q_s$ of the inductor here is estimated to be 10 (20 GHz) and 5 (5 GHz), respectively. While operated under subharmonic injection, VCO1 and VCO2 present approximately 25-MHz and 60-MHz lock ranges, respectively, with a fixed control voltage. To save power, the generated 20-GHz clock is directly fed into the first divider stage without a buffer in between. Fig. 17(b) reveals the 20-GHz divider design, which is the typical static topology with a class-AB CML flip-flop [16]. The VCO core [$M_1$-$M_2$ pairs in Fig. 17(a)] establishes a natural biasing for the gate-controlled switches $M_{10}$-$M_{12}$ [Fig. 17(b)], allowing direct dc-coupling between VCO2 and the 20-GHz divider. Such an arrangement needs no extra bias and saves power. Simulation shows that the instantaneous high currents boost the divider operation frequency up to 36 GHz without using inductors.

\(^8\)According to simulation results, we do not expect any duty-cycle distortion to occur.

Fig. 18. (a) Divider operation range. (b) TSPC divider. (c) CML-to-CMOS converter. (d) \( \div 5 \) circuit.

Fig. 19. (a) Chip B architecture. (b) Variable delay cell \( \Delta T_1 \).

Beyond the first stage, it is possible to use TSPC dividers to minimize the power consumption. Indeed, the advantages of using differential circuits (e.g., immunity to supply disturbance/noise) become less important because of the injection locking. To further justify this observation, we depict the power efficiency of static dividers based on CML and TSPC logic in Fig. 18(a). Judging from the simulation in 90-nm CMOS technology, the TSPC divider dissipates at least 7 times less power than the CML one at 10 GHz. The divider circuit together with detailed parameters is illustrated in Fig. 18(b).

It is worth noting that the CML signal from the 20-GHz divider output (\( \approx 600 \text{ mV} \)) must be converted to a rail-to-rail signal before being applied to the subsequent TSPC dividers. The converter design is shown in Fig. 18(c), where the inverter \( \text{INV}_1 \) is self-biased in the high-gain region. In contrast to the current-steering topology used in [18], such a structure ensures proper operation up to 13 GHz with very low power dissipation. The \( \div 5 \) circuit in PLL2 follows the design in [19] and is shown in Fig. 18(d), where the flip-flop is also realized as a TSPC structure. Since the noise from the PFD and the CP also gets suppressed by the injection locking, we use a fundamental type IV PFD and a single-ended CP (similar to that in [20]) to simplify the design and minimize the power.

IV. CIRCUIT IMPLEMENTATION OF CHIP B

As shown in Fig. 19(a), chip B is a modified version of Fig. 2(a) which realizes 8\( \times \) clock multiplication in one step. Here, we optimize the key blocks (i.e., VCO, dividers, PFD, CP, and loop filter) so as to achieve the lowest phase noise. With a non-distorted input available, the double-edge injection by means of the XOR gate is also preserved. In this chip, a delay line with wide tunable range (\( \Delta T_1 \)) is employed for testing [Fig. 19(b)]. It consists of three identical cells, which adjust the delay by changing the tail currents of the main \((M_1-M_2)\) and the hysteresis \((M_5-M_4)\) branches. The delay control voltages \(D_{ct1}\) and \(D_{ct2}\) are created through an approach similar to the controller design in [21], forming a tuning range of \(0 \sim 0.5\) mA with opposite directions. pMOS resistors \(M_7\) and \(M_8\) are introduced to enlarge the variable range. As will be demonstrated in Section V, a fixed \(\Delta T_1\) in real products is sufficient for typical PVT variations.

V. EXPERIMENTAL RESULTS

The two 20-GHz PLLs have been fabricated in 90-nm CMOS technology and tested on chip-on-board assemblies. Fig. 20(a) shows the die photos, which measure \(0.7 \times 0.65\) mm\(^2\) (chip A) and \(0.65 \times 0.5\) mm\(^2\) (chip B) including pads. The testing setup is depicted in Fig. 20(b). The reference clock also serves as a trigger signal in time-domain measurements. Here, a low-noise signal generator SMA100A provides the reference input. For 1-GHz and 2.5-GHz outputs, it presents phase noise of $-144.6 \text{ dBc/Hz}$ and $-141.9 \text{ dBc/Hz}$, respectively, at 1-MHz offset. The temperature measurement is conducted by means of a commercial thermal controller (BL-730) with large tunable range ($-30^\circ \text{C} \sim 100^\circ \text{C}$). We describe the testing results of the two prototypes as follows.

Fig. 20. (a) Chip micrograph. (b) Testing setup.

Fig. 21. (a) Phase noise plots of chip A. (b) Measured spectrum of its 20-GHz output.

Fig. 22. RMS jitter of the 20-GHz output of chip A under supply and temperature variations.
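For context, ideal frequency multiplication by a ratio $N$ raises phase noise by $20\log_{10}N$ dB. The short sketch below applies this textbook relation to the reference phase-noise figures quoted above. The function name is hypothetical, and the pairing of the 1-GHz reference with a ratio of 20 and the 2.5-GHz reference with a ratio of 8 is inferred from the multiply ratios in Table I and the spur offsets reported later. The results are a theoretical floor under that assumption, not measured values.

```python
import math

def multiplied_phase_noise(ref_dbc_hz: float, ratio: float) -> float:
    """Phase noise (dBc/Hz) after ideal frequency multiplication by `ratio`."""
    return ref_dbc_hz + 20.0 * math.log10(ratio)

# Reference figures at 1-MHz offset, as quoted in the text.
# Ratios are an assumption inferred from Table I (20 and 8).
floor_1ghz_ref = multiplied_phase_noise(-144.6, 20)   # 1 GHz -> 20 GHz
floor_2p5ghz_ref = multiplied_phase_noise(-141.9, 8)  # 2.5 GHz -> 20 GHz

print(f"{floor_1ghz_ref:.1f} dBc/Hz, {floor_2p5ghz_ref:.1f} dBc/Hz")
# prints "-118.6 dBc/Hz, -123.8 dBc/Hz"
```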

A. Chip A

The 20-GHz and 5-GHz VCOs in chip A present tuning ranges of 940 MHz and 700 MHz, respectively, revealing a total operation range of 940 MHz. The chip consumes 38 mW from a 1.3-V supply, of which 12 mW is dissipated in PLL$_{1}$, 23 mW in PLL$_{2}$, and 2.5 mW in the pulse generators. Fig. 21(a) shows the phase noise plots of the two outputs (5 GHz and 20 GHz) of chip A with and without the subharmonic injection. The subharmonic injection inevitably induces reference spurs of $-46 \text{ dBc}$ at 1-GHz offset for the 20-GHz output [Fig. 21(b)]. These out-of-band spurs have negligible influence on the overall jitter performance for most wireline applications, since only the noise integrated over the band of interest is of concern. It can be clearly seen that the phase noise curves follow the input profile closely, except for a slight deviation in the vicinity of 1-kHz offset. A possible reason for such an imperfection may lie in the internal noise between the two PLL stages. The integrated rms jitter from 100 Hz to 1 GHz for the 20-GHz output is 148.9 fs. Owing to the roll-off at high offset frequencies, this value is even better than the predicted jitter (172 fs) by 13.8%. The reference jitter is also measured as 172 fs.
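The quoted power breakdown can be checked against the total with simple arithmetic. The symbol $P_{\mathrm{total}}$ is introduced here for illustration, and the rounding to 38 mW is our reading of the reported total.

$$
P_{\mathrm{total}} = 12\ \mathrm{mW} + 23\ \mathrm{mW} + 2.5\ \mathrm{mW} = 37.5\ \mathrm{mW} \approx 38\ \mathrm{mW}.
$$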

To demonstrate the immunity against PVT variations, we record the rms jitter from phase-noise integration (100 Hz $\sim$ 1 GHz) and plot it as a function of temperature and supply voltage (Fig. 22). The maximum jitter deviation ($-20^\circ \text{C} \sim 50^\circ \text{C}$, 1.3 V $\sim$ 1.5 V) is less than 25 fs, a very consistent result as predicted. Note that during the testing of Fig. 22, no manual adjustment of $\Delta T_1$ and $\Delta T_3$ is required, displaying the robustness of the fixed-delay architecture.
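The rms jitter obtained from phase-noise integration follows the standard relation $\sigma_t = \sqrt{2\int \mathcal{L}(f)\,df}\,/\,(2\pi f_0)$, where $\mathcal{L}(f)$ is the single-sideband phase noise in linear units and $f_0$ is the carrier frequency. The following is a minimal sketch of that calculation, not the measurement procedure used for the chips. The function name, the grid, and the flat $-150$-dBc/Hz profile are hypothetical placeholders chosen only so the result can be checked by hand.

```python
import math

def rms_jitter(offsets_hz, pn_dbc_hz, f_carrier_hz):
    """RMS jitter in seconds from single-sideband phase noise in dBc/Hz.

    Integrates 10^(L/10) over the offset grid with the trapezoidal rule,
    then applies sigma = sqrt(2 * area) / (2 * pi * f_carrier).
    """
    linear = [10.0 ** (level / 10.0) for level in pn_dbc_hz]
    area = 0.0
    for i in range(1, len(offsets_hz)):
        df = offsets_hz[i] - offsets_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * df
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f_carrier_hz)

# Hypothetical example: flat -150 dBc/Hz from 100 Hz to 1 GHz at 20 GHz.
# Log-spaced grid with 50 points per decade (an arbitrary choice).
offsets = [100.0 * 10.0 ** (k / 50.0) for k in range(351)]
profile = [-150.0] * len(offsets)
jitter_s = rms_jitter(offsets, profile, 20e9)
print(f"{jitter_s * 1e15:.2f} fs")
```

For a measured profile with steep slopes, a linear trapezoid on a coarse log grid can overestimate the area, so a denser grid or integration in log-log coordinates may be preferable.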

B. Chip B

Chip B achieves an operation range of 0.5 GHz while dissipating 105 mW from a 1.5-V supply. The phase noise plot of the 20-GHz output is shown in Fig. 23, which closely follows the reference profile. The spectrum reveals $-55 \text{ dBc}$ reference spurs at 2.5-GHz offset. Again, these spurs are not an issue for single-frequency communications. The integrated output jitter reads 84.8 fs (1.5-V supply, 25°C), superior to that of the reference (which is 93.6 fs) by 9.4%. The single-stage injection suppresses the noise effectively. We conduct the same supply and temperature testing for chip B and plot the result in Fig. 24. The maximum degradation of the rms jitter over variations of 70°C and 0.3 V is only 52 fs. Note that the 84.8-fs rms jitter is measured by probing, whereas the plot in Fig. 24 is obtained from a chip-on-board testing module.
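The 9.4% figure follows directly from the two jitter values quoted above, taking the reference jitter as the baseline:

$$
\frac{93.6\ \mathrm{fs} - 84.8\ \mathrm{fs}}{93.6\ \mathrm{fs}} = \frac{8.8}{93.6} \approx 9.4\%.
$$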

Table I summarizes the performance of these two works and some prior art designed for similar output frequencies. Our circuits achieve the best jitter performance with much lower power consumption. Note that the rms jitter of a commercially available signal generator (Agilent 83752A) running at 20 GHz is measured to be 674.8 fs. Fig. 25 characterizes the performance (jitter generation and power consumption) of a few representative PLLs published over the past decade.
TABLE I
PERFORMANCE SUMMARY

<table>
<thead>
<tr>
<th></th>
<th>Prior art</th>
<th>Prior art</th>
<th>This work (chip A)</th>
<th>This work (chip B)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multiply Ratio</td>
<td>32</td>
<td>8</td>
<td>20</td>
<td>8</td>
</tr>
<tr>
<td>Phase Noise</td>
<td>-101dBc/Hz</td>
<td>-108dBc/Hz</td>
<td>-113dBc/Hz</td>
<td>-123dBc/Hz</td>
</tr>
<tr>
<td>RMS Jitter</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>100Hz-1GHz BW</td>
<td>650fs*</td>
<td>N/A</td>
<td>149fs</td>
<td>85fs</td>
</tr>
<tr>
<td>50kHz-80MHz BW</td>
<td>438fs**</td>
<td>110fs**</td>
<td>110fs</td>
<td>48fs</td>
</tr>
<tr>
<td>Supply Voltage</td>
<td>1.5V</td>
<td>-3.6V</td>
<td>1.3V</td>
<td>1.5V</td>
</tr>
<tr>
<td>Power Diss.</td>
<td>480mW</td>
<td>270mW</td>
<td>38mW</td>
<td>105mW</td>
</tr>
<tr>
<td>Lock Range</td>
<td>1800MHz</td>
<td>1800MHz</td>
<td>940MHz</td>
<td>500MHz</td>
</tr>
<tr>
<td>Loop Bandwidth</td>
<td>6.25MHz</td>
<td>3MHz</td>
<td>10MHz</td>
<td>8MHz</td>
</tr>
<tr>
<td>Chip Area</td>
<td>1.7mm²</td>
<td>1.7 x 1.7mm²</td>
<td>0.7 x 0.65mm²</td>
<td>0.65 x 0.5mm²</td>
</tr>
<tr>
<td>Technology</td>
<td>0.13-µm CMOS</td>
<td>0.18-µm SiGe BiCMOS</td>
<td>90-nm CMOS</td>
<td>90-nm CMOS</td>
</tr>
</tbody>
</table>

* Direct measurement in time domain.
** Calculated from the phase noise plots.

VI. CONCLUSION

A powerful technique that substantially reduces the phase noise of general PLLs has been proposed and verified. Two 20-GHz subharmonically injection-locked PLLs, targeting different purposes, have been designed in 90-nm CMOS technology based on the proposed analysis. Achieving 149-fs and 85-fs rms jitter with very low power consumption, these two prototypes stand out among the existing PLL solutions. The technique shows promising potential for ultra-low-noise designs in communications and instrumental electronics.

REFERENCES


Fig. 25. Performance comparison between these two works and the classic PLLs.


Jri Lee (S’03–M’04) received the B.Sc. degree in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1995, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Los Angeles (UCLA), both in 2003. His current research interests include high-speed wireless and wireline transceivers, phase-locked loops, and data converters.

After two years of military service (1995–1997), he was with Academia Sinica, Taipei, Taiwan, from 1997 to 1998, and subsequently with Intel Corporation from 2000 to 2002. He joined National Taiwan University (NTU) in 2004, where he is currently an Associate Professor of electrical engineering.

Prof. Lee is currently serving on the Technical Program Committees of the IEEE International Solid-State Circuits Conference (ISSCC), Symposium on VLSI Circuits, and Asian Solid-State Circuits Conference (A-SSCC). He received the Beatrice Winner Award for Editorial Excellence at the 2007 ISSCC, the Takuo Sugano Award for Outstanding Far-East Paper at the 2008 ISSCC, and the NTU Outstanding Teaching Award in 2007 and 2008.

Huaide Wang was born in Taipei, Taiwan, in 1984. He received the B.S. degree in electrical engineering from National Taiwan University, Taipei, in 2006. He is currently pursuing the Ph.D. degree in the Graduate Institute of Electrical Engineering, National Taiwan University. His research interests are PLLs and high-speed transceivers for wireline communication.