# 4×25 Gb/s Transceiver With Optical Front-end for 100 GbE System in 65 nm CMOS Technology

Ping-Chuan Chiang, Student Member, IEEE, Jhih-Yu Jiang, Hao-Wei Hung, Chin-Yang Wu, Member, IEEE, Gaun-Sing Chen, and Jri Lee, Member, IEEE

Abstract—This paper presents a fully-integrated chipset for  $4 \times 25$  Gb/s transceiver gearbox along with laser driver and photo detector front-ends. The transceiver provides 10:4 multiplexing and 4:10 demultiplexing conversion, with built-in clock generation, equalization, and amplification. The optical front-ends are realized as 4-element arrays, presenting remarkable performance with commercial vertical-cavity surface-emitting lasers and photodiodes. Feedforward equalizers and continuous-time linear equalizers are employed to compensate for loss and distortion in both electrical and optical domains. Fabricated in 65 nm CMOS technology, these chips can be lumped together as a compact module with performance exceeding typical 100 GbE standards.

*Index Terms*—100 Gb/s Ethernet, clock and data recovery (CDR), demultiplexer, gearbox TRX, laser diode driver, limiting amplifier, multiplexer, transceiver, transimpedance amplifier (TIA).

## I. INTRODUCTION

THE ever-growing volume of Ethernet traffic has pushed the backbone network data rate from Mb/s to tens or hundreds of Gb/s over the past decades [1]. As illustrated in Fig. 1, the core network data rate in the backbone doubles every 1.5 years on average, which is even faster than the data rate improvement of server I/Os ( $2 \times$  every two years). Modern standards such as 100 Gb/s Ethernet (100GbE) will soon become mainstream.

100GbE must deal with several difficulties that are less serious or absent in older standards: 1) the transceiver must provide a bandwidth of at least 25 GHz (some standards such as OIF 28G-VSR [2] and 32G-FC [3] need more) with reasonable power consumption; 2) the vertical-cavity surface-emitting laser (VCSEL) driver needs to pull a large current and perform fractional-bit pre-emphasis [4] at 25 Gb/s; 3) the transimpedance amplifier (TIA) has to overcome the input capacitance introduced by the photodiode [5]; 4) interference, noise, and inter-lane coupling must be minimized. In addition, modern commercial optical components (i.e., VCSELs and

Manuscript received May 18, 2014; revised August 11, 2014; accepted October 23, 2014. Date of publication November 13, 2014; date of current version January 26, 2015. This paper was approved by Associate Editor Anthony Chan Carusone. This work was supported in part by Atilia Technology Inc.

J. Lee is with the Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, and also with Atilia Technology, Taipei, Taiwan (e-mail: jrilee@ntu.edu.tw).

P.-C. Chiang, J.-Y. Jiang, H.-W. Hung, G.-S. Chen, and C.-Y. Wu are with the Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2014.2365700

Fig. 1. Ethernet revolution.

photodiodes) are reaching their bandwidth limits around 25 GHz [6]. Overall, a cost-efficient solution for a 100GbE transceiver (TRX) is challenging in many aspects [7]–[11].

This paper describes the 100GbE chipset at the transistor level, together with theoretical analysis. The chipset consists of a 10:4 serializer (gearbox TX), a 4:10 deserializer (gearbox RX), a 4-lane laser-diode driver (LDD) array, and a 4-lane transimpedance amplifier/limiting amplifier (TIA/LA) array. Optical components are characterized and modeled as well. The prototype incorporates various circuit techniques to eliminate or at least relax the foregoing challenges. All circuits are designed and fabricated in 65 nm CMOS technology. Thorough test results for the individual blocks and the whole integrated system are reported with explanation and discussion.

This paper is organized as follows. Section II introduces the overall architecture of a 100 Gb/s Ethernet transceiver, including the gearbox physical layer (PHY) and optical front-ends. Sections III and IV present the design details of the gearbox TX and gearbox RX, comparing pros and cons where more than one structure is described. Section V discusses the model and behavior of the optical components, and Sections VI and VII describe the LDD array and the TIA/LA array. Section VIII reports the measurement results, and Section IX concludes the paper.

## II. 100 GbE ARCHITECTURE

Fig. 2 shows the architecture of a typical 100 Gb/s Ethernet transceiver, where  $10 \times 10$  Gb/s input data is serialized into  $4 \times 25$  Gb/s bit stream by a 10:4 serializer. A 4-element LDD array subsequently drives four laser diodes, emitting 850 nm

0018-9200 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information.





Fig. 2. 100GbE architecture.



Fig. 3. Gearbox TX architecture.

light into four multimode fibers (MMFs). After traveling over a few hundred meters, these optical signals are captured and transformed into the electrical domain by means of photodiodes and the TIA/LA array. A 4:10 deserializer recovers the clock and data, and restores the data sequences into  $10 \times 10$  Gb/s outputs. In real applications, the gearbox TRX and the optical front-ends (i.e., LDD and TIA/LA arrays) may be separated by several inches in order to fulfill system-level integration. Thus, 50  $\Omega$  termination, sufficient driving force, and loss compensation (equalization) must be taken into consideration in each block. Due to the non-power-of-2 multiplexing (MUXing) and demultiplexing (DMUXing), the serializer and deserializer are actually composed of two identical 5:2 and 2:5 mapping blocks, respectively. We describe each function block in the following sections.
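The lane arithmetic behind the gearbox can be checked with a few lines of Python. This is only a bookkeeping sketch: it assumes the mapping preserves total throughput, and the helper name `lane_rate_out` is hypothetical.

```python
from fractions import Fraction


def lane_rate_out(n_in: int, rate_in_gbps: Fraction, n_out: int) -> Fraction:
    """Per-lane output rate of an n_in:n_out gearbox that preserves total throughput."""
    return n_in * rate_in_gbps / n_out


# Whole gearbox TX: 10 lanes of 10 Gb/s mapped onto 4 lanes.
full_tx = lane_rate_out(10, Fraction(10), 4)
# One of the two identical 5:2 mapping blocks.
half_tx = lane_rate_out(5, Fraction(10), 2)
# Whole gearbox RX: 4 lanes of 25 Gb/s mapped back onto 10 lanes.
full_rx = lane_rate_out(4, Fraction(25), 10)

print(full_tx, half_tx, full_rx)  # 25 25 10
```

Because 10:4 reduces to 5:2, two identical half-size blocks cover the full 100 Gb/s, which is why the serializer and deserializer are built that way.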

## III. GEARBOX TX

A serializer unit must accomplish data multiplexing from  $5 \times 10$  Gb/s to  $2 \times 25$  Gb/s and pre-emphasis output driving. These functions necessitate clocks at different frequencies and phases, which have to be created on chip as well. Fig. 3 depicts the proposed gearbox TX. It consists of a multi-frequency multi-phase



Fig. 4. Gearbox TX building blocks: (a) 5:1 MUX, (b) 1:4 DMUX, (c) TSPC latch, (d) maximum data rate and power consumption as a function of scaling factor k.

clock generator and two identical sets of 5:2 serializers. Each serializer includes five 1:4 DMUXes and four 5:1 MUXes. Modified from [12] with even larger bandwidth, the feedforward equalizer (FFE) utilizes a half-rate structure. A built-in pseudorandom binary sequence (PRBS) generator, which provides five independent 10 Gb/s data inputs of length  $2^7-1$ , is introduced to facilitate testing.
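For reference, a length  $2^7-1$  PRBS can be generated in software with a 7-bit linear-feedback shift register. The sketch below assumes the common PRBS7 polynomial $x^7 + x^6 + 1$; the text does not state which polynomial the on-chip generator uses, and the function name `prbs7` is hypothetical.

```python
from itertools import islice


def prbs7(seed: int = 0x7F):
    """Yield bits from a 7-bit Fibonacci LFSR (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F       # shift and insert feedback
        yield new_bit


one_period = list(islice(prbs7(), 127))
print(len(one_period), sum(one_period))  # 127 64
```

Different non-zero seeds give the same sequence at different phase offsets, which is one plausible way to obtain several test patterns from a single generator; how the prototype derives its five inputs is not specified here.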

The 5:1 MUX circuit is illustrated in Fig. 4(a), which is realized as a 5-input transmission-gate sampler operated by rail-to-rail data and clocks. Five true-single-phase-clock (TSPC) flip-flops with a NOR gate feedback produce five 20% duty-cycle clocks  $CK_{1\sim5}$  for proper sampling. The 1:4 DMUX is realized as a typical tree structure [Fig. 4(b)], which employs TSPC latches [Fig. 4(c)] to minimize power consumption. In 65 nm CMOS technology, the minimum device size for NMOS and PMOS is W/L = 120 nm/60 nm. Equivalent NMOS and PMOS devices are ratioed at around 1:2 to compensate for their different mobilities, as in typical digital circuits. However, to increase the bandwidth, we choose the actual NMOS and PMOS devices in Fig. 4(c) to be k times larger than their minimum sizes. With balanced pull-up and pull-down strength, we obtain the bandwidth and power consumption of the TSPC latch for different scaling factors k, as shown in Fig. 4(d). It is not surprising that the maximum data rate ramps up as devices become larger, since the parasitic capacitance increases more slowly than the driving force. We select a scaling factor of k = 50 to reach 19 Gb/s operation [Fig. 4(d)]. Here, Fig. 4(d) is obtained under a fan-out-of-1 (FO = 1) loading condition. Two DMUX stages are properly sized to further save

power. It is worth noting that the allowable data rate eventually degrades slightly for k > 50, primarily because the routing capacitance and other parasitics in bigger devices are no longer insignificant. Note that the final inverter stage in Fig. 4(c) could possibly be omitted to further save power. If so, the inverted logic would be corrected in the CMOS-to-CML converter.

The built-in phase-locked loop (PLL) in the gearbox TX plays an important role in output data jitter performance, since the clock jitter is imposed onto the data output directly. To minimize clock phase noise, we realize the TX clock generator as a two-stage subharmonically-injection-locking PLL [13], as depicted in Fig. 5. To keep the two locking forces (i.e., loop locking and injection locking) coexisting harmoniously, certain delays must be added to ensure that the phase difference between the VCO's core current and the injection current is within the maximum tolerable phase error [14]. Fortunately, fixed delays are sufficient even for severe temperature variations, as demonstrated in [14]. Here, fixed delays  $\Delta T_1 \sim \Delta T_4$  are inserted to ensure that the injections always fall within the safe zone (approximately 187°) over process, supply voltage, and temperature (PVT) variations. Simulation shows that no "pseudo locking phenomenon" occurs for  $-20 \sim 50^\circ \mathrm{C}, \pm 10\%$  supply, and all process corners. More details can be found in [14]. Note that a two-stage structure is necessary in such an injection-locking PLL because 12.5, 10, and 2.5 GHz clocks have to be produced.

## IV. GEARBOX RX

Gearbox RX design (Fig. 6) follows the basic structure of our previous work [15], which incorporates pre-amplifier (with gain



Fig. 5. Clock generator in gearbox TX.



Fig. 6. Gearbox RX architecture.

of  $0 \sim 20$  dB), mixer-based clock and data recovery (CDR) circuits [16], 1:10 DMUXes and 4:1 MUXes. The CDR adopts a full-rate linear phase detection structure, which has been proven in silicon [15]. In this modified version, we employ equalizers along with pre-amplifiers to overcome possible channel loss at input.

The limiting amplifier/equalizer combination is depicted in Fig. 7(a), where stages with different boosting techniques are placed alternately to avoid early saturation or invalid switching. Conventional RC degenerated filter fails to provide large tunable range for boosting, even with the help of inductive peaking. Therefore, we need a novel structure for the boosting cells. It is well known that both series and shunt peaking can be used in a broadband amplifier to increase the bandwidth [17], but the

peaking is subject to PVT variations and is hard to control. As a result, we come up with a tunable gain/boosting stage as shown in Fig. 7(b). Here, depending on the ratio of  $I_1$  and  $I_2$ , part of the data goes through the peaking of  $L_2$  whereas the rest does not. Since the sum of  $I_1$  and  $I_2$  is constant ( $I_1 + I_2 = I_{SS}$ ) and their tuning is continuous, we arrive at a gain stage with tunable peaking. It is worth noting that the small-signal gain of Fig. 7(b) reaches a maximum when  $I_1 \approx I_2$ . This gain variation can be tolerated in circuits such as limiting amplifiers. The filter stage is shown in Fig. 7(c), which employs inductive peaking to sustain the bandwidth along the data path. The current tuning is accomplished by using a differential pair with two diode-connected current mirrors to copy  $I_1$  and  $I_2$ . The overall boosting behavior is also shown in Fig. 7(a). With the tunable gain



Fig. 7. (a) Simplified limiting amplifier/equalizer structure; (b) gain/boosting stage; (c) boosting stage.



Fig. 8. (a) Bit alignment circuit, (b) shift register.

stage, we extend the overall range from 11 to 15 dB. No dc gain is sacrificed by doing so.

In addition to the existing deskew circuit that removes phase offset within 1 bit, we further introduce a bit alignment circuit in front to fix up to  $\pm 7$  bits of misalignment due to optical length difference. Fig. 8(a) reveals the bit alignment design, where a

15 bit shift register [Fig. 8(b)] governed by 4 bit control logic stores the data stream. Based on the control signal, 5 consecutive bits out of [-7,...,0,...,7] can be taken as the data output and sent to the subsequent circuit for processing. Since 100GbE only needs a fiber channel of  $100 \sim 300$  m, a 15 bit shift register is sufficient in most cases.
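To give a feel for what a  $\pm 7$  bit alignment range means physically, the Python sketch below converts a skew in bits into an equivalent fiber length difference. The group index of 1.47 is an assumed typical value for silica fiber, not a figure from this work, and the function name is hypothetical.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def skew_to_fiber_length(bits: int, rate_gbps: float, group_index: float = 1.47) -> float:
    """Fiber length difference (m) that produces `bits` unit intervals of skew."""
    unit_interval_s = 1.0 / (rate_gbps * 1e9)
    return bits * unit_interval_s * C_VACUUM / group_index


max_mismatch_m = skew_to_fiber_length(7, 25.0)
print(f"{max_mismatch_m * 100:.1f} cm")  # 5.7 cm
```

Under this assumption, 7 bits at 25 Gb/s (280 ps) corresponds to a lane-to-lane fiber length difference of a few centimeters.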


Fig. 9. (a) Typical VCSEL and its small-signal model, (b) measured frequency response at a bias current of 6 mA, (c) -3 dB bandwidth as a function of bias current, (d) VCSEL transfer function.

## V. OPTICAL COMPONENTS

The optical front-end also involves considerable circuit techniques. Before looking at LDD and TIA/LA array design, we need to investigate the behavior of VCSEL and photo detector (PD).

A typical VCSEL is shown in Fig. 9(a), where the cathode is connected to chip for current pulling due to its smaller parasitic capacitance [18]. A small-signal model is established for transient simulation, which tightly matches the measured frequency response as shown in Fig. 9(b). With bias current of 6 mA, the bandwidth is barely enough for 25 Gb/s operation.<sup>1</sup> In addition, the -3 dB bandwidth of a VCSEL increases as bias current increases and saturates as it becomes larger than 2 mA. We plot the -3 dB bandwidth for a VCSEL as a function of bias current in Fig. 9(c). A high-speed VCSEL needs a dc (standby) current ( $I_{\rm DC}$ ) as large as 3 mA to ensure fast switching time between on and off [Fig. 9(d)]. Otherwise, the VCSEL deviates from linear operation and begins to cause errors. A constant pulling current over PVT variations becomes essential here. Other than the above issues, the VCSEL driver needs to overcome the VCSEL's relaxation oscillation phenomenon at large signals. As illustrated in Fig. 10(a), a high-speed VCSEL presents quite significant ringing effect at rising and falling edges due to the exchange of energy between photons and electrons [19]. Unlike regular signal distortion caused by channel loss or reflection, these sharp humps and dents need fractional-bit pre-emphasis. For a two-tap fractional FFE with tunable delay  $\Delta T$  and pre-emphasis factor  $\alpha$  [Fig. 10(b)], we arrive at magnitude and phase response as

$$|H(j\omega)| = \sqrt{1 + \alpha^2 - 2\alpha \cos(\omega \Delta T)} \tag{1}$$

$$\angle H(j\omega) = \tan^{-1} \left[ \frac{\alpha \sin(\omega \Delta T)}{1 - \alpha \cos(\omega \Delta T)} \right]. \tag{2}$$

A typical VCSEL requires a compensation of approximately 2 dB, i.e.,  $\alpha = 0.25$ . Considering boosting efficiency and phase concordance, we set  $\Delta T = 20$  ps in this prototype.
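As a numerical check of (1) and (2), the following Python sketch evaluates the two-tap response written as $H(j\omega) = 1 - \alpha e^{-j\omega \Delta T}$, whose magnitude and phase reduce to (1) and (2). This closed form is inferred from the two equations rather than quoted from the prototype, and the function name `ffe_response` is hypothetical.

```python
import cmath
import math


def ffe_response(f_hz: float, alpha: float, delta_t: float) -> complex:
    """Two-tap fractional FFE: H(jw) = 1 - alpha * exp(-j * w * delta_t)."""
    w = 2 * math.pi * f_hz
    return 1 - alpha * cmath.exp(-1j * w * delta_t)


alpha, delta_t = 0.25, 20e-12
f_peak = 1 / (2 * delta_t)                 # frequency where cos(w * delta_t) = -1
h_peak = ffe_response(f_peak, alpha, delta_t)
peak_db = 20 * math.log10(abs(h_peak))     # boost relative to unity gain

print(f"{f_peak / 1e9:.1f} GHz, {peak_db:.2f} dB")  # 25.0 GHz, 1.94 dB
```

With  $\alpha = 0.25$  and  $\Delta T = 20$  ps, the magnitude peaks at $1/(2\Delta T) = 25$ GHz with $|H| = 1 + \alpha$, i.e., about 1.9 dB above unity, consistent with the roughly 2 dB quoted above. Relative to the dc value $1 - \alpha$, the boost is about 4.4 dB.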

The high-speed PD must be studied before developing a TIA/LA front-end. Like other discrete components, PD inevitably introduces parasitic capacitance. For modern PDs [Fig. 11(a)] with  $20 \sim 30$  GHz bandwidth, the capacitance is around  $100 \sim 200$  fF, which still causes significant bandwidth

 $<sup>^{1}</sup>$ As we will see in the following paragraphs, the whole optical link including VCSEL + Fiber + PD is only 15.33 GHz.



Fig. 10. (a) Relaxation oscillation, (b) two-tap fractional-bit boosting.



Fig. 11. (a) Photo detector and its small-signal model, (b) measured frequency response.



Fig. 12. Measured  $S_{21}$  of optical link (VCSEL + MMF + PD).

degradation. As depicted in Fig. 11(b), the -3 dB bandwidth improves from 20 to 30 GHz as the reverse-bias voltage  $(V_{\rm RB})$  moves from 0 to -4 V. It is worth noting that the breakdown voltage of the PD is around -5 V. As a result, an on-chip bandgap reference is mandatory to provide a stable input common-mode level. To accurately model the optical channel, we measure the frequency response of VCSEL + 10 m multimode fiber + PD (Fig. 12). Here, the VCSEL bias current is 6 mA and the PD reverse-bias voltage is -3 V. The E-O-E link exhibits a channel loss of 1.1 dB and a -3 dB bandwidth of only 15.33 GHz, which suggests the use of a limiting amplifier and an equalizer in the RX.

## VI. LDD ARRAY

Fig. 13 shows the LDD array design. To facilitate testing, we have four identical drivers sharing one 12.5 GHz PLL to deliver  $4 \times 25$  Gb/s output data. Each driver has a built-in  $2^7-1$  PRBS generator. In real system integration, these blocks can be removed. The two-tap fractional FFE is realized as two tail currents  $I_0$  (main cursor) and  $I_{-1}$  (pre-cursor), and the latter is governed by a 2 bit DAC for tuning. A bandgap reference is included to ensure precise current generation. To match  $50-\Omega$  input impedance of the VCSEL, the final output stage has one end terminated on chip and the other dc-coupled through bonding wire to VCSEL. A 3 mA standby current




Fig. 14. (a) TIA/LA array architecture, (b) TIA design with constant IR drop biasing, (c) boosting stage, (d) 40 Gb/s output eye @ V<sub>in</sub> = 47 mV.

 $(I_{\rm DC})$  is pulled through  $M_7$  for proper operation.  $M_7$  is a thick-oxide device that can accommodate higher voltages.

## VII. TIA/LA ARRAY

The TIA/LA array employs TIAs, adaptive analog equalizers, and limiting amplifiers, as shown in Fig. 14(a). TIA must present a well-defined input level for direct dc-coupling from photo diode as well as 50  $\Omega$  input impedance matching.<sup>2</sup> The proposed TIA is depicted in Fig. 14(b), where a differential pair  $M_{1,2}$ 


<sup>&</sup>lt;sup>2</sup>The 50  $\Omega$  matching is not necessary if the PD is wire bonded to TIA directly. However, for discrete components or purely electrical testing, it is still useful to include it.



Fig. 15. Chip micrographs and power consumption.



Fig. 16. (a) Gearbox TX PLL phase noise, and 25 Gb/s output data with (b) 0 dB, (c) 9.5 dB boosting.

along with local feedback  $R_{\rm F}$  and  $L_{\rm D}$  forms a fixed input level of  $V_{\rm DD} - (I_{\rm SS}R_{\rm D})/2 \approx 0.8$  V. The tail current of  $M_{1,2}$  pair is biased by a constant IR drop circuit. Here, a sub-1 V bandgap circuit creates a constant voltage  $V_{\rm BG}$ :

$$V_{\rm BG} = \frac{R_3}{R_1} V_{\rm T} \ln n + \frac{R_3}{R_2} \cdot |V_{\rm BE2}| \tag{3}$$

where n is the device ratio between  $Q_1$  and  $Q_2$ . In this design, we set  $V_{\rm BG} = 0.7$  V with  $R_1 = 1$  k $\Omega$ ,  $R_2 = 12.5$  k $\Omega$ ,  $R_3 = 6.5$  k $\Omega$ , and n = 5. Since  $V_{\rm BG}$  is fixed and  $R_4$  is made on chip, the tail current  $I_{\rm SS}$  mirrored through  $M_4$ ,  $M_5$ ,  $M_6$ , and  $M_7$  also forms a constant IR drop with  $R_{\rm D}$ . In other words, the input dc level is fixed with respect to  $V_{\rm DD}$ . A low dropout (LDO) regulator can be used to stabilize the supply. Meanwhile,  $R_{\rm B}$  and  $C_{\rm B}$  provide an unaltered bias point to  $M_2$ . As a result, the signal current  $I_{\rm in}$  flows through  $R_{\rm F}$ , which converts it into the voltage  $V_{\rm out}$ . Here,  $L_{\rm D}$  extends the 50  $\Omega$  matching from 23 GHz to 26 GHz, and inductive peaking is also used at the output node to further broaden the bandwidth. The TIA achieves 40 dB gain while consuming 4.8 mW of power.
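The Python sketch below evaluates (3) for the quoted component values. The thermal voltage $V_{\rm T} \approx 25.85$ mV (roughly 300 K) and $|V_{\rm BE2}| = 0.8$ V are assumptions chosen for illustration; neither value is given in the text, and the function name `v_bg` is hypothetical.

```python
import math


def v_bg(n: float, r1: float, r2: float, r3: float, v_t: float, v_be2: float) -> float:
    """Sub-1 V bandgap output per (3): PTAT term plus scaled |V_BE2| term."""
    ptat_term = math.log(n) * (r3 / r1) * v_t
    ctat_term = (r3 / r2) * abs(v_be2)
    return ptat_term + ctat_term


V_T = 0.02585   # assumed thermal voltage kT/q near 300 K, in volts
V_BE2 = 0.8     # assumed base-emitter voltage magnitude, in volts

vbg = v_bg(n=5, r1=1e3, r2=12.5e3, r3=6.5e3, v_t=V_T, v_be2=V_BE2)
print(f"V_BG = {vbg:.2f} V")  # V_BG = 0.69 V
```

Under these assumptions the first term contributes about 0.27 V and the second about 0.42 V, which is in line with the 0.7 V design target.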

The adaptation of the LAs is accomplished by splitting the spectrum power and comparing the higher and lower parts, as described in [20]. An on-chip *RC* filter with a corner frequency of 1.5 kHz forms the offset cancellation loop. The boosting stage is illustrated in Fig. 14(c), which splits the common-source node of the  $M_{1,2}$  pair and inserts tunable *R* and *C*. Each TIA/LA set achieves 72.5 dB $\Omega$  gain and 21 GHz bandwidth with 69 mW power consumption. When driven electrically, the TIA/LA itself can be operated at 40 Gb/s with a large-signal gain of 68.4 dB $\Omega$  [Fig. 14(d)]. The VCSEL anodes are powered by a 3.6 V supply to arrive at the desired extinction ratio (approximately 4 ~ 5) and thus a symmetric optical data eye. Likewise, the cathodes of the PD array are connected to a 3.6 V supply for optimal performance. The VCSEL and PD capacitances are 350 fF and 150 fF, respectively.



Fig. 17. (a) Gearbox RX CDR phase noise. Recovered and demultiplexed data at (b) 25 Gb/s, (c) 10 Gb/s.



Fig. 18. (a) Gearbox RX jitter tolerance, (b) inter-channel interference.



Fig. 19. LDD array's output at 25 Gb/s: (a) electrical, (b) optical.

## VIII. MEASUREMENT RESULTS

All chips are fabricated in 65 nm CMOS technology. The 4-channel gearbox TRX consumes a total power of 1.84 W (TX:  $4 \times 200$  mW, RX:  $4 \times 260$  mW) from a 1.2 V supply. The LDD and TIA/LA arrays dissipate  $4 \times 99$  mW and  $4 \times 69$  mW, respectively. Fig. 15 illustrates the die photos and power consumption of the chips. All of them are tested in a chip-on-board assembly. We present the testing results in the following paragraphs.
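The quoted per-lane figures can be added up in a short Python sketch. The chipset total and the energy per bit below are derived here for illustration and are not reported values; the total simply assumes that all four chips operate simultaneously on one 100 Gb/s link direction.

```python
LANES = 4
LANE_RATE_BPS = 25e9

# Per-lane power in mW, as quoted in the text.
PER_LANE_MW = {"gearbox TX": 200, "gearbox RX": 260, "LDD": 99, "TIA/LA": 69}

gearbox_w = LANES * (PER_LANE_MW["gearbox TX"] + PER_LANE_MW["gearbox RX"]) / 1000
total_w = LANES * sum(PER_LANE_MW.values()) / 1000
energy_pj_per_bit = total_w / (LANES * LANE_RATE_BPS) * 1e12

print(f"gearbox TRX: {gearbox_w:.2f} W")            # gearbox TRX: 1.84 W
print(f"whole chipset: {total_w:.3f} W")            # whole chipset: 2.512 W
print(f"energy: {energy_pj_per_bit:.1f} pJ/bit")    # energy: 25.1 pJ/bit
```

The gearbox subtotal reproduces the 1.84 W stated above.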

1) Gearbox TX: Fig. 16(a) depicts the phase noise of the 12.5 GHz PLL in the gearbox TX. With the help of injection locking, the rms jitter integrated from 100 Hz to 1 GHz is reduced from 384 to 187 fs. The reference spur is -80 dBc. Fig. 16(b) and (c) show the output data at 25 Gb/s with 0 and 9.5 dB boosting. The single-ended output swing under normal operation (0 dB boosting) is 200 mV<sub>pp</sub>, while the 20%  $\sim 80\%$  rise/fall time is 10.5 ps. The data jitter is 1.0 ps,rms and 6.2 ps,pp. The TX is capable of delivering



Fig. 20. Measurement results for TIA/LA array: (a) I/O matching, (b) transimpedance gain.



Fig. 21. (a) TIA/LA output with optical input at -5.5 dBm OMA, (b) TIA/LA sensitivity and bathtub curve at -5.5 dBm OMA.

25 Gb/s data to LDD array through a 10-cm PCB channel on a regular FR4 board. The TX presents a tunable range of 24.6 to 25.7 Gb/s.

2) Gearbox RX: The spectrum of CDR recovered clock (25 GHz) is plotted in Fig. 17(a), which presents a phase noise of -98.3 dBc/Hz at 1 MHz offset. The sub-rate clock (10 GHz) generated on-chip is also recorded for comparison. The recovered data at 25 Gb/s and 10 Gb/s are depicted in Fig. 17(b) and (c), revealing rms jitter of 1.01 ps and 2.34 ps, respectively. Jitter tolerance is also measured with bit-error rate (BER) =  $10^{-12}$  as threshold. It exceeds the extrapolated 802.3ae Mask by at least 0.1 UI<sub>pp</sub> [Fig. 18(a)]. Inter-channel interference measurement suggests an input power penalty of only 0.65 dB when the adjacent lane is turned on [Fig. 18(b)]. This result improves significantly as compared with our previous work [16], primarily owing to better substrate isolation. The RX reveals a tunable range of 24.94 to 25.2 Gb/s, and BER < $10^{-12}$  is achieved.

3) LDD Array: Fig. 19 shows the electrical (directly to oscilloscope) and optical (through VCSEL and optical probe) outputs at 25 Gb/s, indicating an output swing of 250 mV<sub>pp</sub> and an optical output power of 1.63 mW, respectively. The jitter measures 1.4 ps,rms and 9.2 ps,pp for the electrical output, and 1.6 ps,rms and 11.4 ps,pp for the optical output. Here, the LDD jitter is captured with all four

channels turned on. Jitter could be further reduced if only one channel is activated.

4) TIA/LA Array: Fig. 20(a) shows the measured  $S_{11}$  and  $S_{22}$  for the TIA/LA array, both of which stay below -12 dB from dc to 60 GHz. The transimpedance gain measures 72.5 dB $\Omega$  and the -3 dB bandwidth is equal to 21 GHz [Fig. 20(b)]. The low-frequency cutoff corner is less than 1.5 kHz. An end-to-end test of the optical link is also conducted, where the VCSEL emits 1.2 mW of 850 nm light into a 100 m MMF for the TIA/LA to pick up. With 0.5 W/A VCSEL efficiency and 0.47 A/W PD responsivity, we obtain the output data as illustrated in Fig. 21(a). The input-referred noise is equal to 4.2  $\mu$ A<sub>rms</sub>. TIA/LA signal integrity is demonstrated in Fig. 21(b). It suggests a sensitivity of -6.8 dBm optical modulation amplitude (OMA) at 25 Gb/s for BER =  $10^{-12}$ , and  $\pm 0.12$  UI opening in the bathtub curve. The performance of this work is summarized in Table I and compared with that of other recently published state-of-the-art designs.
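To relate the measured sensitivity to the quoted responsivity, noise, and gain, the Python sketch below performs the unit conversions. It assumes the OMA arriving at the PD equals the stated sensitivity (no extra coupling loss), and the helper names are hypothetical; the resulting ratio is an illustrative figure, not a measured one.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to mW."""
    return 10 ** (p_dbm / 10)


def dbohm_to_ohm(gain_dbohm: float) -> float:
    """Convert transimpedance gain from dB-ohm to ohm."""
    return 10 ** (gain_dbohm / 20)


SENSITIVITY_DBM_OMA = -6.8
PD_RESPONSIVITY_A_PER_W = 0.47
INPUT_NOISE_UA_RMS = 4.2
GAIN_DBOHM = 72.5

oma_mw = dbm_to_mw(SENSITIVITY_DBM_OMA)
i_pp_ua = oma_mw * PD_RESPONSIVITY_A_PER_W * 1000   # mW * A/W = mA, then to uA
pp_to_rms_noise = i_pp_ua / INPUT_NOISE_UA_RMS
z_t_ohm = dbohm_to_ohm(GAIN_DBOHM)

print(f"OMA = {oma_mw:.3f} mW, I_pp = {i_pp_ua:.1f} uA")
print(f"I_pp / noise = {pp_to_rms_noise:.1f}, Z_T = {z_t_ohm:.0f} ohm")
```

Under these assumptions, the sensitivity corresponds to a photocurrent swing of roughly 98 µA peak-to-peak, about 23 times the input-referred rms noise, and 72.5 dB $\Omega$  is a small-signal transimpedance of about 4.2 k $\Omega$ .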

## IX. CONCLUSION

Advanced CMOS technologies together with broadband circuit techniques have pushed optical and electrical transceivers toward high data rate and lower power dissipation. E/O

TABLE I Performance Summary

| Gearbox TX | This Work | [8] | [11] |
|---|---|---|---|
| Clock Phase Noise<br>(Carrier = 12.5 GHz, @ 1 MHz offset) | -116 dBc/Hz | -106 dBc/Hz | N/A |
| Integrated Clock Jitter<br>(12.5 GHz) | 187 fs<br>(100 Hz ~ 1 GHz) | 429 fs<br>(10 kHz ~ 100 MHz) | 350 fs<br>(100 kHz ~ 1 GHz) |
| Output Data Jitter<br>@ BER = 10<sup>-12</sup> | 1.01 ps,rms<br>6.67 ps,pp | 3.3 ps,pp | N/A |
| D<sub>out</sub> Rise/Fall Time (20%-80%) | 10.5 ps | 25 ps | 22 ps |
| Operation Range | 24.6 ~ 25.7 Gb/s | 25.78 Gb/s | 25 ~ 28 Gb/s |
| D<sub>out</sub> Swing<br>(Single-ended) | 200 mV<sub>pp</sub> | 355 mV<sub>pp</sub> | 276 mV<sub>pp</sub> |
| Power Consumption | 200 mW/lane | 282 mW/lane | N/A\* |
| Gearbox RX | | | |
| Recovered Clock Phase Noise<br>(Carrier = 12.5 GHz, @ 1 MHz offset) | -98.3 dBc/Hz | N/A | N/A |
| Recovered Data Jitter<br>(25 Gb/s, @ BER = 10<sup>-12</sup>) | 1.01 ps,rms<br>6.22 ps,pp | N/A | N/A |
| Power Penalty | 0.63 dB | N/A | N/A |
| Jitter Tolerance | 0.2 UI<sub>PP</sub> | N/A | 0.4 UI<sub>PP</sub> |
| Tolerable phase error<br>between lanes | ±7 bit | N/A | N/A |
| Operation Range | 24.94 ~ 25.2 Gb/s | N/A | N/A |
| Power Consumption | 260 mW/lane | 217 mW/lane | N/A\* |
| Power supply, Area and Process | | | |
| Supply Voltage | 1.2 V | 1.0 V/1.8 V | N/A |
| Chip Area | TX: $1.2 \times 1.1 \times 2\ \text{mm}^{2 \star \star}$<br>RX: $1.9 \times 1.3 \times 2\ \text{mm}^{2 \star \star}$ | $6.3 \times 3.7\ \text{mm}^2$ | $2.4 \times 1.5\ \text{mm}^2$ |
| Technology | 65 nm CMOS | 65 nm CMOS | 40 nm CMOS |

| LDD Array | This Work | [21] | [22] |
|---|---|---|---|
| Laser Output Power<br>(OMA) | 1.63 mW | 0.77 mW | N/A |
| Output Data Jitter<br>@ BER = 10<sup>-12</sup> | 1.6 ps,rms\*<br>11.4 ps,pp\* | 16 ps,pp | 25.2 ps,pp |
| Commercial VCSEL? | Yes | No | No |
| VCSEL parasitic<br>capacitance | 350 fF | N/A | N/A |
| VCSEL Efficiency | 0.5 W/A | N/A | N/A |
| Power<br>Consumption | 99 mW/lane | 46 mW/lane | 208 mW/lane |
| TIA/LA Array | | | |
| Transimpedance<br>Gain | 72.5 dBΩ | 78.3 dBΩ\*\* | 71.2 dBΩ |
| Output Data Jitter<br>@ BER = 10<sup>-12</sup> | 3.1 ps,rms<br>18.7 ps,pp | 19.2 ps,pp | N/A |
| Sensitivity<br>@ BER = 10<sup>-12</sup> | -6.8 dBm @ 25 Gb/s | -4 dBm @ 22 Gb/s | N/A |
| Commercial PD? | Yes | No | No |
| PD parasitic<br>capacitance | 150 fF | 80 fF | N/A |
| PD Responsivity | 0.47 A/W | 0.55 A/W | N/A |
| Power<br>Consumption | 68 mW/lane | 44.4 mW/lane | 59 mW/lane |
| Power supply, Area and Process | | | |
| Supply Voltage                                  | 1.2V                                                                | 1.0V/1.6V                                                                  | 1.0V/1.8V/2.5V           |  |
| Chip Area                                       | LDD: 1.6 × 1.25mm <sup>2</sup><br>TIA/LA: 1.6 × 0.65mm <sup>2</sup> | LDD: $0.8 \times 0.17 mm^{2}$ ***<br>TIA/LA: $0.25 \times 0.39 mm^{2}$ *** | 3.6 × 5.3mm <sup>2</sup> |  |
| Technology                                      | 65nm CMOS                                                           | 65nm CMOS                                                                  | 40nm CMOS                |  |

\*Total power consumption of TX+RX is 900 mW. \*\*Calculated for 4 × 25 Gb/s operation.

and O/E interfaces can also be implemented and integrated in CMOS, demonstrating promising potential for low-cost, low-power, high-performance, and high-yield solutions for 100GbE systems.
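As a reading aid for the "This Work" column of the LDD and TIA/LA comparison table, the following minimal Python sketch converts a few of the tabulated figures into derived quantities: energy per bit, transimpedance in ohms, and the photocurrent at the sensitivity limit. The function names are illustrative and not part of this work. The photocurrent estimate assumes the sensitivity in dBm can be multiplied directly by the PD responsivity, which ignores whether the figure is quoted as average power or OMA.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ/b; mW divided by Gb/s equals pJ/b."""
    return power_mw / rate_gbps


def db_ohm_to_ohm(gain_db_ohm: float) -> float:
    """Convert a transimpedance gain in dBOhm (20*log10 scale) to ohms."""
    return 10 ** (gain_db_ohm / 20)


def dbm_to_mw(power_dbm: float) -> float:
    """Convert optical power in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def photocurrent_ua(power_dbm: float, responsivity_a_per_w: float) -> float:
    """Photocurrent in microamps for a given optical power and responsivity.

    Assumption: the dBm figure is used as-is (no OMA/average correction).
    """
    # mW * A/W gives mA; multiply by 1e3 for microamps.
    return dbm_to_mw(power_dbm) * responsivity_a_per_w * 1e3


if __name__ == "__main__":
    rate_gbps = 25.0  # per-lane data rate
    print(f"LDD energy/bit:    {energy_per_bit_pj(99.0, rate_gbps):.2f} pJ/b")
    print(f"TIA/LA energy/bit: {energy_per_bit_pj(68.0, rate_gbps):.2f} pJ/b")
    print(f"Transimpedance:    {db_ohm_to_ohm(72.5):.0f} ohm")
    print(f"Photocurrent at sensitivity: {photocurrent_ua(-6.8, 0.47):.1f} uA")
```

These conversions only restate the tabulated numbers in other units; they are not additional measurements.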

### ACKNOWLEDGMENT

The authors thank the TSMC university shuttle program for chip fabrication.

### REFERENCES

- [1] M. Nowell *et al.*, "Overview of requirements and applications for 40 Gigabit and 100 Gigabit Ethernet," in *Ethernet Alliance*, Beaverton, OR, USA, 2007.
- [2] Common electrical I/O (CEI)-Electrical and jitter interoperability agreements for 6G+ bps, 11G+ bps and 25G+ bps I/O, Optical Internetworking Forum, Sep. 2011 [Online]. Available: http://www.oiforum.com/public/documents/OIF\_CEI\_03.0.pdf
- [3] Fibre Channel Solutions Guide Book 2010. Fibre Channel Industry Association (FCIA), 2010 [Online]. Available: http://www.fibrechannel.org/documents
- [4] M. Maeng *et al.*, "Fully integrated 0.18 μm CMOS equalizer with an active inductance peaking delay line for 10 Gbps data throughput over 500 m multimode fiber," in *IEEE MTT-S Int. Microw. Symp. Dig.*, 2005, pp. 1845–1848.
- [5] J. F. Buckwalter *et al.*, "A monolithic 25-Gb/s transceiver with photonic ring modulators and Ge detectors in a 130 nm CMOS SOI process," *IEEE J. Solid-State Circuits*, vol. 47, no. 6, pp. 1309–1322, Jun. 2012.
- [6] J. Jiang, P. Chiang, H. Hung, C. Lin, T. Yoon, and J. Lee, "100 Gb/s Ethernet chipsets in 65 nm CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2013, pp. 120–121.
- [7] G. Balamurugan et al., "A 5-to-25 Gb/s 1.6-to-3.8 mW/(Gb/s) reconfigurable transceiver in 45 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2010, pp. 372–373.

\*The optical bandwidth of the oscilloscope is 19 GHz. \*\*Simulated result. \*\*\*Core circuit area.

- [8] G. Ono et al., "10:4 MUX and 4:10 DEMUX gearbox LSI for 100-Gigabit Ethernet link," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2011, pp. 148–149.
- [9] J. Bulzacchelli et al., "A 28Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32 nm SOI CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2012, pp. 324–325.
- [10] C. Menolfi et al., "A 28 Gb/s source-series terminated TX in 32 nm CMOS SOI," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2012, pp. 334–335.
- [11] M. Harwood et al., "A 225 mW 28 Gb/s SerDes in 40 nm CMOS with 13 dB of analog equalization for 100 GBASE-LR4 and optical transport lane 4.4 applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2012, pp. 326–327.
- [12] H. Wang, C. Lee, A. Lee, and J. Lee, "A 21-Gb/s 87-mW transceiver with FFE/DFE/linear equalizer in 65 nm CMOS technology," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2009, pp. 50–51.
- [13] J. Lee, H. Wang, C. Chen, and Y. Lee, "Subharmonically injection-locked PLLs for ultra-low-noise clock generation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2009, pp. 92–93.
- [14] J. Lee and H. Wang, "Study of subharmonically injection-locked PLLs," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1539–1553, May 2009.
- [15] K. Wu and J. Lee, "A 2 × 25 Gb/s deserializer with 2:5 DMUX for 100 Gb/s Ethernet applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2010, pp. 374–375.
- [16] J. Lee and K. Wu, "A 20 Gb/s full-rate linear CDR circuit with automatic frequency acquisition," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2009, pp. 366–367.
- [17] S. Galal and B. Razavi, "40 Gb/s amplifier and ESD protection circuit in 0.18 μm CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2004, pp. 480–481.
- [18] N. Li et al., "High-performance 850 nm VCSEL and photodetector arrays for 25 Gb/s parallel optical interconnects," in *Proc. Optical Fiber Commun. Conf. (OFC)*, Mar. 2010, paper OTuP2.
- [19] B. Razavi, *Design of Integrated Circuits for Optical Communications*. New York, NY, USA: McGraw-Hill, 2002.
- [20] J. Lee, "A 20-Gb/s adaptive equalizer in 0.13 μm CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2006, pp. 92–93.

- [21] J. Proesel et al., "25 Gb/s 3.6 pJ/b and 15 Gb/s 1.37 pJ/b VCSEL-based optical links in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2012, pp. 418–420.
- [22] T. Takemoto et al., "A 25-Gb/s 2.2-W optical transceiver using an analog FE tolerant to power supply noise and redundant data format conversion in 65 nm CMOS," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2012, pp. 106–107.



Ping-Chuan Chiang (S'12) was born in Taipei, Taiwan, in 1987. He received the B.S. degree from the Electrical Engineering Department, National Tsing Hua University (NTHU), Hsinchu, Taiwan, in 2009, and the M.S. degree from the Department of Electronic Engineering, National Chiao-Tung University (NCTU), Hsinchu, Taiwan, in 2011. He is currently working toward the Ph.D. degree at National Taiwan University (NTU), Taipei, Taiwan.

His research interests include RF techniques, clock and data recovery circuits, wireline transceivers, and OEIC design.

Mr. Chiang received the Outstanding Undergraduate Student Award from the Ho-Ping Power Company in 2008 and the NOVATEK Fellowship in 2010, 2013, and 2014. He also received the IEEE Solid-State Circuits Society (SSCS) Student Travel Grant Award (STGA) from the 2013 ISSCC.



Jhih-Yu Jiang was born in 1987. He received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2009, and the M.S. degree from the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan, in 2012. He is now with the SerDes design group of MediaTek. His research interests include phase-locked loops and wireline transceivers.





Chin-Yang Wu (M'14) was born in Taipei, Taiwan, in 1989. He received the B.S. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2010, and the M.S. degree from the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan, in 2014.

He is now with the PLL IC design group of TSMC. His research interests focus on wireline backplane transceiver designs.

Gaun-Sing Chen was born in Taipei, Taiwan, in 1989. He received the B.S. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2010, and the M.S. degree from the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan, in 2014.

His research interests include phase-locked loops and wireline transceivers.



Jri Lee (M'03) received the B.Sc. degree in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1995, and the Ph.D. degree in electrical engineering from the University of California at Los Angeles (UCLA), Los Angeles, CA, USA, in 2003.

He joined National Taiwan University (NTU) in 2004, where he is currently Professor of electrical engineering.

Prof. Lee received the Beatrice Winner Award at the 2007 ISSCC, the Takuo Sugano Award at the 2008 ISSCC, the 10-Year Author-Recognition Award at the 2013 ISSCC, and other international and domestic awards. He also received the NTU Outstanding Teaching Award in 2007, 2008, and 2009. He served on the Technical Program Committees of the ISSCC (2007 to 2010) and the Symposium on VLSI Circuits (2008 to present). He was a guest editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS in 2008. He served as a Distinguished Lecturer of the IEEE Solid-State Circuits Society from 2011 to 2013.



Hao-Wei Hung was born in Taipei, Taiwan, in 1988. He received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2010, and the M.S. degree from the Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan, in 2013.

He is now with the mixed-mode IC design group of EZ Semiconductor. His research interests include phase-locked loops and wireline transceivers for broadband data communication.