# Fully-integrated 40-Gb/s Pulse Pattern Generator and Bit-Error-Rate Tester Chipsets in 65-nm CMOS Technology

Guan-Sing Chen<sup>1,2</sup>, Chin-Yang Wu<sup>1,2</sup>, Chen-Lun Lin<sup>1</sup>, Hao-Wei Hung<sup>1</sup> and Jri Lee<sup>1,2</sup> <sup>1</sup>National Taiwan University, Taipei, Taiwan <sup>2</sup>Atilia Technology, Taipei, Taiwan

Abstract—Fully-integrated 40-Gb/s pulse pattern generator (PPG) and bit-error-rate tester (BERT) chipsets are presented in 65-nm CMOS technology. Using external clock inputs, the PPG and BERT achieve full operation over an ultrawide data-rate range from 40 Mb/s to 40 Gb/s. Built-in PLL and CDR circuits are also included to provide robustness for standard specification testing.

#### I. INTRODUCTION

Broadband pulse pattern generators and bit-error-rate testers have found extensive use in recent years, and the demand for inexpensive testing equipment has increased as 100 GbE emerges in the market. Such high-end testing equipment used to be implemented in III-V compounds or other high-speed devices. Recent developments in nanometer CMOS technology, however, demonstrate the feasibility of realizing these testing circuits in low-power, low-cost, and high-yield vehicles.

This paper presents fully-integrated PPG and BERT chipsets operating from 40 Mb/s to 40 Gb/s. In PPG, two clock modes are selectable, i.e., the input clock can be provided externally or from the built-in PLL. Four pseudorandom bit sequence (PRBS) lengths ($2^7-1$, $2^{15}-1$, $2^{23}-1$, and $2^{31}-1$) and four injected error rates (0, $10^{-3}$, $10^{-6}$, and $10^{-9}$) are programmable. In BERT, adaptive equalization with amplification frontend and built-in CDR circuit are employed. After demultiplexing, four independent error checkers are used to examine error bits. With the help of software and FPGA control board, the BERT can perform bathtub and eye diagram plots. Both chips are fabricated in standard 65-nm CMOS technology with a 1.2-V supply.

#### II. PULSE PATTERN GENERATOR

Figure 1 shows the PPG design, which consists of a PRBS engine, an on-chip 20-GHz PLL, 4 identical FFE



Fig. 1. PPG architecture.

drivers, and a 4:1 high-speed MUX together with lower-speed MUXes. Here, the input clock can be provided either externally or from the PLL. A broadband phase shifter tunes the clock until it falls in the optimal sampling spot, i.e., the center of the data eye, for the 4×10-Gb/s sequences. Before the high-speed 4:1 MUX, proper delay is inserted between stages of the subsequent dividers to match the sampling points of the lower-speed MUXes. As a result, 4 in-phase data inputs (up to 12.5 Gb/s) are fed into the 4:1 MUX for final serialization. Here, 2-tap pre-emphasis FFEs are included in the 10-Gb/s drivers, which provide a maximum boosting of 8 dB at 5 GHz. Both output magnitude and boosting are programmable. The proprietary 40-Gb/s 4:1 MUX design necessitates precise skew control and broadband techniques, and it was presented in [1].
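As a rough illustration of how a 2-tap pre-emphasis FFE maps a tap weight to boosting, the sketch below models the driver as an idealized FIR filter with taps (1, -a). This is an assumption made for illustration only; the actual driver topology and tap weights are not given here, and the names `ffe_boost_db`, `tap_for_boost`, and `ffe` are hypothetical. For such a filter the gain is 1 - a at dc and 1 + a at the Nyquist frequency (5 GHz for 10-Gb/s data).

```python
import math


def ffe_boost_db(a: float) -> float:
    """Nyquist-to-dc boost in dB of an ideal 2-tap FIR with taps (1, -a), 0 <= a < 1."""
    return 20.0 * math.log10((1.0 + a) / (1.0 - a))


def tap_for_boost(boost_db: float) -> float:
    """Tap weight a that yields the requested boost (inverse of ffe_boost_db)."""
    ratio = 10.0 ** (boost_db / 20.0)
    return (ratio - 1.0) / (ratio + 1.0)


def ffe(bits, a):
    """Apply y[n] = x[n] - a * x[n-1] to a bit list mapped to NRZ levels +1/-1."""
    symbols = [1.0 if b else -1.0 for b in bits]
    prev = symbols[0]  # assumption: the line idles at the first symbol's level
    out = []
    for s in symbols:
        out.append(s - a * prev)
        prev = s
    return out


if __name__ == "__main__":
    a = tap_for_boost(8.0)  # tap weight for the 8-dB maximum boosting
    print(f"tap weight for 8 dB: {a:.3f}")
    print(ffe([0, 0, 1, 1, 1, 0], a))
```

In this idealized model, transitions are emphasized (amplitude 1 + a) while repeated bits settle to the reduced level 1 - a, which is the usual trade between swing and high-frequency boost.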

Governed by commands from the PC, the PRBS engine is capable of delivering 4 different pattern lengths, namely, $2^7-1$, $2^{15}-1$, $2^{23}-1$, and $2^{31}-1$, at different data rates. If necessary, users can inject errors into the pattern with a programmable error rate of $10^{-3}$, $10^{-6}$, or $10^{-9}$. All logic gates are synthesized in CMOS logic to save power, with a gate count greater than 100K.
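The behavior of such a PRBS engine can be sketched in software with a linear-feedback shift register. The feedback polynomials below are the ones commonly used for PRBS7/15/23/31 test patterns ($x^7+x^6+1$, $x^{15}+x^{14}+1$, $x^{23}+x^{18}+1$, $x^{31}+x^{28}+1$); the text does not state which polynomials the chip implements, so they are an assumption, as is the deterministic every-N-th-bit error injection. The names `prbs_bits` and `inject_errors` are hypothetical.

```python
# Feedback taps (1-indexed stages) for x^n + x^m + 1; assumed, not taken from the chip.
PRBS_TAPS = {7: (7, 6), 15: (15, 14), 23: (23, 18), 31: (31, 28)}


def prbs_bits(order, count, seed=1):
    """Return `count` bits of a PRBS of length 2**order - 1 from a Fibonacci LFSR."""
    n, m = PRBS_TAPS[order]
    mask = (1 << n) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    out = []
    for _ in range(count):
        # New bit is the XOR of the bits emitted n and m steps earlier.
        new_bit = ((state >> (n - 1)) ^ (state >> (m - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        out.append(new_bit)
    return out


def inject_errors(bits, rate):
    """Flip one bit every round(1/rate) bits; rate == 0 leaves the pattern untouched."""
    if rate == 0:
        return list(bits)
    period = round(1 / rate)
    return [b ^ 1 if (i + 1) % period == 0 else b for i, b in enumerate(bits)]


if __name__ == "__main__":
    clean = prbs_bits(7, 5000)
    noisy = inject_errors(clean, 1e-3)
    flipped = sum(a != b for a, b in zip(clean, noisy))
    print(f"injected {flipped} errors in {len(clean)} bits")
```

Evenly spaced injection is only one possible policy; a hardware engine could equally well randomize the error positions while keeping the same average rate.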

The PRBS engine also provides four lower-speed outputs (up to $4 \times 12.5$ Gb/s) for optical applications. To overcome PVT variations, bandgap reference and temperature sensing circuits are included. The simplified biasing circuit is illustrated in Fig. 2. With 2.5-V I/O devices $M_1$-$M_4$, $R_1$, and $Q_{1,2}$, we come up with a proportional to absolute temperature



Fig. 2. (a) bandgap reference and temperature sensing circuits design, (b)  $V_{\text{Temp}}$  as a function of temperature, (c)  $V_{\text{BG}}$  as a function of temperature.

(PTAT) current $I$ and have it mirrored by means of $M_5$. Using an external resistor $R_2$ with a very low temperature coefficient (±50 ppm/°C) to convert the current into a voltage, we obtain a voltage $V_{\text{Temp}}$ which is linearly proportional to absolute temperature. This temperature sensor allows us to perform temperature compensation either inside or outside the chip. The subsequent bandgap and constant IR drop circuits create a programmable data swing. As a result, we arrive at an accurate output magnitude regardless of PVT variations. Figures 2(b) and (c) illustrate the measured $V_{\text{Temp}}$ and $V_{\text{BG}}$ as a function of temperature, suggesting that $V_{\text{BG}}$ deviates by less than 0.3% over $-20^{\circ}\text{C} \sim 90^{\circ}\text{C}$.
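The linear dependence of $V_{\text{Temp}}$ on temperature follows from the textbook PTAT relation. As a sketch, assume that $Q_1$ and $Q_2$ carry equal currents and differ in emitter area by a ratio $n$, and that $M_5$ mirrors the current with a ratio $m$; neither $n$ nor $m$ is specified in the text, and the temperature coefficient of $R_1$ is neglected:

$$I = \frac{V_{BE1} - V_{BE2}}{R_1} = \frac{kT}{q}\,\frac{\ln n}{R_1}, \qquad V_{\text{Temp}} = m\,I\,R_2 = m\,\frac{R_2}{R_1}\,\frac{kT}{q}\,\ln n .$$

Under these assumptions $V_{\text{Temp}}$ scales directly with absolute temperature $T$, and its slope is set by the resistor ratio $R_2/R_1$, which is why a low-temperature-coefficient $R_2$ helps keep the readout linear.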

#### III. BIT-ERROR-RATE TESTER

Figure 3 illustrates the BERT structure. It contains a limiting amplifier/equalizer front-end, a purely linear full-rate CDR with reference-less frequency detection, a 1:64 DMUX chain, and an error-check (EC) engine. The adaptive limiting amplifier/equalizer co-design provides a maximum boosting of 15 dB for 40-Gb/s data. The CDR design is modified from that in [2]. To achieve the 40-Gb/s data rate, we adopt a vernier method, in which the two data paths create an arrival time difference of $2(\Delta T_1 - \Delta T_2) = 12.5$ ps at the XOR gate. Such a half-bit edge signal is further mixed with the 40-GHz clock from the VCO. After the low-pass filter, a near-dc signal proportional to the data-clock phase error is obtained and fed into $(V/I)_1$. As compared with [2], this structure utilizes the difference between the delays rather than their absolute value, increasing the CDR operation speed by a factor of 2. Note that owing to the bandwidth limitation, the XOR can only generate partial pulses on data transitions. This is not an issue as long as the mixer presents sufficient conversion gain. Taking the two middle points of the delay chains gives rise to data streams with a quarter-bit delay in between, which can be used again in the FD to extract the frequency error directly from the data without a reference clock. Again, the half-rate input clock can be provided either from outside or from the recovered clock of the CDR. In either case, the clock is divided down to lower speeds by powers of 2, and so is the data. Finally, the EC engine compares the bit sequence with



Fig. 3. BERT architecture.

the correct PRBS and saves the error count in a 28-bit counter. Through the microprocessor and USB bridge, the final BER result is obtained.
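The error-check step can be sketched in software as follows. This is a minimal model under stated assumptions: the checker seeds a local reference from the first received bits (assumed error-free), the PRBS polynomials are the commonly used ones rather than ones confirmed for this chip, and the 28-bit counter is assumed to saturate rather than wrap. The names `reference_prbs` and `count_errors` are hypothetical.

```python
from collections import deque

# Assumed PRBS feedback taps (x^n + x^m + 1); not confirmed for the chip.
TAPS = {7: (7, 6), 15: (15, 14), 23: (23, 18), 31: (31, 28)}
COUNTER_MAX = (1 << 28) - 1  # full scale of a 28-bit error counter


def reference_prbs(order, count):
    """Generate `count` bits satisfying b[k] = b[k-n] XOR b[k-m]."""
    n, m = TAPS[order]
    bits = [1] * n  # any non-zero seed works
    while len(bits) < count:
        bits.append(bits[-n] ^ bits[-m])
    return bits[:count]


def count_errors(rx_bits, order):
    """Count mismatches between rx_bits and a locally regenerated PRBS."""
    n, m = TAPS[order]
    if len(rx_bits) < n:
        raise ValueError("need at least `order` bits to seed the checker")
    history = deque(rx_bits[:n], maxlen=n)  # oldest bit at index 0
    errors = 0
    for bit in rx_bits[n:]:
        expected = history[0] ^ history[n - m]
        if bit != expected and errors < COUNTER_MAX:
            errors += 1  # saturating count (assumption)
        history.append(expected)  # free-run on the reference, not on rx data
    return errors


if __name__ == "__main__":
    rx = reference_prbs(7, 300)
    rx[50] ^= 1
    rx[120] ^= 1
    total = count_errors(rx, 7)
    print(f"errors: {total}, BER estimate: {total / (len(rx) - 7):.3e}")
```

Because the local reference free-runs after seeding, each corrupted bit is counted once; a self-synchronizing checker that predicts from the received data itself would instead register several mismatches per bit error.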



Fig. 4. (a) Frequency detector, (b) 1:4 DMUX.



Fig. 5. (a) Simplified limiting amplifier/equalizer structure, (b) gain/boosting stage, (c) boosting stage.



Fig. 6. Response of (a) gain/boosting stage, (b) overall maximum boosting.

The FD design is illustrated in Fig. 4(a). Originally inspired by [3], we modify the design in [2] to achieve higher-speed operation. Again, the frequency error can be examined by sampling the full-rate clock with $V_A$ and $V_B$, which are 6.25 ps apart from each other. The up/down signal $Q_3$ is therefore generated and sent to $(V/I)_2$. Since the PD deals with the difference between delays, the phase relationship under the locked condition is uncertain. That is, $Q_2$ upon phase locking is no longer guaranteed to be low. It stays either high or low, depending on the phase between data and clock. One way to resolve this issue is to examine whether the frequency lies within the lock-in range, and turn $(V/I)_2$ on or off accordingly. Alternatively, adding fixed delays could neutralize the skew, letting $(V/I)_2$ turn off automatically as in [2]. The 1:4 DMUX is illustrated in Fig. 4(b). To save power, we abandon the popular tree-structure DMUX and instead deserialize the data in one stage. Here, 10-GHz quadrature clocks are applied to the flip-flops in sequence, and four outputs are produced.
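The timing figures quoted for the PD, FD, and DMUX all follow from the 40-Gb/s bit period $T_b$:

$$T_b = \frac{1}{40\ \text{Gb/s}} = 25\ \text{ps}, \qquad 2(\Delta T_1 - \Delta T_2) = \frac{T_b}{2} = 12.5\ \text{ps}, \qquad t_{V_A} - t_{V_B} = \frac{T_b}{4} = 6.25\ \text{ps}.$$

Likewise, a 10-GHz clock has a period of $100\ \text{ps} = 4T_b$, so its four quadrature phases are spaced by $25\ \text{ps} = T_b$. Each flip-flop in the single-stage 1:4 DMUX therefore samples every fourth bit, one bit period after its neighbor.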

To accommodate channel loss or distortion, a limiting amplifier/equalizer combination is placed in front to both amplify and compensate the input data. The proposed structure is depicted in Fig. 5(a), where stages with different boosting techniques are placed alternately to avoid early saturation or void switching. A conventional RC-degenerated filter fails to provide a large tunable range for boosting, even with the help of inductive peaking. It is well known that both series and shunt peaking can be used in a broadband amplifier to increase the bandwidth [4], but the peaking is subject to PVT variations and is hard to control. As a result, we come up with a tunable gain/boosting stage as shown in Fig. 5(b). Here, depending on the ratio of $I_1$ and $I_2$, part of the data goes through the peaking of $L_2$ whereas the rest does not. Since the total amount of $I_1$ and $I_2$ is constant and their tuning is continuous, we arrive at a gain stage with tunable peaking. It can be used to compensate for the high-frequency loss. The filter stage is shown in Fig. 5(c), which employs inductive peaking to sustain the bandwidth along the data path. Figure 6(a) shows the simulated frequency response of the gain/filter stage under two extreme current conditions. The 3-dB bandwidth can be pushed up from 34.5 GHz to 61.4 GHz. It is worth noting that the dc gain does not change because of the constant total amount of $I_1$ and $I_2$. The overall boosting behavior is shown in Fig. 6(b). With the tunable gain stage, we extend the overall range from 11 dB to 15 dB. No dc gain is sacrificed by doing so.
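The constant-dc-gain property of current steering can be illustrated with a toy small-signal model. This is not the circuit of Fig. 5(b): it simply blends a shunt-peaked load response and a plain RC load response with a weight `alpha` = $I_1/(I_1+I_2)$, and all component values (`r`, `c`, `l`) are hypothetical placeholders chosen only to make the trend visible.

```python
import math


def stage_gain(f_hz, alpha, r=50.0, c=60e-15, l=75e-12):
    """Magnitude of a current-steered blend of a peaked and an unpeaked path.

    alpha = I1 / (I1 + I2) in [0, 1] is the fraction of signal current routed
    through the inductively peaked path. Component values are illustrative only.
    """
    s = 2j * math.pi * f_hz
    peaked = (r + s * l) / (1 + s * r * c + s * s * l * c)  # shunt-peaked load
    flat = r / (1 + s * r * c)                              # plain RC load
    return abs(alpha * peaked + (1 - alpha) * flat)


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        dc = stage_gain(0.0, alpha)
        hf = stage_gain(30e9, alpha)
        print(f"alpha={alpha:.1f}  dc={dc:.1f}  30 GHz={hf:.1f}")
```

In this model both paths present the same resistance at dc, so the low-frequency gain is independent of `alpha`, while the high-frequency gain rises continuously as more current is steered into the peaked path.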

#### IV. EXPERIMENTAL RESULTS

The PPG and BERT have been designed and fabricated in 65-nm CMOS technology. Figure 7(a) shows the die photo of the PPG, which occupies $1.4 \times 1.1 \text{ mm}^2$. It consumes 350 mW from a 1.2-V supply (excluding the 4:1 MUX), of which only 8 mW is dissipated in the PRBS engine. Figures 7(b) and (c) show the output data at 100 Mb/s and 10 Gb/s, with 1-dB boosting. The latter demonstrates rms and peak-to-peak jitter of 0.8 ps and 5 ps, respectively, as the worst case. For data rates less than 9 Gb/s, the jitter can be reduced to 0.3 ps,rms and 2.2 ps,pp. Four 10-Gb/s channels can be delivered simultaneously as illustrated in Fig. 7(d). The maximum skew among them is less than 2 ps. All the above figures are captured in $2^{31}-1$ PRBS mode. The FFE provides programmable pre-emphasis from 0 to 8 dB at the Nyquist frequency as expected. In external clock mode, the output data phase is directly modulated if the external clock is modulated. Figure 8(a) reveals a case for one 10-Gb/s output under 1-MHz 0.3-UI modulation. Together with



Fig. 7. (a) Chip micrograph of PPG (4:1 MUX not shown), (b) output data at 100 Mb/s (100 mV/div, 1.66 ns/div), (c) output data at 10 Gb/s (100 mV/div, 20.0 ps/div), (d) simultaneous $4 \times 10$-Gb/s output data (100 mV/div, 40 ps/div).



Fig. 8. (a) 10-Gb/s output data under modulation (1 MHz, 0.3 UI), (b) 40-Gb/s (100 mV/div, 10 ps/div) and 25-Gb/s (100 mV/div, 20 ps/div) outputs, (c) rms and peak-to-peak data jitter at 10 Gb/s.



Fig. 9. (a) Chip micrograph of BERT and DMUXes, (b) phase noise of recovered 40-GHz clock from CDR, (c) sensitivity test.

the BERT under proper clock phase, the system performs jitter tolerance (JTOL) testing and displays the results on the PC. The high-speed output after 4:1 MUXing is also plotted in Fig. 8(b). The 40-Gb/s and 25-Gb/s data reveal rms and peak-to-peak jitter of less than 1.5 ps and 6.8 ps, respectively. The rise/fall time is around $11 \sim 13$ ps. The rms and peak-to-peak jitter at 10 Gb/s are plotted in Fig. 8(c). Owing to the internal bandgap reference and temperature compensation circuits, the PPG/BERT achieves very stable operation from $-10^{\circ}$C to $70^{\circ}$C.

The BERT has been fabricated in 65-nm CMOS as well, and the die occupies $1.1 \times 1 \text{ mm}^2$. Error detectors and low-speed DMUXes are realized in another chip. Both die photos are illustrated in Fig. 9(a). The BERT including CDR and DMUX consumes 580 mW. It is powered by a 1.2-V supply with the exception of the digital circuits, which use a standard 1-V supply. Figure 9(b) shows the measured phase noise of the recovered 40-GHz clock, the free-running VCO, and the input data, revealing a loop bandwidth around 20 MHz. The integrated jitter from 100 Hz to 1 GHz is equal to 214 fs,rms, and the recovered clock presents a phase noise of -108 dBc/Hz at 1-MHz offset. Figure 9(c) depicts the input sensitivity. The circuit achieves $BER < 10^{-12}$ for any data pattern as long as the input is greater than 50 mV. We obtain the recovered waveforms for 10-Gb/s data and 40-GHz clock, as shown in Fig. 10, suggesting 1.73 ps,rms and 277 fs,rms jitter, respectively. After de-embedding the oscilloscope's jitter, the latter is very close to 214 fs. Figure 11 shows the waveforms for chip-on-board assembly with a 5-cm Rogers channel in front, which presents 12-dB loss at 20 GHz. After the limiting amplifier/equalizer, most of the ISI on the 40-Gb/s data due to channel loss is removed [Fig. 11(a)]. The timing jitter is removed by the CDR, and Fig. 11(b) shows the recovered data. The eye is clean with 2.43 ps,rms jitter. Tables I and II summarize the detailed specifications of this work.
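The de-embedding step can be checked with a short calculation. Assuming the oscilloscope's intrinsic jitter is random and uncorrelated with the clock jitter, rms values add in quadrature, so the instrument contribution implied by the two quoted numbers (277 fs measured, 214 fs integrated from phase noise) can be estimated. The oscilloscope's actual jitter is not stated in the text, so the result is an inference rather than a measured value, and `rss_subtract` is a hypothetical helper name.

```python
import math


def rss_subtract(total_rms, part_rms):
    """Remove one uncorrelated rms contribution from a total: sqrt(total^2 - part^2)."""
    if part_rms > total_rms:
        raise ValueError("part cannot exceed the total rms value")
    return math.sqrt(total_rms ** 2 - part_rms ** 2)


if __name__ == "__main__":
    measured_fs = 277.0  # recovered-clock jitter read from the oscilloscope
    clock_fs = 214.0     # jitter integrated from the phase-noise plot
    scope_fs = rss_subtract(measured_fs, clock_fs)
    print(f"implied oscilloscope jitter: {scope_fs:.1f} fs,rms")
```

An instrument contribution of this order would be consistent with the time-domain measurement and the phase-noise integration describing the same recovered clock.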



Fig. 10. Waveforms of recovered 10-Gb/s data (50 mV/div, 20 ps/div) and 40-GHz clock (50 mV/div, 2 ps/div).



Fig. 11. Waveforms for chip-on-board testing: (a) 40-Gb/s data after limiting amplifier/equalizer (40 mV/div, 5 ps/div), (b) recovered 10-Gb/s data (50 mV/div, 20 ps/div).

#### ACKNOWLEDGMENT

This work is supported in part by Atilia Technology.

#### REFERENCES

- [1] P. Chiang et al., "60Gb/s NRZ and PAM4 Transmitters for 400GbE in 65nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, pp. 42-43, Feb. 2014.
- [2] J. Lee and K. Wu, "A 20Gb/s Full-Rate Linear CDR Circuit with Automatic Frequency Acquisition," in *IEEE ISSCC Dig. Tech. Papers*, pp. 366-367, Feb. 2009.
- [3] A. Pottbacker et al., "A Si Bipolar Phase and Frequency Detector for Clock Extraction up to 8Gb/s," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1747-1751, Dec. 1992.
- [4] S. Galal and B. Razavi, "40Gb/s amplifier and ESD protection circuit in 0.18μm CMOS technology," in *IEEE ISSCC Dig. Tech. Papers*, pp. 480-481, Feb. 2004.
- [5] D. Kucharski and K. T. Kornegay, "2.5 V 43–45 Gb/s CDR Circuit and 55 Gb/s PRBS Generator in SiGe Using a Low-Voltage Logic Family," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2154-2165, Sep. 2006.
- [6] N. Nedovic et al., "A 40–44 Gb/s 3x oversampling CMOS CDR/1:16 DEMUX," *IEEE J. Solid-State Circuits*, vol. 42, no. 12, pp. 2726-2735, Dec. 2007.
- [7] J.-K. Kim et al., "A fully integrated 0.13-μm CMOS 40-Gb/s serial link transceiver," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1510-1521, May 2009.

| PPG                            |                                                           | BERT                         |                                                |
|--------------------------------|-----------------------------------------------------------|------------------------------|------------------------------------------------|
| Data Rate                      | 40Mb/s~40Gb/s                                             | Data Rate                    | 40Mb/s~40Gb/s                                  |
| Pattern                        | PRBS7, 15, 23, 31                                         | Pattern                      | PRBS7, 15, 23, 31                              |
| MUX                            | 64:1                                                      | DMUX                         | 1:64                                           |
| Inj. Error Rate                | 0, 10<sup>-3</sup>, 10<sup>-6</sup>, 10<sup>-9</sup>      | BER                          | < 10<sup>-12</sup> (all patterns)              |
| Pre-Emphasis                   | 1-tap (0~8dB)<br>(programmable)                           | Front End                    | Limiting Amplifier/<br>Analog Equalizer        |
| Output Data<br>Jitter          | 0.8 ps,rms<br>5 ps,pp (4x10Gb/s)                          | Rec. Clock<br>Jitter (40GHz) | 214 fs,rms (40GHz)                             |
|                                | 1.5 ps,rms<br>6.8 ps,pp (40Gb/s)                          | Rec. Data<br>Jitter (40Gb/s) | 1.73 ps,rms (10Gb/s)<br>10.67 ps,pp (10Gb/s)   |
| Output Swing<br>(single-ended) | 100~400mV<br>(programmable)                               | Input<br>Sensitivity         | 50mV                                           |
| Rise/Fall<br>Time (40Gb/s)     | 13 ps / 13 ps                                             | Supply<br>Voltage            | 2.5V I/O<br>1.2V Analog<br>1V Digital          |
| Supply<br>Voltage              | 2.5V I/O & Bandgap<br>1.2V Analog<br>1V Digital           | Power                        | 580 mW                                         |
| Power                          | 800 mW                                                    | Chip Area                    | 1.1 mm<sup>2</sup>                             |
| Chip Area                      | 1.54 mm<sup>2</sup>                                       | Technology                   | 65-nm CMOS                                     |
| Technology                     | 65-nm CMOS                                                |                              |                                                |

Table I. Performance summary.

Table II. Comparison table

|                   | [5]                     | [6]                        | [7]                     | CDR                     |
|-------------------|-------------------------|----------------------------|-------------------------|-------------------------|
| Data Rate         | 43~45Gb/s               | 40-44Gb/s                  | 36.2~38.2Gb/s           | 39-40.2Gb/s             |
| Eq. Max. Tuning   | N/A                     | N/A                        | 3dB                     | 15dB                    |
| PD Type           | Half-rate,<br>Binary PD | Quarter-rate,<br>Binary PD | Half-rate,<br>Binary PD | Full-rate,<br>Linear PD |
| Freq. Acqui.      | N/A                     | reference-less             | reference               | reference-less          |
| DMUX              | N/A                     | 1:16                       | 1:32                    | 1:64                    |
| BER               | N/A                     | < 10 <sup>-12</sup>        | < 10 <sup>-14</sup>     | < 10 <sup>-12</sup>     |
| Rec. Clock Jitter | 1.1 ps,rms(22.5GHz)     | N/A                        | 1.77 ps,rms(22.5GHz)    | 214 fs,rms(40GHz)       |
| Supply Voltage    | 2.5V                    | 1.4V                       | 1.45V                   | 1.2V                    |
| Power             | 650 mW                  | 910 mW                     | 2.04 W                  | 427 mW                  |
| Chip Area         | 1.25 mm <sup>2</sup>    | 1.44 mm <sup>2</sup>       | 4.93 mm <sup>2</sup>    | 1.1 mm <sup>2</sup>     |
| Technology        | 120 GHz SiGe            | 90-nm CMOS                 | 0.13-um CMOS            | 65-nm CMOS              |