A Comparison of Universal and Xilinx Switches

CS 294-7 Final Project Report, Spring 1997
William Tsu

Motivation

A recent article [1] has shown that universal switches can accommodate up to 25% more routing instances than Xilinx switch boxes, but they appear to need a more complicated routing structure. The focus of this work is therefore to lay out both kinds of switch and to understand the tradeoffs between them. Some simple analysis is also provided to estimate when the wires will dominate the total area of the switches as the number of inputs scales up.

Introduction

First, the assumed programming methodology is described. Second, the structure of the basic configuration bit is explained. Then we dive into the specific switch boxes: the Xilinx switch first, then the Universal one. This is followed by a section explaining why metal 3 is not used. Finally, the conclusion summarizes the results.

Programming Methodology

All the programming bits are assumed to be shifted in from left to right across the chip. The phi1 and phi2 signals in the diagram below are assumed to be two-phase non-overlapping clocks. Each configuration bit is stored in a pair of cross-coupled inverters and controls an actual pass transistor.
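The shifting scheme can be sketched as a toy behavioral model. This is an illustration under stated assumptions, not the actual circuit: each configuration bit is modeled as a master/slave pair, phi1 loads every master from the slave on its left (stage 0 from the serial input), and phi2 copies each master into its own slave. The class and method names (`ConfigChain`, `shift_in`, `load`) are hypothetical.

```python
from typing import List


class ConfigChain:
    """Toy model of a chain of configuration bits (hypothetical names).

    Assumption: each bit is a master/slave pair. phi1 loads every master
    from the slave on its left (stage 0 from the serial input); phi2 copies
    each master into its own slave, whose value drives the pass transistor.
    """

    def __init__(self, n_bits: int) -> None:
        if n_bits < 1:
            raise ValueError("need at least one configuration bit")
        self.master: List[int] = [0] * n_bits
        self.slave: List[int] = [0] * n_bits

    def phi1(self, serial_in: int) -> None:
        # Masters only read slaves, so all stages can update at once.
        self.master = [serial_in] + self.slave[:-1]

    def phi2(self) -> None:
        # phi1 is low (non-overlapping clocks), so the masters are stable.
        self.slave = list(self.master)

    def shift_in(self, bits: List[int]) -> None:
        # One full phi1/phi2 cycle moves every stored bit one stage right.
        for b in bits:
            self.phi1(b)
            self.phi2()

    def load(self, pattern: List[int]) -> None:
        # Stage 0 ends up with the last bit shifted in, so feed it reversed.
        if len(pattern) != len(self.slave):
            raise ValueError("pattern length must match the chain length")
        self.shift_in(list(reversed(pattern)))

    def pass_gate_on(self, i: int) -> bool:
        return self.slave[i] == 1
```

Because bits enter at the left, the first bit shifted in travels furthest; a pattern intended for stages 0 to N-1 therefore has to be shifted in last bit first, which is all `load` does.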

Basic Configuration Bit

The basic configuration bit layout is shown below.

Phi1 and phi2, the programming clocks, run vertically in poly. The programming bits are shifted in from the left and exit on the right in metal 1. Vdd and Gnd are drawn extra wide for reliability. The minimum-size cross-coupled inverters sit in the middle of the cell, and the pass transistor they control is in the lower right-hand corner. Metal 1 (blue) is therefore reserved for local routing within the cell, and the actual data signals should come in on metal 2.

1. Xilinx Switch Box

Using the basic configuration bit layout as a building block, a diamond switch can be constructed as shown below.

Note that the middle row of configuration bits has been flipped upside down so that the common Vdd and Gnd lines can be merged. The layout on the right shows the same diamond switch with only metal 2 visible. Metal 2 is used here to wire the 6 pass transistors together into the actual diamond switch. The labels "Up", "Down", "Left" and "Right" indicate where each data signal comes from.
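Since the diamond switch has one pass transistor for each pair of its four terminals (C(4,2) = 6), its behavior can be sketched as below. The terminal names follow the layout labels; the bit ordering in `PAIRS` and the helper names are hypothetical choices for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

TERMINALS = ("Up", "Down", "Left", "Right")
# One pass transistor (and one configuration bit) per unordered terminal
# pair: C(4, 2) = 6. The ordering here is an arbitrary illustrative choice.
PAIRS: List[Tuple[str, str]] = list(combinations(TERMINALS, 2))


def closed_pairs(bits: List[int]) -> List[Tuple[str, str]]:
    """Terminal pairs whose pass transistor is switched on."""
    if len(bits) != len(PAIRS):
        raise ValueError("a diamond switch has exactly 6 configuration bits")
    return [pair for pair, bit in zip(PAIRS, bits) if bit]


def nets(bits: List[int]) -> Set[FrozenSet[str]]:
    """Group the four terminals into electrically connected nets."""
    parent: Dict[str, str] = {t: t for t in TERMINALS}

    def find(t: str) -> str:
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in closed_pairs(bits):
        parent[find(a)] = find(b)  # merge the two nets
    groups: Dict[str, Set[str]] = {}
    for t in TERMINALS:
        groups.setdefault(find(t), set()).add(t)
    return {frozenset(g) for g in groups.values()}
```

For example, bits `[1, 0, 0, 0, 0, 1]` close the Up/Down and Left/Right transistors, giving two separate straight-through nets.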

The diamond switch is sized to have an aspect ratio close to 1. Putting 4 of these together forms a 4-input switch box, whose layout is shown below. (For clarity, the diamond switch layout is abstracted.)

Data signals from the left or right come in on metal 1, and data signals from the top or bottom come in on metal 2 (purple). Each signal is switched by its own diamond switch. The extra red bars protruding from the diamond blocks are the poly programming clocks running through the switches.
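The key property of this arrangement is that a W-input box is just W independent diamond switches, one per track index, so a signal on track i can only ever reach track i on the other sides. A small graph sketch makes this explicit and counts the 6W pass transistors; the names are hypothetical, and every switch is treated as closable to show potential reachability.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

SIDES = ("Up", "Down", "Left", "Right")
Terminal = Tuple[str, int]  # (side, track index)
Switch = Tuple[Terminal, Terminal]


def xilinx_box(w: int) -> List[Switch]:
    """All pass transistors of a W-input box: one diamond per track index."""
    return [((a, t), (b, t)) for t in range(w) for a, b in combinations(SIDES, 2)]


def reachable(switches: List[Switch], start: Terminal) -> Set[Terminal]:
    """Terminals reachable from `start` if every switch could be closed."""
    adj: Dict[Terminal, List[Terminal]] = {}
    for a, b in switches:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen = {start}
    stack = [start]
    while stack:  # depth-first search over the switch graph
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

For the 4-input box, `xilinx_box(4)` has 24 switches, and `reachable(..., ("Left", 2))` returns only the four track-2 terminals.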

Since the diamond switch is approximately square, the natural numbers of inputs (the numbers of inputs that give the resulting layout an aspect ratio close to 1) are m^2, where m is a positive integer: 1, 4, 9, 16, 25, and so on. As the switch box is scaled to more inputs, the wiring channels must expand. The area of the wiring channels grows faster than that of the switches themselves, so at some point the wires dominate the area. The following graph shows when this happens.
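A crude scaling argument suggests why the wires eventually win. Assume W = m^2 identical diamond switches of side s arranged m by m, with routing tracks of pitch p. In any row, each cut between adjacent diamonds is crossed by m tracks (left-side signals still heading right plus right-side signals heading left), so each of the m row channels is about m*p wide and m*s long, and the columns behave the same way. The wire area is then about 2*m^3*p*s against a switch area of m^2*s^2, a ratio of 2*(p/s)*sqrt(W). The sketch below is this model only, not the data behind Fig 1; it calibrates the sqrt(W) law to a single crossover point, and the function names are hypothetical.

```python
import math


def wire_to_switch_ratio(w: float, w_equal: float) -> float:
    """Wire area / switch area under a crude sqrt(W) scaling model.

    Model (assumption): W = m*m diamonds of side s, track pitch p.
    Switch area ~ m^2 * s^2; wire area ~ 2 * m * (m * p) * (m * s)
    (m row channels plus m column channels, each about m tracks wide).
    The ratio 2 * (p / s) * sqrt(W) is calibrated to equal 1 at W = w_equal.
    """
    return math.sqrt(w / w_equal)


def wire_fraction(w: float, w_equal: float) -> float:
    """Fraction of the total (switch + wire) area taken by wires."""
    r = wire_to_switch_ratio(w, w_equal)
    return r / (1.0 + r)
```

Calibrated to the crossover of about 100 inputs, `wire_fraction(16, 100)` is about 0.29, the same ballpark as the roughly 25% read from Fig 1; the model ignores fixed overheads such as the supply rails and clock lines, so only rough agreement should be expected.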

Fig 1: Area of diamond switches and routing wires vs Number of inputs for a Xilinx switch box

When the number of inputs approaches 100, the wires occupy as much area as the switches. For a practical size such as 16 inputs, the wires occupy around 25% of the total area.

2. Universal Switch Box

The basic building block of a universal switch box is the 2-input universal switch box. As pointed out in [1], a switch module with an even W (where W is the number of inputs) can be partitioned into W/2 non-interacting submodules, each of which is a 2-input universal switch box with the routing property shown below.
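Structurally, a 2-input universal box uses the same number of pass transistors as two diamond switches: 12 (6 side pairs, 2 switches each). The difference is that its two tracks are not kept apart. The sketch below assumes one particular assignment, in which straight-through connections and two corners keep track t on track t while the other two corners swap tracks. Which corners are twisted is an assumption for illustration, the names are hypothetical, and the code only checks structural facts (switch count, 3 switches per terminal, whether the two tracks interact); it does not verify universality itself.

```python
from itertools import combinations
from typing import Dict, List, Tuple

SIDES = ("Top", "Right", "Bottom", "Left")
Terminal = Tuple[str, int]
Switch = Tuple[Terminal, Terminal]

# ASSUMPTION (illustrative): these two corners swap tracks (t <-> 1 - t);
# straight-through pairs and the other two corners keep t <-> t.
TWISTED_CORNERS = {frozenset(("Top", "Left")), frozenset(("Right", "Bottom"))}


def two_track_box(twisted: bool) -> List[Switch]:
    """12 switches, 2 per side pair; twisted=False is two diamond switches."""
    switches = []
    for a, b in combinations(SIDES, 2):
        swap = twisted and frozenset((a, b)) in TWISTED_CORNERS
        for t in (0, 1):
            switches.append(((a, t), (b, 1 - t if swap else t)))
    return switches


def count_components(switches: List[Switch]) -> int:
    """Number of terminal groups that can never be connected to each other."""
    parent: Dict[Terminal, Terminal] = {(s, t): (s, t) for s in SIDES for t in (0, 1)}

    def find(x: Terminal) -> Terminal:
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in switches:
        parent[find(a)] = find(b)
    return len({find(x) for x in parent})


def degree(switches: List[Switch], term: Terminal) -> int:
    """How many switches touch one terminal."""
    return sum(term in sw for sw in switches)
```

With no twist the 12 switches split into two independent tracks (2 components); with the assumed twist all 8 terminals form a single component, while every terminal still touches exactly 3 switches.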

This basic building block needs twelve configuration bits. In contrast to the Xilinx diamond, there are two ways to tile these configuration memories together:

6 rows of configuration bits, each row 2 bits wide

4 rows of configuration bits, each row 3 bits wide
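Which tiling gives a squarish block depends on the shape of one bit cell. Since the 6-bit diamond switch (3 rows of 2 bits) is roughly square, a bit cell must be about 1.5 times as wide as it is tall. The sketch below takes that ratio as an assumption, ignores wiring channels and merged rails, and compares the candidate tilings of 12 bits; the helper names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict

# ASSUMPTION: a configuration-bit cell is 1.5x as wide as it is tall, inferred
# from the 3-row x 2-bit diamond switch being roughly square. Wiring
# channels and merged supply rails are ignored.
BIT_WIDTH_OVER_HEIGHT = Fraction(3, 2)


def aspect(rows: int, cols: int) -> Fraction:
    """Width / height of a block of rows x cols configuration bits."""
    return cols * BIT_WIDTH_OVER_HEIGHT / rows


def tilings(n_bits: int) -> Dict[int, Fraction]:
    """Aspect ratio of every rectangular tiling, keyed by number of rows."""
    return {r: aspect(r, n_bits // r) for r in range(1, n_bits + 1) if n_bits % r == 0}


def squarest_rows(n_bits: int) -> int:
    """Row count whose tiling is closest to square (on a log scale)."""
    options = tilings(n_bits)
    return min(options, key=lambda r: abs(math.log(float(options[r]))))
```

Under this assumption, 4 rows x 3 bits comes out nearly square (9/8), while 6 rows x 2 bits is about twice as tall as it is wide, which would suit tiling strategy-1 blocks k rows by 2k columns.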

Let's examine each of these strategies individually.

Strategy 1: 6 rows x 2 bits

The wiring channels are at the bottom and at the right of the block, as shown in the following layout.

Again, the layout on the right is an abstracted version of the one on the left. Most of the wiring between configuration memory bits is done in metal 2, with some help from metal 1 on the side. The layout strategy is essentially the same as for the Xilinx switch: vertical data signals come in on metal 2 and horizontal data signals on metal 1.

On close examination, the layout actually needs 4 metal 2 wires on the right-hand side to route just 2 data signals through. The reason is the poor locality of the basic cell: every incoming signal must be available at the diagonally opposite corner, which complicates the routing. Note, however, that as another basic cell is stacked vertically, the vertical routing channel grows by only 2 wires.
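Taking these wire counts at face value (4 metal 2 wires for the first cell, 2 more for each cell stacked on top), the right-hand channel of a column of n cells holds 2n + 2 wires rather than the 4n one might expect from the single-cell layout. The function name below is hypothetical.

```python
def strategy1_channel_wires(n_stacked: int) -> int:
    """Metal 2 wires in the right-hand channel of a column of n stacked
    strategy-1 cells: 4 for the first cell plus 2 per additional cell,
    as described in the text. Hypothetical helper name."""
    if n_stacked < 1:
        raise ValueError("need at least one cell")
    return 4 + 2 * (n_stacked - 1)  # equals 2 * n_stacked + 2
```

For 4 stacked cells this gives 10 wires, versus 16 if every cell needed its own 4.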

Because of the basic block arrangement, the natural numbers of inputs for the Universal switch box are 2, 2*2, 2*8, 2*18, and so on; the factor of 2 comes from the basic block being a 2-input universal switch. In this respect it is more restrictive than the Xilinx switch. The layout of a 4-input Universal switch box is shown below.

At 4 inputs, the Universal switch box layout is just as efficient as the Xilinx layout. As the number of inputs grows, the wiring channels again grow faster than the switches. The graph below shows that beyond about 120 inputs, the wires occupy roughly as much area as the switches.

Fig 2: Area of switches and routing wires vs Number of inputs under strategy 1

For a practical size such as 16 inputs, the wires take up around 23% of the total area.

Strategy 2: 4 rows x 3 bits

The wiring channels are also at the right and bottom of the block, as shown below.

This time, the central region is spread out, so terminals at diagonally opposite corners can be connected through this local wiring channel. This layout appears to have the same growth rate as the Xilinx layout. The graph below shows that the wiring area becomes as significant as the switch area only beyond about 100 inputs.

Fig 3: Area of switches and routing wires vs Number of inputs under strategy 2

Since this basic block has an aspect ratio close to 1, bigger Universal switches can be built by tiling it in a square array, just like the Xilinx switches. The natural numbers of inputs are then 2, 2*2, 2*4, 2*9, 2*16, and so on; in brief, 2*m*m, where m is a positive integer. With 18 inputs, the wires take up around 27% of the total area.

The above list might not be exactly right, because part of the vertical metal 2 wiring channels can be embedded directly above the configuration memories, which changes the aspect ratio somewhat.
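The natural input counts can be generated mechanically. The Xilinx and strategy-2 rules (m^2 and 2*m*m) come straight from the text; the strategy-1 rule, one basic block plus 2*2*k*k, is only the pattern of the listed values 2, 2*2, 2*8, 2*18 and is an assumption beyond those. The strategy-2 function follows the 2*m*m formula, so it does not include the 2*2 case that the text also lists. All function names are hypothetical.

```python
from typing import Callable, List


def up_to(limit: int, size: Callable[[int], int]) -> List[int]:
    """size(1), size(2), ... while the value stays within limit.
    `size` must be increasing, or the loop would stop early."""
    values, k = [], 1
    while size(k) <= limit:
        values.append(size(k))
        k += 1
    return values


def xilinx_natural(limit: int) -> List[int]:
    return up_to(limit, lambda m: m * m)  # from the text: m^2


def universal2_natural(limit: int) -> List[int]:
    return up_to(limit, lambda m: 2 * m * m)  # from the text: 2*m*m


def universal1_natural(limit: int) -> List[int]:
    # ASSUMPTION: one basic block (2 inputs), then 2*2*k*k, the pattern
    # of the listed values 2*2, 2*8, 2*18.
    return ([2] if limit >= 2 else []) + up_to(limit, lambda k: 4 * k * k)
```

Up to 50 inputs, the two universal strategies together give 8 natural sizes (2, 4, 8, 16, 18, 32, 36, 50) against 7 for the Xilinx box (1, 4, 9, 16, 25, 36, 49).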

What about Metal 3?

So far, all the layouts use only 2 metal layers. What if metal 3 is used too? Although I have not actually done the layout, I expect the following. The basic cell sizes (configuration bit, diamond switch) stay the same, but when bigger switches are built, some of the wiring channel area could be saved. Metal 3 is not going to be very efficient at compacting the layout, however: metal 3 is almost twice as wide as metal 1 and metal 2, spacing is needed between via2 (metal 3 to metal 2) and via (metal 2 to metal 1), and the minimum enclosure of via2 is twice as large as that of via.

As processes scale down, interconnect does not scale well, while at the same time chips keep getting bigger. To keep the resistance of global interconnect from becoming excessive, metal 3 is made extra wide and is intended for distributing global signals. Therefore, I believe it is wise to reserve metal 3 for the clock, global set/reset, power/ground distribution, or even later metal fixes, and to restrict the switch layouts to 2 metal layers.

Conclusion

Here is a plot of Universal vs Xilinx switch area. (Universal 1 is the Universal switch implemented under strategy 1; Universal 2, under strategy 2.)

Fig 4: Universal and Xilinx switches area vs Number of inputs

When the number of inputs is large, Universal 1 produces the most compact layout because of its small basic cell, in which all 12 configuration bits are closely packed together. Universal 2 occupies more area because each of its basic cells embeds a local wiring channel.

For practical sizes (no more than 50 inputs), the area differences are not significant (less than 10%), and the total area is dominated by the configuration memories. The wiring area only begins to dominate at around 100 inputs.

Although the Universal switches use a 2-input switch as the basic building block, there are two ways to lay that block out, so the natural numbers of inputs are approximately as flexible as those of the Xilinx switch boxes.

Since Universal switches offer more routing flexibility at a comparable area, they should be the better switch components.

Acknowledgement

Thanks to Andre' for pointing out how to build Universal switches efficiently and for finding a big bug in an early draft. Suggestions from Will, Nick and Christoforus were also very helpful.

Reference

[1] Yao-Wen Chang, D. F. Wong, and C. K. Wong. Universal Switch-Module Design for Symmetric-Array-Based FPGAs.