LAB 4: Design of a Low Power 4x4 Multiplier

using 1.2um CMOS technology.

ECE 658 VLSI Design Principles (Fall 1998)

By: Todd Tolliver (tolliver@ecs.umass.edu)

 

Index

1.- Introduction
2.- Design and Analysis
3.- Schematic
4.- Multiplier Layout
5.- Functional Verification
6.- Multiplier in Pin Grid
7.- Test Plan
8.- Summary


Introduction

      This webpage describes my efforts to design a low power 4x4 multiplier. I began the project by analyzing three implementations of the full adder circuit for worst case power consumption: static CMOS, hybrid static/dynamic, and fully dynamic logic. The static CMOS full adder requires 28 transistors, arranged as complementary NMOS and PMOS halves for each portion of the circuit. This logic style consumes a large area and is generally much slower than other logic styles. The hybrid logic style is the one from Lab 3, where the sum circuit is designed in static CMOS while the carry circuit uses the Domino logic style. This reduces the number of transistors required to implement the carry portion of the full adder, which in turn reduces dissipated power and area. Finally, the dynamic style used was np-CMOS, which greatly reduces the number of transistors needed.

Design and Analysis

      Below is a table summarizing the results from HSPICE simulation of a full adder for each type of logic.

Logic Style       Static Power (mW)   Dynamic Power (mW)
Static CMOS       4.24x10^(-4)        6.38
Static/Dynamic    3.32x10^(-4)        3.56
np-CMOS           1.56x10^(-4)        1.30

      As can be seen from this table, the np-CMOS implementation of the full adder has the lowest power dissipation in both categories. The biggest win is in dynamic power, which falls from 6.38 mW for static CMOS to 1.30 mW, roughly a fivefold reduction. An added benefit of the np-CMOS logic style is its increased speed over static CMOS. Therefore, np-CMOS logic will be used to implement the full and half adders of the multiplier circuit.
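The comparison can be made concrete with a little arithmetic. The sketch below is an illustration only: the dictionary and function names are hypothetical, and the numbers are simply copied from the HSPICE table above.

```python
# Full adder power figures in mW, copied from the HSPICE table above.
# POWER_MW and reduction_vs_static_cmos are illustrative names only.
POWER_MW = {
    "Static CMOS": {"static": 4.24e-4, "dynamic": 6.38},
    "Static/Dynamic": {"static": 3.32e-4, "dynamic": 3.56},
    "np-CMOS": {"static": 1.56e-4, "dynamic": 1.30},
}


def reduction_vs_static_cmos(style, kind):
    """Return how many times lower `style` is than static CMOS for `kind` power."""
    return POWER_MW["Static CMOS"][kind] / POWER_MW[style][kind]


if __name__ == "__main__":
    for style in POWER_MW:
        dyn = reduction_vs_static_cmos(style, "dynamic")
        sta = reduction_vs_static_cmos(style, "static")
        print(f"{style:15s} dynamic: {dyn:.2f}x lower, static: {sta:.2f}x lower")
```

For np-CMOS this gives a factor of about 4.9 for dynamic power and about 2.7 for static power relative to static CMOS.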

Schematic

      The schematic for the np-CMOS logic style for the full adder is shown here, after page 393 of Rabaey. The schematic shows two different circuits for the full adder. The even slice consists of an n-type carry circuit and a p-type sum circuit. The odd slice is the reverse, with a p-type carry and an n-type sum. This configuration allows one to chain the adders together in such a way that an n-type circuit is always followed by a p-type circuit.

      Next we take a look at block diagrams representing each cell. Click here for the even slice and here for the odd slice. Note that each slice includes a static AND gate. Also, the X, Y, PHI, and PHIbar signals are routed through the cell. This type of layout reduces the amount of routing required for the multiplier. Here is the layout for the even slice and the odd slice. The layout of the static AND gate can be viewed here.

Multiplier Layout

      With the layout of the adders complete, we can look at the overall layout of the circuit. Here is a block diagram of the multiplier, with no routing. Note the alternating even (N) and odd (P) adder types. The layout of the multiplier can be viewed here. Because power and ground must be routed around the circuit, the layout was done with the following power routing in mind. Take a look at the last row of adders in the multiplier. A number of static inverters are used there to delay the clock signal in that portion of the circuit, a suggestion by the TA (Sriram, Fall 2001). The reason is that the last row of adders was switching to its final values before it received the correct information from the adders above. It was therefore necessary to delay the clock for the last row, allowing enough time for the values from the adders in the rest of the circuit to ripple down to it.
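The arithmetic carried out by the adder array can be modelled behaviorally. The sketch below is an illustration only: it assumes a simple row-by-row ripple arrangement, which may differ from the actual adder placement in the layout, and it models logic values only, not the np-CMOS precharge/evaluate clocking or the delayed clock on the last row. The function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def multiply_4x4(x, y):
    """Behavioral 4x4 unsigned array multiplier for x, y in 0..15."""
    xb = [(x >> i) & 1 for i in range(4)]  # X0..X3, LSB first
    yb = [(y >> i) & 1 for i in range(4)]  # Y0..Y3, LSB first

    # Row 0: partial product X AND Y0 (the static AND gates).
    acc = [xb[i] & yb[0] for i in range(4)]
    top = 0          # carry out of the previous row
    z = [acc[0]]     # Z0 is available immediately

    # Rows 1..3: add the next partial product to the shifted running sum.
    for j in range(1, 4):
        a = acc[1:] + [top]                     # running sum shifted right by one
        b = [xb[i] & yb[j] for i in range(4)]   # partial product X AND Yj
        carry = 0    # a full adder with cin = 0 behaves as a half adder
        row = []
        for i in range(4):
            s, carry = full_adder(a[i], b[i], carry)
            row.append(s)
        acc, top = row, carry
        z.append(acc[0])                        # Zj leaves the array at row j

    z.extend(acc[1:])  # Z4..Z6 come from the last row
    z.append(top)      # Z7 is the final carry
    return sum(bit << k for k, bit in enumerate(z))


if __name__ == "__main__":
    print(format(multiply_4x4(0b1001, 0b1001), "08b"))
```

Each row depends on the sums and carries of the row above it, which is why the last row must not evaluate before the upper rows have settled.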

Functional Verification

IRSIM

      The following test cases were used to verify the functionality of the multiplier:

Case   X (Input)   Y (Input)   Z (Output)
1      1001        1001        0101 0001
2      1001        1000        0100 1000
3      1000        0001        0000 1000
4      0001        0000        0000 0000
5      0000        0100        0000 0000
6      0100        1000        0010 0000
7      0100        0010        0000 1000
8      1000        1001        0100 1000
9      0010        1000        0001 0000


As can be seen from this table, the correct output was achieved for all of these cases. Here is an irsim plot of cases 1, 3 and 7 (the plot shows them in the following order: 1, 7, 3).
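The expected outputs in the table can be cross-checked against ordinary integer multiplication. The sketch below is a minimal check, not part of the original IRSIM setup; the names CASES and check_case are hypothetical, and the bit strings are copied from the table above.

```python
# (X, Y, Z) bit strings copied from the verification table above.
CASES = [
    ("1001", "1001", "0101 0001"),
    ("1001", "1000", "0100 1000"),
    ("1000", "0001", "0000 1000"),
    ("0001", "0000", "0000 0000"),
    ("0000", "0100", "0000 0000"),
    ("0100", "1000", "0010 0000"),
    ("0100", "0010", "0000 1000"),
    ("1000", "1001", "0100 1000"),
    ("0010", "1000", "0001 0000"),
]


def check_case(x_bits, y_bits, z_bits):
    """Return True if Z equals X times Y when all three are read as unsigned binary."""
    x = int(x_bits, 2)
    y = int(y_bits, 2)
    z = int(z_bits.replace(" ", ""), 2)
    return x * y == z


if __name__ == "__main__":
    for n, (x, y, z) in enumerate(CASES, start=1):
        status = "ok" if check_case(x, y, z) else "MISMATCH"
        print(f"case {n}: {x} x {y} = {z} -> {status}")
```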

POWER

      The table below shows typical values for the static and dynamic power dissipation of the multiplier.

Static (mW) Dynamic (mW)
2.83 4.39


Notice the large static power dissipation. This is most likely due to the large number of static AND gates and buffers in the circuit. Replacing these with dynamic gates would likely bring this value down.
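To put the static figure in perspective, its share of the total can be estimated. The sketch below assumes that the total dissipation is simply the sum of the two tabulated values, which the lab data does not state explicitly; the names are hypothetical.

```python
# Multiplier power figures in mW, copied from the table above.
STATIC_MW = 2.83
DYNAMIC_MW = 4.39


def static_share(static_mw, dynamic_mw):
    """Fraction of total power that is static, assuming total = static + dynamic."""
    return static_mw / (static_mw + dynamic_mw)


if __name__ == "__main__":
    total = STATIC_MW + DYNAMIC_MW
    share = static_share(STATIC_MW, DYNAMIC_MW)
    print(f"total = {total:.2f} mW, static share = {share:.1%}")
```

Under that assumption, static dissipation accounts for a little under 40% of the total.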

Multiplier in Pin Grid

      Once the basic layout and functionality test was completed, the multiplier was placed in the pin grid provided by the TA. Here is a diagram of the pin assignments. Below is a table summarizing the meaning of each pin.

Pin                        Meaning
GND                        ground
VDD                        +5V
CLOCK                      Clock for n-type dynamic circuits.
CLOCKbar                   Clock for p-type dynamic circuits.
X3,X2,X1,X0                Multiplicand
Y3,Y2,Y1,Y0                Multiplier
Z7,Z6,Z5,Z4,Z3,Z2,Z1,Z0    Result
X                          Unused

      Here is the layout of the multiplier inside the pin grid.

To verify the functionality of the multiplier in the pin grid, irsim simulations were performed. The results were identical to those of the standalone multiplier. Results from irsim for cases 1, 3 and 7 can be seen here.

Test Plan

The following tests can be used to determine the functionality of the chip if it were to be manufactured:

Given X and Y input vectors, look for correct output vector Z.

Test   X (Input Vector)   Y (Input Vector)   Z (Expected Output Vector)
1      1001               1001               0101 0001
2      1001               1000               0100 1000
3      1000               0001               0000 1000


Although only three test cases are shown, any of the 256 combinations of 4-bit X and Y input vectors can be used to test the chip.
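A complete list of expected vectors is easy to generate. The sketch below is a hypothetical helper, not part of the original test plan; it prints X, Y and the expected Z in the same bit-string format as the table above.

```python
def expected_vectors():
    """Yield (X, Y, Z) bit strings for every pair of 4-bit inputs.

    Z is written as two 4-bit groups to match the test plan table.
    """
    for x in range(16):
        for y in range(16):
            z = format(x * y, "08b")
            yield format(x, "04b"), format(y, "04b"), f"{z[:4]} {z[4:]}"


if __name__ == "__main__":
    for x_bits, y_bits, z_bits in expected_vectors():
        print(x_bits, y_bits, z_bits)
```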

Summary

To summarize, a low power 4x4 multiplier was designed. Three circuit types were compared to determine which logic style would yield the lowest static and dynamic power. np-CMOS was used to design the multiplier and proved to be a challenge, for two reasons. First, this was the largest circuit design that we had to perform, and it brought with it unique issues such as routing. Second, delays in the upper portions of the circuit caused incorrect values to appear at the outputs of the multiplier. As suggested by the TA, buffers were used to delay the clock signals at the lower portion of the circuit. This delay was enough to allow time for the correct values to appear and be used by the rest of the circuit. It is my opinion that a more robust design would be achieved by using latches in between the different stages. This pipelined architecture is employed in NORA-CMOS and would be an interesting next step in future designs of a low power multiplier. Finally, placing the completed circuit into the pin grid for manufacturing was an interesting exercise in seeing what a full chip layout would look like.
