VLSI DESIGN PRINCIPLES (658)

 

Lab Assignment 3

 

 DANAI CHASAKI 

 

ID: 22169216

 

 

Objective: Design a CMOS circuit and layout for a 4-bit accumulator with four instances of the bitslice accumulator from Lab 2.

 

This lab involves the design of a CMOS circuit and layout for a 4-bit accumulator. The accumulator consists of four instances of the bitslice accumulator from Lab 2. A 4-bit accumulator consists of a 4-bit full adder and a resettable 4-bit register. Its 7 inputs are clk, {A3, A2, A1, A0}, c_in, and reset. Its 5 outputs are {Q3, Q2, Q1, Q0} and c_out. The adder computes the sum of {A3, A2, A1, A0}, {Q3, Q2, Q1, Q0}, and c_in, and generates a sum {S3, S2, S1, S0} and a carry c_out. The register samples {S3, S2, S1, S0} on the rising edge of clk and stores the result on {Q3, Q2, Q1, Q0}.
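The behaviour described above can be summarised in a short behavioural model. This is only a sketch for checking expected values by hand: the class and method names are illustrative and do not come from the lab files, and it assumes that reset clears the register at the clock edge, which the description does not state explicitly.

```python
# Behavioural sketch of the 4-bit accumulator (illustrative names only).

MASK = 0xF  # four data bits


class Accumulator4:
    def __init__(self):
        self.q = 0  # register contents {Q3, Q2, Q1, Q0}

    def adder(self, a, c_in):
        """Combinational part: return (S, c_out) for the current Q."""
        total = (a & MASK) + self.q + (c_in & 1)
        return total & MASK, total >> 4

    def rising_edge(self, a, c_in, reset):
        """Model one rising edge of clk.

        Returns the new Q and the c_out of the addition that was sampled.
        Assumption: reset forces the register to 0 at the edge.
        """
        s, c_out = self.adder(a, c_in)
        self.q = 0 if reset else s
        return self.q, c_out


acc = Accumulator4()
acc.rising_edge(0b0000, 0, 1)          # reset
print(acc.rising_edge(0b0101, 0, 0))   # (5, 0)
print(acc.rising_edge(0b1100, 0, 0))   # (1, 1): 5 + 12 = 17 = 1_0001
```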

 

 

POST: Schematic of the 4 bit accumulator

 

 

 

 

 

We obtain the schematic of the 4-bit accumulator by instantiating 4 individual symbols of the 1 bit accumulator and interconnecting them as shown above.

Below we give a detailed view, clearly showing the interconnections and the capacitances placed at the Co and the Q nodes.

 

 

 

 

The accumulator instantiated above is a slightly different version of the one in Lab 2.

 

The nodes Q0 through Q3 here are the buffered values (Qout) of the previous schematic. The pin Co_bar, which was used for debugging in Lab 2, has been removed because it is no longer required. Node Q from Lab 2 has been made an internal wire and is no longer a separate pin.

 

 

POST: Validation test sequence (which includes both inputs and response) and an image of the simulator output (waveforms).

 

 

The circuit must be validated for all possible input combinations, using minimal input vectors. This is done by exploiting the inherent parallelism in the circuit as explained below.

 

First we enter the input vectors and the nodes to be analyzed and reset the entire circuit.

This sets Q to 0000, and S (Sum) to 0000.

 

In the next clock cycle we set A = 1111 and Cin = 1, while Q is 0000 (the previous value of S).

The addition then ripples a carry through all four bit slices, which tests the CARRY logic in each of them.

 

Testing the Sum logic is simpler: we set Cin = 0 and test 0+0, 0+1 and 1+1 using the corresponding sequences in the command file. The 1+0 case is symmetric to the 0+1 case.
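The expected responses for the carry and sum tests can be worked out with a small bit-level model of a ripple-carry adder. The function names below are illustrative only; the carry list is ordered Co0, Co1, Co2, Cout to match the node names used in the command file.

```python
def full_adder(a, q, c):
    """One bit slice: return (sum, carry_out)."""
    s = a ^ q ^ c
    c_out = (a & q) | (a & c) | (q & c)
    return s, c_out


def ripple_add(a_bits, q_bits, c_in):
    """Ripple-carry addition; bit lists are ordered from bit 0 to bit 3."""
    sums, carries = [], []
    c = c_in
    for a, q in zip(a_bits, q_bits):
        s, c = full_adder(a, q, c)
        sums.append(s)
        carries.append(c)
    return sums, carries


# Carry test: A = 1111, Q = 0000, Cin = 1.
sums, carries = ripple_add([1, 1, 1, 1], [0, 0, 0, 0], 1)
print(sums)     # [0, 0, 0, 0]
print(carries)  # [1, 1, 1, 1]  (Co0, Co1, Co2, Cout)

# Sum tests on a single slice with Cin = 0.
print(full_adder(0, 0, 0))  # (0, 0)
print(full_adder(0, 1, 0))  # (1, 0)
print(full_adder(1, 1, 0))  # (0, 1)
```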

The IRSIM command file is shown below.

 

SIM.CMD INPUT FILE

sim.cmd

stepsize 50
vector Q Q3 Q2 Q1 Q0
vector A A3 A2 A1 A0
vector S S3 S2 S1 S0
vector Carry Cout Co2 Co1 Co0

analyzer Reset phi Cin A Q S Carry
vector in phi Cin Reset

| reset the circuit
set in 101
set A 0000
s
set in 001
set A 0000
s

| testing the carry logic via ripple effect
set in 110
set A 1111
s
set in 010
set A 1111
s

| testing the sum logic

| testing if 0+0 = 0
set in 100
set A 0000
s
set in 000
set A 0000
s

| testing if 0+1 = 1
set in 100
set A 1000
s
set in 000
set A 1000
s

| testing if 1+1 = 0
set in 100
set A 1000
s
set in 000
set A 1000
s

 

 

IRSIM output

 

 

The schematic was extracted into a netlist using the schm2sim.pl perl script and IRSIM was run.

 

 

 

 

The IRSIM output file shows that the Carry and SUM logic work correctly, as explained in the section above. Also, the above simulation shows that the reset logic is working correctly and we can also observe that the value of Sum (S) is latched at the positive clock (phi) edge and passed onto Q.

Thus the circuit functionality is validated.

 

 

POST: An image of the layout, with the total height and width annotated. Description of changes made to the bitslice layout of Lab 2.

 

 

Layout of the 4-bit accumulator:

 

The layout of the accumulator is shown below. Its aspect ratio is approximately 2:1. The 1-bit accumulator had an aspect ratio of about 3:1; placing four of those slices side by side and adding the wiring that interconnects them increased the width. The gaps visible between the blocks show that the width could be reduced by a few more lambda by packing the individual blocks closer together. However, this would greatly reduce design clarity and make debugging harder. The PHI and RESET signals run in horizontal strips of M1 across the circuit (in level 3, on either side of the gnd signal).

 

Changes made to the layout of lab2:

The accumulator instantiated above is a slightly modified version of the one in Lab 2. Nodes Q0 through Q3 here are the buffered values (Qout) of the previous schematic. The Co_bar pin, which was used for debugging in Lab 2, has been removed because it is no longer required. Node Q from Lab 2 has been made an internal wire and is no longer a separate pin, to avoid a conflict with the pin names required by this problem. The rest of the layout was not modified and is essentially the same. Four instances were placed one after the other and interconnected using wires. The node labels were also changed to meet the new requirements.

 

 

 

 

 

Algorithmic verification of layout

 

 

POST: Hand calculations of the expected final sum, Simulator waveforms corresponding to the above algorithm, the static and dynamic power dissipation of the accumulator as reported by the simulator (no hand calculations).

 

 

The last 4 digits of my student ID are 9216.

The algorithm executed is as follows:

 

 

 

Hand Calculations:

 

 

Clk(t)  Reset  A(t)  Q(t-1)  S(t)  Carry out(t)
0       1      0000  0000    0000  0
1       0      1001  0000    1001  0
2       0      0010  1001    1011  0
3       0      0001  1011    1100  0
4       0      0110  1100    0010  1

 

FINAL SUM: 0010.
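The hand calculation can be cross-checked with a few lines of Python. This is a sketch, not part of the lab flow; the function name is illustrative, and it assumes Cin = 0 throughout, as in the command file for this section.

```python
def accumulate_digits(digits, width=4):
    """Return one row (t, A, Q(t-1), S, carry) per clock cycle after reset."""
    mask = (1 << width) - 1
    q = 0  # register value after reset
    rows = []
    for t, a in enumerate(digits, start=1):
        total = a + q
        s, carry = total & mask, total >> width
        rows.append((t, f"{a:0{width}b}", f"{q:0{width}b}", f"{s:0{width}b}", carry))
        q = s  # the register stores S on the rising clock edge
    return rows


# Last four digits of the student ID: 9, 2, 1, 6.
for row in accumulate_digits([9, 2, 1, 6]):
    print(*row)
# 1 1001 0000 1001 0
# 2 0010 1001 1011 0
# 3 0001 1011 1100 0
# 4 0110 1100 0010 1
```

In decimal, 9 + 2 + 1 + 6 = 18 = 16 + 2, which gives the 4-bit sum 0010 with a carry out of 1.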

 

 

IRSIM simulation

 

 

Input command file:

stepsize 50
vector Q Q3 Q2 Q1 Q0
vector A A3 A2 A1 A0
vector S S3 S2 S1 S0

analyzer phi Reset Cin Cout A Q S
vector in phi Cin Reset

set in 101
set A 0000
s
set in 001
set A 0000
s

set in 100
set A 1001
s

set in 000
set A 1001
s

set in 100
set A 0010
s

set in 000
set A 0010
s

set in 100
set A 0001
s

set in 000
set A 0001
s

set in 100
set A 0110
s

set in 000
set A 0110
s
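The stimulus part of this command file follows a fixed pattern: each clock cycle is one step with phi high followed by one step with phi low. As a sketch, the lines can be generated from the ID digits as follows. The helper names are hypothetical; the bit order of the `in` vector (phi, Cin, Reset) is taken from the command file above.

```python
def clock_cycle(a, cin=0, reset=0):
    """Command lines for one clock cycle: phi high, then phi low."""
    lines = []
    for phi in (1, 0):
        lines.append(f"set in {phi}{cin}{reset}")  # vector in = phi Cin Reset
        lines.append(f"set A {a:04b}")
        lines.append("s")
    return lines


def command_body(digits):
    """Reset cycle followed by one accumulate cycle per digit."""
    lines = clock_cycle(0, reset=1)
    for d in digits:
        lines += clock_cycle(d)
    return lines


print("\n".join(command_body([9, 2, 1, 6])))
```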

 

IRSIM output

 

 

 

 

The average power dissipation obtained from the HSPICE simulation (log file) is 423 µW, and the maximum power dissipation is 89 mW.

 

The dynamic power dissipation is plotted with respect to phi as shown below.

 

 

 

 

POST: A description of the critical path, the test sequence which exercises it, and an image of the simulator output with the clock frequency annotated.

 

 

Critical path analysis:

 

 

The critical path is the longest (slowest) path from input to output. From the topology of the 4-bit ripple-carry adder under consideration, we estimate that the critical path consists of carry propagation (the ripple effect) through the first three blocks, followed by sum and carry-out generation in the fourth block.

           

To test this, we can use a shorter version of the test sequence given in the problem statement. The idea is to generate a Cout of 1, then vary the clock frequency and check the Cout signal for deformities. By deformities we mean that the Cout signal is clipped and does not reach the expected 2.5 V.

 

The test sequence is described below.

 

Normal operation:

 

 

 

 

 

At 500 MHz the output is distorted: Cout does not register the required 1, leading to an erroneous result.

 

 

 

 

 

 

Based on simulations, the maximum frequency at which the circuit operates correctly is approximately 450 MHz.
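For reference, the two clock frequencies mentioned here convert to clock periods as follows. The helper name is illustrative; the only inputs are the 450 MHz and 500 MHz figures from the simulations.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


print(round(period_ns(450), 2))  # 2.22
print(round(period_ns(500), 2))  # 2.0
```

This suggests that the circuit needs a clock period somewhere between about 2.0 ns (which fails) and about 2.22 ns (which works).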

 

 

 

 

 

 

POST: A table of the setup times and an image of the simulator output with the setup times annotated.

 

 

Setup time estimation:

 

Since there is a flip-flop in each bit slice that samples Sum on the positive edge of the clock (phi), there would be set-up time requirements on the individual inputs A0, A1, A2, A3 and Cin that must be met in order for the system to function correctly.

 

Definition: Setup time is defined as the time that the data must arrive before the clock edge (in this case the rising edge), in order for it to be sampled correctly.

 

We evaluate the setup times by trial and error as shown below.

 

Each bit slice is identical, so the setup times of A0, A1, A2 and A3 should be the same. The small variation in their individual setup times caused by clock propagation delay (skew) from one end of the layout to the other can be ignored, since we only want a first-order estimate of the delays from simulation. Hence we determine the setup time of one of the data inputs, say A0, and that of Cin.

 

To determine the setup time, the algorithm is as shown below.

 

We need to sample a Q = 1 through the flip-flop, which means we should have a Sum = 1

 

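For a single bit slice, Sum = A XOR Q XOR Cin, so setting exactly one of the three inputs to 1 and the other two to 0 gives Sum = 1.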

 

Setup time of A is obtained by setting A=1, Q=Cin= 0

Setup time of Cin is obtained by setting Cin=1, A=Q=0

 

Setup time of A0 = 400ps

 

 

 

 

In the above figure, Sum S0 is sampled by the flip-flop and passed on to Q0

 

 

Setup time violation on A results in data not being sampled (below)

 

 

 

 

Setup time analysis of Cin

 

Setup time was found to be 300ps

 

 

 

 

Setup time violation (on Cin) example.

 

 

 

 

 

Node  Setup Time
A0    400 ps
A1    400 ps
A2    400 ps
A3    400 ps
Cin   300 ps

 

 

POST: A table of the propagation delays, a description of how you chose your test sequences and an image of the simulator output with the propagation delays annotated.

 

 

Propagation delay estimation:

 

Propagation delay is defined as the time it takes for a signal to propagate from one end to the other end of the circuit. It may also be defined as the time it takes for the outputs to change with respect to the input. The worst case propagation delay is obtained using the critical path of the circuit.

 

The test sequence is explained below

 

 

1) Reset the circuit (sets all Q's to 0).
2) Set Cin = 0 and all A's to logic 1. This results in all the Sums going high and sets all the Q's to 1 on the next clock edge.
3) Measure the delay from Phi to Q.
4) Reset the circuit again. This results in all the Q's going back to logic 0.
5) Measure the delay from Reset to Q.

 

The propagation delays are expected to be in the following order: Cin > A0 > A1 > A2 > A3.

The actual simulation results are given below.

 

 

 

 

 

Based on the above estimation, we find that the propagation delays of all the Q outputs are the same. This is expected, since the bit-slice layouts are identical.

 

 

 

Signal  Tp from Phi  Tp from Reset
Q0      1.57 ns      420 ps
Q1      1.57 ns      420 ps
Q2      1.57 ns      420 ps
Q3      1.57 ns      420 ps

 

 

 

Propagation delay to C_out estimation

 

 

The propagation delays to Cout from Reset and Phi are estimated using the same approach as above. To obtain the delays from A0, A1, A2, A3 and Cin, we use the following method:

 

 

1) Reset the system. This results in Q3Q2Q1Q0 = 0000.
2) Set A3A2A1A0 = 1111. This results in Sum = 1111, and Q = 1111 at the next clock edge.
3) Depending on which propagation delay is required, set the corresponding input to 1 and the remaining inputs to 0.

 

For example, if the delay from Cin is required, set Cin = 1 and A = 0000, and clock the circuit. This results in Cout = 1, and the propagation delay from Cin can then be measured. Similarly, if the delay from A3 is required, set Cin = 0 and A = 1000, and clock the circuit. This again results in Cout = 1, and the propagation delay from A3 can be measured.
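The reasoning behind these test vectors can be checked with a small model of the carry chain. With Q = 1111, any single input set to 1 produces Cout = 1, and the number of bit slices whose carry output switches to 1 indicates how far the carry has to ripple. The function names are illustrative, and the model is purely logical: it says nothing about the actual delays.

```python
def carry_chain(a, q, cin, width=4):
    """Return the carries [Co0, Co1, Co2, Cout] of a ripple-carry adder."""
    carries = []
    c = cin
    for i in range(width):
        ai, qi = (a >> i) & 1, (q >> i) & 1
        c = (ai & qi) | (ai & c) | (qi & c)
        carries.append(c)
    return carries


def switched_carries(a, cin, q=0b1111):
    """Number of slices whose carry output becomes 1 for this stimulus."""
    return sum(carry_chain(a, q, cin))


# One test vector per input, as described in the text (Q = 1111 beforehand).
tests = {
    "Cin": (0b0000, 1),
    "A0": (0b0001, 0),
    "A1": (0b0010, 0),
    "A2": (0b0100, 0),
    "A3": (0b1000, 0),
}
for name, (a, cin) in tests.items():
    cout = carry_chain(a, 0b1111, cin)[-1]
    print(name, cout, switched_carries(a, cin))
# Cin 1 4
# A0 1 4
# A1 1 3
# A2 1 2
# A3 1 1
```

In this model Cin and A0 both drive a carry through all four slices, while A3 affects only the last slice, so the delay to Cout should shrink from A0 towards A3.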

 

 

Prop delay from A0 to Cout= 1.1ns = 1100ps

 

Prop delay from Reset to Cout= 0.06ns = 60ps

 

 

 

 

 

 

  

Prop delay from A1 to Cout= 900ps

 

 

 

  

 

 

Prop delay from A2 to Cout = 700ps

 

 

 

 

  

 

Propagation delay from A3 to Cout = 400ps

 

 

 

 

 

 

Prop delay from Phi to Cout = 400ps

 

 

 

 

  

 

Propagation delay from Cin to Cout = 1100 ps