ECE 558 / 658 VLSI Design
Lab 2: Design of a 1-Bit Accumulator
Due Thursday, October 31, Midnight before Lecture 15
By DuSung,Kim
Objective: Design a CMOS circuit and layout for a bitslice accumulator.
An accumulator consists of a full adder and a resettable flip-flop. Its inputs are phi, A, c_in, and reset. Its outputs are Q and c_out. The adder computes the sum of A, Q, and c_in, and generates a sum S and a carry c_out. The flip-flop samples S on the rising edge of phi and stores the result on Q.
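The behavior described above can be sketched as a small behavioral model. This is only an illustration, not part of the lab deliverables: the function names are hypothetical, and the model assumes that reset simply forces the stored bit to 0.

```python
def full_adder(a, b, c_in):
    """Return (s, c_out) for one-bit inputs a, b, c_in."""
    total = a + b + c_in
    return total & 1, total >> 1


def accumulator_step(q, a, c_in, reset=0):
    """Model one rising edge of phi and return (q_next, c_out).

    c_out is the combinational carry computed from the values present
    before the edge. Assumption: reset forces the stored bit to 0.
    """
    s, c_out = full_adder(a, q, c_in)
    q_next = 0 if reset else s
    return q_next, c_out


def run(a_bits, c_in_bits, q0=0):
    """Apply a sequence of (A, c_in) values and collect (Q, c_out) per cycle."""
    q, trace = q0, []
    for a, c_in in zip(a_bits, c_in_bits):
        q, c_out = accumulator_step(q, a, c_in)
        trace.append((q, c_out))
    return trace
```

For example, `run([1, 1, 0], [0, 0, 0])` accumulates 1 twice: Q goes to 1, then wraps to 0 while c_out goes to 1.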
This lab involves many more transistors and thus more complex logic and circuit simulation. It is also a sequential circuit, so you need to deal with a clock. You can't run your circuit faster than your critical timing path or the accumulator will malfunction.
Be sure to include each required item (indicated by POST:) in your report. You must also explain what you did and why; images alone are not sufficient. Analyze your results, draw conclusions, and describe what you learned.
1. Truth Table

(1) Adder
SUM(D) | Cout | Cin | A | B(Q)
0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 0 | 1
1 | 0 | 0 | 1 | 0
0 | 1 | 0 | 1 | 1
1 | 0 | 1 | 0 | 0
0 | 1 | 1 | 0 | 1
0 | 1 | 1 | 1 | 0
1 | 1 | 1 | 1 | 1
(a) Equation for Cout logic (B is the stored bit Q)
Cout = A Cin + B Cin + A B --- i)
(b) Equation for Sum logic
Sum = A B' Cin' + A' B' Cin + A B Cin + A' B Cin' --- ii)
From i) and ii):
Sum = A B Cin + Cout' (A + B + Cin)
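The factored form of Sum can be checked exhaustively against the sum-of-minterms form. The following is a minimal sketch with hypothetical function names; it only confirms the Boolean identity and says nothing about the transistor-level circuit.

```python
from itertools import product


def cout_logic(a, b, cin):
    """Equation i): carry out as the majority of A, B, Cin."""
    return (a & cin) | (b & cin) | (a & b)


def sum_minterms(a, b, cin):
    """Equation ii): sum written as four minterms (n* are complements)."""
    na, nb, ncin = 1 - a, 1 - b, 1 - cin
    return (a & nb & ncin) | (na & nb & cin) | (a & b & cin) | (na & b & ncin)


def sum_factored(a, b, cin):
    """Sum rewritten with Cout': A B Cin + Cout' (A + B + Cin)."""
    ncout = 1 - cout_logic(a, b, cin)
    return (a & b & cin) | (ncout & (a | b | cin))


def forms_agree():
    """True if both Sum forms match for all 8 input combinations."""
    return all(
        sum_minterms(a, b, cin) == sum_factored(a, b, cin)
        for a, b, cin in product((0, 1), repeat=3)
    )
```

Calling `forms_agree()` compares the two expressions on every row of the truth table.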
(2) FlipFlop
(a) Table of D-FF with Reset
Rst | clk | Input | Current State | Next State
0 | 0->1 | 0 | X | 0
0 | 0->1 | 1 | X | 1
0 | 1->0 | X | 1 | 1
0 | 1->0 | X | 0 | 0
1 | X | X | X | 0
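The table above can be written as a next-state function. This is a sketch with a hypothetical name; the table lists only the 0->1 and 1->0 clock transitions, so holding the state while the clock is stable is an assumption.

```python
def dff_next_state(rst, clk_prev, clk_now, d, q):
    """Next state of a rising-edge D-FF with reset, following the table."""
    if rst:
        # Last row: reset forces the state to 0 regardless of clk and input.
        return 0
    if clk_prev == 0 and clk_now == 1:
        # Rows 1-2: a rising edge (0->1) samples the input.
        return d
    # Rows 3-4: a falling edge keeps the current state.
    # Assumption: a stable clock level also keeps the current state.
    return q
```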
(b) State Diagram
Because the full adder is combinational logic, it could be included in this diagram. To reduce complexity, however, I drew only the sequential part (one D-FF).
(Assume that the reset input exists implicitly: if reset = 1, the FF moves to state 0 immediately, because reset is an asynchronous signal, and its output is 0.)
(c) State Table (according to the state diagram)
For a 1-bit accumulator, we have only 1 bit of data to save and load, and we need only 2 states (load to adder, and save from adder). So we need only 1 FF (in this lab, a D-FF).
Present state (D) | Next State (S = 0) | Next State (S = 1) | Output (S = 0) | Output (S = 1)
0 | 0 | 1 (if rst=1 => 0) | 0 | 0
1 | 0 | 1 (if rst=1 => 0) | 1 (if rst=1 => 0) | 1 (if rst=1 => 0)
2. Schematic
* I made this design with a hierarchical structure, which makes it easier to understand and reuse.
(a) Schematic for sum logic
This is a static CMOS design, sized by logical effort.

(b) Schematic for carry logic
* This is domino logic. The inverter is needed so that the output makes only 0->1 transitions: when dynamic gates are chained, the precharged voltage of a following stage could otherwise leak away during the precharge stage. The inverter is also helpful for driving Cload. When we build a multi-bit adder the carry may be rippled, which means cascaded dynamic logic, and this inverter protects the precharged voltage in that case as well.
* I used phi_bar instead of phi for the carry logic, because the D-FF must sample on the rising edge.
If phi is used for the carry logic, the output is invalid while the clock is low, because that phase is only for precharging, not for evaluation. The output 'Cout' is therefore always zero in that phase, and this propagates to the output of the sum logic. For this reason we would always have to sample a value produced while the clock is high.
So, when phi (the clock) goes high, the D-FF samples the value propagated from the precharge phase. The reason is that the clock edge always reaches the D-FF earlier than its data input does:
Tsum_arrival_time = (phase transition time (precharge->evaluate)) + (evaluation time) + (propagation time for sum logic)
Tclock_tick_dff = clock propagation delay
Therefore the value sampled by the D-FF is the one propagated from the precharge phase, which means the value stored in the D-FF is wrong.
To solve this problem, there are 3 methods.
1) Sample on the falling edge: not allowed, because the requirement for this lab is sampling on the rising edge.
2) Use multiple clocks, or put a big delay unit on the clock path. However, this makes the circuit asynchronous, which increases potential timing problems.
3) Change the phase for the carry logic. We can simply give it phi_bar instead of phi. If (clock cycle/2 > evaluation time + setup time), the D-FF always has a valid value.
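The condition in method 3 can be turned into a quick check. This is a sketch only: the function names and the timing numbers in the usage note are hypothetical and are not measurements from this design.

```python
def half_cycle_ok(period_ps, t_eval_ps, t_setup_ps):
    """True if clock cycle/2 > evaluation time + setup time."""
    return period_ps / 2 > t_eval_ps + t_setup_ps


def period_bound_ps(t_eval_ps, t_setup_ps):
    """Clock period that the condition requires to be strictly exceeded."""
    return 2 * (t_eval_ps + t_setup_ps)
```

With assumed values of 1200 ps for evaluation and 300 ps for setup, `period_bound_ps(1200, 300)` gives 3000 ps, so a 4000 ps clock passes the check while a 3000 ps clock does not.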

(c) Schematic for 1-bit full adder

(d) Schematic for D-FF
This FF must sample the input on the rising edge. Actually, the input of a master-slave FF is sampled during the high 'level', so I interpreted the constraint to mean that I have to make a rising-edge-triggered FF.
- Here is the master-slave D-FF from the textbook.
To add the Rst signal, I added 1 pmos transistor to the slave stack and 1 nmos transistor. If reset is 0, this FF behaves the same as the original one, because the reset pmos is shorted (conducting) and the reset nmos is open (but it has a little R). If reset is 1, the pmos is open and the reset nmos is shorted, which means the capacitance discharges and cannot charge again until reset is 0.

This FF samples during the high 'level' of the clock, which means it is a falling-edge-triggered FF.
I used the previous one for this lab, because I took the 'edge' sampling requirement seriously.
(e) Schematic for full-chip design.
* There is no buffer on the feedback line (requirement).
* I generated Clk_bar from Clk. This can cause clock skew or glitches, and it can cause a small short-circuit current in the dynamic logic and the FF. The reason is the inverter delay. If this delay were large, there would be a serious timing problem. In this lab it is negligible, because the delay of a single minimum-size inverter is small.
* I made Cout_bar and Sum_bar for the full adder only for reuse purposes.

3. Verification
(a) We usually say "testing" for a product, but "verification" for a design.
(b) The purpose of running IRSIM is to verify functionality, that is, to make sure that the design has the same behavior as the Boolean equations regardless of MOS size, delay, noise, and so on.
(c) At the register transfer level, all elements are regarded as gates, so we can check the functionality of the gates. At the transistor level, we can check the functionality of the transistors: whether each transistor works properly, whether there are unexpected shorts or opens in the circuit, and so on.
(d) In most cases, verification is a very time-consuming task. Current designs are huge and have many input ports, so it is almost impossible to check every input-output case when we consider time to market.
(e) But at the beginning of the implementation process, or for a small, basic block, we should check every case if possible (including boundary cases), because for a small design a testbench for full testing is easier to write and guarantees full functional coverage.
(f) So I ran a full test, and I marked which parts are needed for the boundary cases.
* First of all, I made a testbench for each basic element (sum logic, carry logic, D-FF) and tested them separately. But even if each block has no problem, that does not mean the assembled block will work. So we need 'full chip verification' as well. Generally, full chip verification takes a long time, and it is very hard to make a proper testbench. For this lab, I made a testbench that has every input combination for the full adder. In this accumulator, one input of the full adder is the output of the D-FF, and the input of the D-FF comes from the output of the full adder. This means we can produce every combination of {A,Cin,Q} by adjusting {A,Cin}.
* We need the reset signal as well. Initially there is no value in the D-FF, which means its output is in a high-impedance condition. To deal with this problem we need to force it high (1) or low (0), because a high-impedance condition can sometimes propagate to other parts of the design. For the test, I also drove reset high at the end of the testbench; during this time the output (Q) of the D-FF must be 0 regardless of the other signals.
* I marked the boundary validation cases with "blue pen". They can check every logical stuck-at fault, which means they can check whether the transistors work properly, and they are the cases represented on the K-map. For a huge design, this is an efficient way to get good coverage in a given time.
* Carry logic
Because I tested every combination of inputs and there is no memory element in this circuit, it works correctly in terms of functionality.
Logically, we should check every boundary condition of the minterms, but the carry logic is domino logic, which means there is only pull-down logic apart from one pmos transistor. Therefore, we only need to check the exclusive cases for the pull-down network and the precharge condition (the precharge condition may be repeated several times).
Boundary case : (A,B,Cin) = (0,1,1),(1,0,1),(1,1,0) and phi_bar=0.
(I'm sorry the color is bad. I spent a couple of hours trying to change it, but I couldn't.)

(Red : pull-down boundary element)
AB\C | 0 | 1
00 |  |
01 |  | 1
11 | 1 | 1
10 |  | 1
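The three boundary cases listed for the carry logic are the inputs for which exactly one pull-down branch conducts. A minimal sketch of that enumeration follows; it assumes the pull-down network is built as three parallel two-transistor branches (A B, B Cin, Cin A), which may differ from the actual schematic, and the names are hypothetical.

```python
from itertools import product

# Assumed pull-down branches of Cout = A B + B Cin + Cin A.
BRANCHES = (("A", "B"), ("B", "Cin"), ("Cin", "A"))


def conducting_branches(a, b, cin):
    """Return the branches whose series transistors are all on."""
    values = {"A": a, "B": b, "Cin": cin}
    return [br for br in BRANCHES if all(values[name] for name in br)]


def boundary_cases():
    """Inputs (A, B, Cin) for which exactly one branch conducts."""
    return [
        bits
        for bits in product((0, 1), repeat=3)
        if len(conducting_branches(*bits)) == 1
    ]
```

Under this assumption `boundary_cases()` returns the same three input vectors as the list above, while (1,1,1) is excluded because all three branches conduct at once.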
* Sum logic
Because I tested every combination of inputs and there is no memory element in this circuit, it works correctly in terms of functionality.
Boundary cases with K-map (red: for pull-down, blue: for pull-up)
CinCout'\AB | 00 | 01 | 11 | 10
00 | 0 | 0 | 0 | 0
01 | 0 | 1 | 1 | 1
11 | 1 | 1 | 1 | 1
10 | 0 | 0 | 1 | 0
* Full Adder
Because I tested every combination of inputs and there is no memory element in this circuit, it works correctly in terms of functionality.
I had already tested the boundary cases for the carry and sum logic (actually, I tested every case), but that cannot guarantee that the parent module works.
Because no functional logic is left untested, we only need to check connectivity. Nevertheless, I ran the full test again.

* D-FF
It has a simple state table. We can verify it simply with a clock-edge check, a rising/falling check, a high/low level check, and input-high and input-low checks.
* Full chip of 1-bit accumulator (evaluation phase when Clk is low; Osum is the buffered Q)
In this case it is not sufficient to apply every input pattern, because there is a memory element. The output of the circuit is determined by the inputs and the internal state, so the testbench must combine inputs and state. Here we have Rst, A, Cin (and clk) as inputs and Q as state. Therefore, I produced every combination of A, Cin, Q by adjusting A and Cin. To test Rst, I drove it high for a short time at the beginning and at the end. I made sure that it works well.
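One way to find an input sequence that reaches every combination of A, Cin, Q is a breadth-first search over the accumulator state. The sketch below is an illustration with hypothetical names, not the testbench actually used in this lab; it assumes Q starts at 0 after reset and that the next Q is the sum bit A xor Cin xor Q.

```python
from collections import deque
from itertools import product


def next_q(a, cin, q):
    """Sum bit of the full adder, which becomes the next stored Q."""
    return a ^ cin ^ q


def covering_sequence(q0=0):
    """Shortest list of (A, Cin) pairs that visits all 8 (A, Cin, Q) cases."""
    all_cases = frozenset(product((0, 1), repeat=3))
    start = (q0, frozenset())
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (q, covered), path = queue.popleft()
        if covered == all_cases:
            return path
        for a, cin in product((0, 1), repeat=2):
            state = (next_q(a, cin, q), covered | {(a, cin, q)})
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [(a, cin)]))
    return None
```

Each clock cycle exercises one (A, Cin, Q) case, so a covering sequence needs at least 8 cycles, and the search returns one of exactly that length.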

4. Rise and Fall time analysis
- I inserted the loads (100 fF on c_out and 250 fF on Q) in the hspice code.
* Hspice result.

The rising and falling delay for me is 327.
I changed the size of the inverter at the Cout output, and the size of the 2 output inverters at the output (Osum) of the top design.
I tried and fixed this many times, and finally got some nice numbers.
* In this case the rise and fall times are not determined by parasitic capacitance alone; we also need to consider the rise and fall times of the inputs, which affect those of the outputs.
Edge | Cout | Sum
Rising (worst case) | 306 ps | 310 ps
Falling (worst case) | 322 ps | 313 ps
5. Critical Path analysis
(a) Finding the critical path is essential for a sequential circuit, because the clock frequency depends on it. The critical path is the most time-consuming path between synchronized steps.
Sum = A B Cin + Cout' (A + B + Cin). To get a valid value at Sum, the term Cout' (A + B + Cin) must be evaluated first, and then Sum is evaluated with A and B. So, to reduce the propagation delay, the carry logic must be fast. In this case the carry logic is domino, which means it is very fast, especially the pull-up. Furthermore, the domino logic is also synchronized by the clock, which means we do not need to consider the full path through the carry logic: the critical path can be divided into a carry logic part and a sum logic part.
However, the clock samples on the rising edge, and while the clock is high the carry logic is precharging. Even though the critical path is divided into the carry path and the D-FF path, cout_bar still arrives latest when it goes low. Thus Cin and A must be 1 to make cout_bar pull down slowly, and B (Q) should change to 0.
Let's look at this block structure.
The paths between synchronizing steps can be divided as follows.
1. between primary input and clocked unit.
2. between primary output and clocked unit.
3. between clocked unit and clocked unit.
So we need to divide the adder logic to find the critical path. The candidates are:
a. Pi-Carry
b. Carry_In-Po
c. Dff_Q-Carry_In
d. Carry_In-Sum-Dff_In
If we ignore the wire delay, a = c.
The delay of a path is determined by the capacitance and resistance between the synchronized units.
Thus, d. is definitely the critical path.
(b) "d" is the critical path, so we should deal with it to determine the maximum clock rate.
But there is a problem here. The carry logic must propagate its signal to the D-FF within half a cycle, because evaluation is done while the clock is low. In contrast, there is no large delay element in the circuit while the clock is high, because the carry logic is just precharging at that time, and it has a full cycle to propagate to the D-FF.
Thus, to get the maximum clock rate, we can change the waveform of the clock so that the low phase is long and the high phase is short (but setup time and hold time must be considered).
The transition time of the input of the critical path should also be considered. That is, we cannot simply apply a square waveform as the input for each delay path, because the overall clock period has to include tp and tsetup_time, and tp should include the input transition time.
In this situation, the transition of the D-FF input (sum from the adder) is also slow, because there are no drivers on the internal wires and I used only minimum-size transistors for the D-FF, which means setup time and hold time are also an important issue.
So I checked the delay from cout to the input of the D-FF in a full-chip test environment.
During trying and fixing: net36 is the output of the sum logic. It works fine, but the D-FF is still malfunctioning because of setup/hold time.
The waveform is not nice, but from this point the circuit works correctly in terms of functionality.
The maximum clock rate is 280 MHz.
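The clock period that corresponds to this rate, and the effect of shortening the high phase, can be sketched numerically. The function names are hypothetical, and the phase times in the usage note are assumed values for illustration, not measurements from this design.

```python
def period_ns(freq_mhz):
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def symmetric_period(t_low_min, t_high_min):
    """Minimum period of a 50% duty clock: both phases get the longer time."""
    return 2 * max(t_low_min, t_high_min)


def asymmetric_period(t_low_min, t_high_min):
    """Minimum period when each phase is only as long as it needs to be."""
    return t_low_min + t_high_min
```

`period_ns(280)` is about 3.57 ns. If, as an assumption, the low (evaluation) phase needed 1.5 ns and the high (precharge) phase only 0.5 ns, a symmetric clock would need 3.0 ns per cycle while an asymmetric one would need 2.0 ns.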
6. Layout
(1) Layout image
- Layout for D-FF, Sum logic
- Layout for Carry, Q driver, and inverter to generate Clk_bar
* Hspice result from layout (very similar to the schematic design)
* I got the same result from IRSIM as well
* Layout was very painful work. I had several problems, described below.
1) Short across almost the whole circuit: I put the n-tap and p-tap the wrong way around.
2) Label problem: there are 3 Vdd rails and 2 Gnd rails, and I tried to label each with the same name. I finally connected every Vdd rail and every Gnd rail for testing.
3) Some misconnected nets: I fixed them using LVS.
4) Capacitance problem: I got an error when I ran LVS, because I did not draw the capacitance for the FF in the layout.
5) Bad mos problem: I used a very big mos for the driver. It generated a warning.
6) Vdd, Gnd fan-out, and changed structure: there are different connections for Vdd and Gnd, and I changed the structure for drawing convenience, which caused some errors.
7) Spacing problem: it was very hard to fit within the tiny spacing.
8) Pin name difference: I used different pin names.
* Some of the LVS output
This unmatched net is caused by an unused net (from the adder) and the fingers of the mos.
* Sample error message I got from LVS.
|
Device summary for layout
        bad   total
pmos      4      18
nmos      1      23
I /I15/I6/M3
? Device does not cross-match.
N /vdd!
? Net does not cross-match. It has 18 connections.
|
* Most of these messages are caused by misconnected nets, wire naming, and structure changes (such as series-series-parallel to parallel-series-series).