ECE 558 / 658 VLSI Design
Lab 2: Design of a 1-Bit Accumulator

Due Thursday, October 31, Midnight before Lecture 15

By DuSung Kim


Objective: Design a CMOS circuit and layout for a bitslice accumulator.

An accumulator consists of a full adder and a resettable flip-flop. Its inputs are phi, A, c_in, and reset. Its outputs are Q and c_out. The adder computes the sum of A, Q, and c_in, and generates a sum S and a carry c_out. The flip-flop samples S on the rising edge of phi and stores the result on Q.

This lab involves many more transistors and therefore more complex logic and circuit simulation. It is also a sequential circuit, so you need to deal with a clock. You cannot clock the circuit faster than its critical timing path allows, or the accumulator will malfunction.

Be sure to include each required item (indicated by POST:) in your report. You must also explain what you did and why; images alone are not sufficient. Analyze your results, draw conclusions, and describe what you learned.

1. Truth Table


(1) Adder

SUM(D) | Cout | Cin | A | B(Q)
  0    |  0   |  0  | 0 |  0
  1    |  0   |  0  | 0 |  1
  1    |  0   |  0  | 1 |  0
  0    |  1   |  0  | 1 |  1
  1    |  0   |  1  | 0 |  0
  0    |  1   |  1  | 0 |  1
  0    |  1   |  1  | 1 |  0
  1    |  1   |  1  | 1 |  1

     (a) Equation for Cout logic (B = Q)
          Cout = Cin·A + B·Cin + A·B                          --- i)

     (b) Equation for Sum logic
          Sum = A·B'·Cin' + A'·B'·Cin + A·B·Cin + A'·B·Cin'   --- ii)

          From i) and ii):
          Sum = A·B·Cin + Cout'·(A + B + Cin)
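The equations above can be checked mechanically against the truth table. The minimal Python sketch below is an illustration, not part of the lab's tool flow, and its function names are hypothetical. It enumerates all eight inputs, with b standing for B = Q, and compares equation i), equation ii), and the factored Sum form with ordinary binary addition.

```python
from itertools import product

def cout_eq(a, b, cin):
    # Equation i): Cout = Cin*A + B*Cin + A*B
    return (cin & a) | (b & cin) | (a & b)

def sum_sop(a, b, cin):
    # Equation ii): four-minterm sum of products (b plays the role of Q)
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    return (a & nb & nc) | (na & nb & cin) | (a & b & cin) | (na & b & nc)

def sum_factored(a, b, cin):
    # Factored form: Sum = A*B*Cin + Cout'*(A + B + Cin)
    return (a & b & cin) | ((1 - cout_eq(a, b, cin)) & (a | b | cin))

def check_adder():
    """Return True if all three equations agree with a + b + cin for every input."""
    for a, b, cin in product((0, 1), repeat=3):
        total = a + b + cin
        if cout_eq(a, b, cin) != total // 2:
            return False
        if sum_sop(a, b, cin) != total % 2 or sum_factored(a, b, cin) != total % 2:
            return False
    return True

if __name__ == "__main__":
    print(check_adder())  # True
```

The factored form is the one the schematic implements: Cout' is computed first and reused by the sum logic.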

(2) Flip-Flop
(a) Table of D-FF with reset

Rst | clk  | Input | Current state | Next state
 0  | 0->1 |   0   |       X       |     0
 0  | 0->1 |   1   |       X       |     1
 0  | 1->0 |   X   |       1       |     1
 0  | 1->0 |   X   |       0       |     0
 1  |  X   |   X   |       X       |     0

(b) State Diagram
    Because the full adder is combinational logic, it could be included in this diagram, but to reduce complexity I drew only the sequential part (one D-FF).
    (Assume the reset input exists implicitly: if reset = 1, the FF moves to state 0 immediately, because reset is an asynchronous signal, and its output is 0.)




(c) State Table (according to the state diagram)
For a 1-bit accumulator there is only one bit of data to save and load, so only two states are needed (Q = 0 and Q = 1); each cycle the stored bit is loaded into the adder and the adder's result is saved back.
Therefore only one flip-flop (a D-FF in this lab) is needed.

Present state (Q) | Next state (D)         | Output B (= Q)
                  | S = 0 | S = 1          | S = 0          | S = 1
0                 | 0     | 1 (0 if rst=1) | 0              | 0
1                 | 0     | 1 (0 if rst=1) | 1 (0 if rst=1) | 1 (0 if rst=1)
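The D-FF table and the state table both describe a single next-state function. As a hedged sketch, assuming an active-high asynchronous reset and sampling on the rising edge of phi as the tables state, it can be written as a small Python function. The name `next_state` and its argument order are hypothetical.

```python
def next_state(rst, clk_prev, clk, d, q):
    """Next value of Q for a rising-edge D-FF with active-high asynchronous reset.

    rst      : reset input; 1 forces Q to 0 regardless of the clock
    clk_prev : clock value before the step
    clk      : clock value after the step
    d        : data input (the adder's Sum, S)
    q        : current state Q
    """
    if rst:
        return 0          # asynchronous reset dominates every other input
    if clk_prev == 0 and clk == 1:
        return d          # rising edge: sample D
    return q              # falling edge or steady clock: hold the current state
```

Each row of the D-FF table maps to one call; for example, the second row is `next_state(0, 0, 1, 1, q)` for any q, which returns 1.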


2. Schematic
   
* I built this design hierarchically, which makes it easier to understand and reuse.

    (a) Schematic for sum logic
       This is a static CMOS design, sized by logical effort.

 
    (b) Schematic for carry logic
      * This is domino logic. The output inverter ensures that the output makes only 0->1 transitions during evaluation. In a chain of dynamic gates, an input that falls during evaluation could discharge the precharged node of the next stage, and that node cannot recover until the next precharge. The inverter also helps drive Cload. In a multi-bit adder this carry ripples from bit to bit, i.e., through cascaded dynamic logic, so the inverter is needed to keep the precharged nodes from losing their charge.
     
      * I used phi_bar instead of phi, because the D-FF must sample on the rising edge.
        If phi drives the carry logic, the output is invalid while the clock is low, because that phase is only for precharging, not for evaluation. During this phase 'Cout' is always 0, and this value propagates to the output of the sum logic. We always need to sample the value computed while the clock is high.
        However, when phi (the clock) rises, the D-FF samples the value propagated from the precharge phase, because the clock edge always reaches the D-FF earlier than the new data does:

       Tsum_arrival_time = (precharge->evaluate phase transition time) + (evaluation time) + (sum logic propagation delay)
       Tclock_tick_dff   = clock propagation delay

       Therefore the D-FF samples the value propagated from the precharge phase, so the stored value is wrong.

       There are three ways to solve this problem:
          1) Sample on the falling edge: not allowed, because this lab requires sampling on the rising edge.
          2) Use multiple clocks, or put a large delay element on the clock path. However, this effectively makes the circuit asynchronous and increases the potential for timing problems.
          3) Swap the phases of the carry logic by simply driving it with phi_bar instead of phi. If (clock period)/2 > Tsum_arrival_time + setup time, the D-FF always samples a valid value.
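Method 3) comes down to an inequality on the arrival time defined above. The helper below is a hypothetical sketch: all delays are placeholder parameters rather than measured values from this lab, and a 50% duty cycle is assumed. It checks whether Sum reaches the D-FF input, plus setup time, within the half period that ends at the rising edge of phi.

```python
def phi_bar_timing_ok(period, t_transition, t_eval, t_sum, t_setup):
    """Return True if Sum settles at the D-FF input in time (all times in ns).

    With phi_bar driving the domino carry, evaluation happens while phi is low,
    i.e. during the half period that ends at the rising edge of phi.
    Assumes a 50% duty cycle.
    """
    # Tsum_arrival_time = phase transition + evaluation + sum logic delay
    t_sum_arrival = t_transition + t_eval + t_sum
    return period / 2 > t_sum_arrival + t_setup
```

With placeholder delays of 0.1, 0.5, 0.8, and 0.3 ns, a 4 ns period passes (2.0 ns > 1.7 ns) while a 3 ns period fails (1.5 ns < 1.7 ns).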

 
     (c) Schematic for 1-bit full adder


     (d) Schematic for D-FF
   
This FF must sample its input on the rising edge. The textbook master-slave FF samples its input at the high 'level' of the clock, so I interpreted the constraint to mean that I had to build a rising-edge-triggered FF. Here is the master-slave D-FF from the textbook. To implement Rst, I added one pmos transistor to the slave stack and one nmos transistor. When reset is 0, this FF behaves the same as the original one, because the reset pmos is on (short) and the reset nmos is off (nearly open). When reset is 1, the pmos is off and the reset nmos is on, so the storage capacitance discharges and cannot charge again until reset returns to 0.


This FF samples at the high 'level' of the clock, which means it is a falling-edge-triggered FF.

For this lab I used the previous one, because I took the 'edge'-sampling requirement seriously.
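The edge behavior follows from which latch is transparent in which clock phase. The behavioral Python sketch below uses ideal latches with no delays or transistor effects, and its class names are hypothetical. It shows that swapping the latch enables turns a rising-edge FF into a falling-edge FF. Reset is modeled as clearing both latches, which is a simplification and not necessarily what the transistor-level reset does.

```python
class Latch:
    """Level-sensitive D latch: transparent while enable is true, holds otherwise."""

    def __init__(self):
        self.q = 0

    def step(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Two latches in series.

    rising=True : master transparent while clk is low, slave while clk is high,
                  so Q changes only after a 0->1 clock edge (rising-edge FF).
    rising=False: the enables are swapped, giving a falling-edge FF.
    """

    def __init__(self, rising=True):
        self.rising = rising
        self.master = Latch()
        self.slave = Latch()

    def step(self, d, clk, rst=0):
        if rst:  # simplified asynchronous reset: clear both latches
            self.master.q = 0
            self.slave.q = 0
            return 0
        master_en = (clk == 0) if self.rising else (clk == 1)
        m = self.master.step(d, master_en)
        return self.slave.step(m, not master_en)
```

For the rising-edge version, D = 1 applied while clk = 0 appears on Q only after clk goes to 1, and a change on D while clk stays high does not reach Q.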
     (e) Schematic for full-chip design
    * There is no buffer on the feedback line (requirement).
    * I generated Clk_bar from Clk with an inverter. The inverter delay can cause clock skew or glitches, and a small short-circuit current in the dynamic logic and the FF. If this delay were large, it would cause serious timing problems, but in this lab it is negligible because the delay of a single minimum-size inverter is small.

    * I generated Cout_bar and Sum_bar in the full adder only for reuse.



3. Verification
   
     (a) We usually say "testing" for a product, but "verification" for a design.
     (b) The purpose of running IRSIM is to verify functionality, i.e., to make sure the design behaves the same as its Boolean equations regardless of transistor size, delay, noise, and so on.
     (c) At the register-transfer level, all elements are treated as gates, so we can check only the functionality of the gates. At the transistor level, we can check the functionality of the transistors: whether each transistor works, whether there are unexpected shorts or opens in the circuit, and so on.
     (d) In most cases verification is very time-consuming, because current designs are very large and have many input ports. Given time-to-market pressure, it is almost impossible to check every input-output case.
     (e) However, at the beginning of implementation, or for a small basic block, we should check every case if possible (including boundary cases), because for a small design a testbench for exhaustive testing is easier to write and guarantees 100% functional correctness.
     (f) So I ran an exhaustive test, and I marked which cases are needed as boundary cases.

      * First, I made a testbench for each basic element (sum logic, carry logic, D-FF) and tested them separately. However, even if each block works, that does not mean the assembled design will work, so 'full chip verification' is needed again. Full-chip verification generally takes a long time, and it is very hard to write a proper testbench. For this lab, I made a testbench that applies every input combination to the full adder. In this accumulator, one full-adder input is the output of the D-FF, and the input of the D-FF comes from the output of the full adder, so every combination of {A, Cin, Q} can be produced by adjusting {A, Cin}.
      * We also need the reset signal. Initially there is no value in the D-FF, so its output is undefined. To deal with this, we must force it high (1) or low (0), because an undefined value can propagate to other parts of the design. For testing, I also asserted reset at the end of the testbench; during this time the D-FF output Q must be 0 regardless of the other signals.

       * I marked the boundary validation cases with a "blue pen". They check every logical stuck-at fault, i.e., whether the transistors work, as well as the cases represented on the K-map. For a large design, this is an efficient way to get good coverage in limited time.
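Because Q can only be steered through the feedback loop, a stimulus that reaches every {A, Cin, Q} combination has to be planned around the state. The following Python sketch is an illustration, not the testbench used in the lab. It models the accumulator at the bit level (Q takes the adder's Sum at each rising edge, reset omitted) and greedily builds such a sequence; the function names are hypothetical.

```python
from itertools import product

def accumulate(q, a, cin):
    """One clock cycle of the 1-bit accumulator: Q <- A xor Q xor Cin (the adder's Sum)."""
    return a ^ q ^ cin

def build_stimulus(q0=0):
    """Greedily choose (A, Cin) vectors until every (A, Cin, Q) combination is applied.

    Q cannot be driven directly; it is steered through the feedback loop.
    Returns the applied (A, Cin) vectors and the (A, Cin, Q) combinations
    in the order they were first covered.
    """
    remaining = set(product((0, 1), repeat=3))   # all (A, Cin, Q)
    q, vectors, covered = q0, [], []
    while remaining:
        # Prefer an untested combination that is reachable with the current Q.
        choices = sorted(c for c in remaining if c[2] == q)
        if choices:
            a, cin, _ = choices[0]
        else:
            a, cin = 1, 0   # already-tested vector whose Sum is Q', so it flips Q
        vectors.append((a, cin))
        combo = (a, cin, q)
        if combo in remaining:
            remaining.discard(combo)
            covered.append(combo)
        q = accumulate(q, a, cin)
    return vectors, covered

if __name__ == "__main__":
    vectors, covered = build_stimulus()
    print(vectors)
```

Starting from Q = 0 after reset, this greedy search needs nine clock cycles, one more than the eight combinations, because it must spend one already-tested vector to flip Q.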

* Carry logic
Because I tested every input combination and there is no memory element in this circuit, it works perfectly from a functional standpoint.
Logically, we should check every boundary minterm, but the carry logic is domino logic, which has only a pull-down network plus one pmos (precharge) transistor. Therefore, we only need to check the exclusive cases for the pull-down network and the precharge condition (the precharge condition is exercised several times anyway).

Boundary cases: (A, B, Cin) = (0,1,1), (1,0,1), (1,1,0) and phi_bar = 0.

(Sorry, the colors are hard to read. I spent a couple of hours trying to change them but could not.)


(Red: pull-down boundary elements)
AB\C |  0  |  1
 00  |     |
 01  |     |  1
 11  |  1  |  1
 10  |     |  1
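The boundary cases listed above are exactly the on-set minterms of Cout that sit next to an off-set minterm on the K-map, i.e., flipping a single input turns the output off. That selection rule can be sketched in Python; treating "boundary" this way is an assumption about how the cases were chosen, and the function names are hypothetical.

```python
from itertools import product

def cout(a, b, cin):
    # Majority function: Cout = A*B + B*Cin + A*Cin
    return (a & b) | (b & cin) | (a & cin)

def boundary_minterms(f, n=3):
    """On-set minterms of f with at least one off-set neighbor (Hamming distance 1)."""
    result = []
    for m in product((0, 1), repeat=n):
        if not f(*m):
            continue
        # All inputs reachable by flipping exactly one bit of m.
        neighbors = (m[:i] + (1 - m[i],) + m[i + 1:] for i in range(n))
        if any(not f(*nb) for nb in neighbors):
            result.append(m)
    return result

if __name__ == "__main__":
    print(boundary_minterms(cout))  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

The minterm (1, 1, 1) is excluded because all three of its neighbors are also in the on-set.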


* Sum logic
Because I tested every input combination and there is no memory element in this circuit, it works perfectly from a functional standpoint.



Boundary cases on the K-map (red: pull-down, blue: pull-up)
CinCout'\AB | 00 | 01 | 11 | 10
    00      |  0 |  0 |  0 |  0
    01      |  0 |  1 |  1 |  1
    11      |  1 |  1 |  1 |  1
    10      |  0 |  0 |  1 |  0
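The K-map above treats Cout' as an independent input of the sum logic, so it also lists combinations that cannot occur in the full adder (for example Cout' = 1 with A = B = Cin = 1). The short Python sketch below regenerates the map from Sum = A·B·Cin + Cout'·(A + B + Cin), with rows and columns in Gray-code order; the names are hypothetical.

```python
GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]   # K-map row and column order

def sum_logic(a, b, cin, cout_n):
    # Sum logic with Cout' as a separate input: A*B*Cin + Cout'*(A + B + Cin)
    return (a & b & cin) | (cout_n & (a | b | cin))

def sum_kmap():
    """Rows are (Cin, Cout'), columns are (A, B), both in Gray-code order."""
    return [[sum_logic(a, b, cin, cn) for (a, b) in GRAY] for (cin, cn) in GRAY]

if __name__ == "__main__":
    for (cin, cn), row in zip(GRAY, sum_kmap()):
        print(f"{cin}{cn}", row)
```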


* Full Adder
Because I tested every input combination and there is no memory element in this circuit, it works perfectly from a functional standpoint.
I already tested the boundary cases for the carry and sum logic (in fact, every case), but that cannot guarantee that the higher-level module works.
Since no untested functional logic remains, only connectivity needs to be checked; nevertheless, I ran the full test again.

 

* D-FF
It has a simple state table, so it can be verified simply by checking clock-edge behavior (rising vs. falling), level behavior (high vs. low), and the input-high and input-low cases.




* Full chip of 1-bit accumulator (evaluation phase when Clk is low; Osum is the buffered Q)
In this case it is not sufficient to apply every input pattern, because there is a memory element: the output is determined by both the inputs and the internal state. So the testbench must combine inputs and states. Here Rst, A, Cin (and clk) are inputs and Q is the state. Therefore, I produced every combination of A, Cin, and Q by adjusting A and Cin. To test Rst, I asserted it for a short time at the beginning and at the end. I confirmed that the circuit works correctly.
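Checking a sequential testbench requires expected values that depend on the state as well as the inputs. A hedged Python sketch of a golden reference model for the full chip is shown below; it is not the lab's checker, and its name and interface are hypothetical. It assumes reset is active-high and clears Q immediately, and that each (Rst, A, Cin) vector is applied before one rising edge of phi.

```python
def reference_model(vectors, q0=0):
    """Expected per-cycle outputs of the 1-bit accumulator.

    vectors: list of (rst, a, cin) applied before each rising edge of phi.
    Returns a list of (cout, q_after_edge).
    """
    q, out = q0, []
    for rst, a, cin in vectors:
        if rst:
            q = 0                                 # asynchronous reset clears Q at once
        cout = (a & q) | (q & cin) | (a & cin)    # combinational carry from current Q
        s = a ^ q ^ cin                           # Sum presented to the D-FF
        q = 0 if rst else s                       # rising edge: Q <- Sum unless reset
        out.append((cout, q))
    return out

if __name__ == "__main__":
    print(reference_model([(1, 1, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1)]))
```

Comparing such a list cycle by cycle with the simulated Cout and Q is one way to automate the check instead of reading waveforms.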




4. Rise and Fall time analysis
   
- I added the loads (100 fF on c_out and 250 fF on Q) to the HSPICE netlist.

    * HSPICE result


    The rise and fall delay specified for me is 327 ps.
    I resized the inverter at the Cout output and the two output inverters on the Osum output of the top-level design.
    After many rounds of trial and error, I finally got good numbers.

  


* In this case the rise and fall times are not determined by parasitic capacitance alone; the rise and fall times of the inputs also affect those of the outputs.


                     | Cout   | Sum
Rising (worst case)  | 306 ps | 310 ps
Falling (worst case) | 322 ps | 313 ps
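The rise and fall times here are read from the HSPICE waveforms. For reference, the sketch below shows the common 10%-90% measurement on a sampled waveform in plain Python; the 10%-90% definition is an assumption, since the lab may use different thresholds. The resistance and supply voltage are hypothetical, and the ideal step input is exactly the simplification the note above warns about. A first-order RC step is used only because its 10%-90% rise time is known to be ln(9)·RC ≈ 2.2 RC.

```python
import math

def rise_time(t, v, vdd, lo=0.1, hi=0.9):
    """10%-90% rise time of a monotonically rising sampled waveform.

    Uses linear interpolation between samples to locate each threshold crossing.
    """
    def crossing(level):
        for i in range(1, len(v)):
            if v[i - 1] < level <= v[i]:
                frac = (level - v[i - 1]) / (v[i] - v[i - 1])
                return t[i - 1] + frac * (t[i] - t[i - 1])
        raise ValueError("threshold not crossed")
    return crossing(hi * vdd) - crossing(lo * vdd)

# Hypothetical first-order RC output driven by an ideal step (not lab values).
R, C, VDD = 1e3, 100e-15, 1.8                  # 1 kOhm, 100 fF, 1.8 V
tau = R * C                                     # 100 ps
ts = [i * 1e-13 for i in range(20001)]          # 0 to 2 ns in 0.1 ps steps
vs = [VDD * (1 - math.exp(-x / tau)) for x in ts]
tr = rise_time(ts, vs, VDD)

if __name__ == "__main__":
    print(f"{tr * 1e12:.1f} ps vs ideal {math.log(9) * tau * 1e12:.1f} ps")
```

With a real, finite input slope the output transition is slower than this ideal-step value, which matches the observation that input rise and fall times affect the output.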

  

5. Critical Path analysis
   
(a) Finding the critical path is essential for a sequential circuit, because the clock frequency depends on it. The critical path is the slowest path between synchronized stages.
        Sum = ABCin + Cout'(A+B+Cin). To get a valid value at Sum, the term Cout'(A+B+Cin) must be evaluated first, and then Sum is evaluated with A and B. So, to reduce propagation delay, the carry logic must be fast. Here the carry logic is domino, so it is very fast, especially the pull-up. Furthermore, the domino logic is also synchronized by the clock, so we do not need to consider a full path passing through the carry logic; the critical path can be divided into a carry-logic part and a sum-logic part.
    However, the clock samples on the rising edge, and while the clock is high the carry logic is precharging. Even though the critical path is divided into the carry path and the D-FF path, cout_bar still arrives last when it is low. Thus, Cin and A must be 1 so that cout_bar is pulled down slowly, and B(Q) should change to 0.

Let's look at this block structure.

The paths between synchronizing stages can be divided as follows:
1. between a primary input and a clocked unit; 2. between a primary output and a clocked unit; 3. between two clocked units. So we need to divide the adder logic to find the critical path. The candidates are:
a. Pi-Carry
b. Carry_In-Po
c. Dff_Q-Carry_In
d. Carry_In-Sum-Dff_In
If we ignore wire delay, a = c.
The delay of each path is determined by the capacitance and resistance between the synchronized units.
Thus, d. is definitely the critical path.
(b) Since "d" is the critical path, we must analyze it to determine the maximum clock rate.
However, there is a problem here. The carry logic must propagate its signal to the D-FF within half a cycle, because evaluation happens only while the clock is low. In contrast, there is no large delay element active while the clock is high, because the carry logic is just precharging then, and that path has a full cycle to propagate to the D-FF.
Thus, to get the maximum clock rate, we can change the clock waveform: make the low phase long and the high phase short (while still respecting setup and hold time).
The transition time at the input of the critical path should also be considered. We cannot simply apply square waveforms to each delay path, because the overall clock period must include tp and tsetup, and tp should include the input transition time. In this design the transition at the D-FF input (the sum from the adder) is also slow, because the internal wire has no driver and I used only minimum-size transistors in the D-FF, so setup and hold time are also important. Therefore, I measured the delay from cout to the D-FF input in the full-chip test environment.
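The path selection and the duty-cycle argument above can be sketched numerically. In the Python below, the delays for candidate paths a-d and the setup and precharge times are hypothetical placeholders, not measured values (a and c are set equal, as when wire delay is ignored). The sketch picks the slowest path and compares the minimum period of a 50% clock with that of a clock whose low (evaluate) and high (precharge) phases are each sized to their own need.

```python
def pick_critical(paths):
    """Return (name, delay) of the slowest candidate path."""
    name = max(paths, key=paths.get)
    return name, paths[name]

def min_period(t_eval_path, t_precharge, symmetric=True):
    """Minimum clock period in ns.

    t_eval_path : delay that must fit in the low (evaluate) phase, incl. setup time
    t_precharge : time needed in the high (precharge) phase, incl. hold margin
    symmetric   : 50% duty cycle if True; otherwise each phase is sized to its need
    """
    if symmetric:
        return 2 * max(t_eval_path, t_precharge)
    return t_eval_path + t_precharge

# Hypothetical delays in ns for the candidate paths a-d (placeholders only).
paths = {"a: PI-Carry": 0.4, "b: Carry_In-PO": 0.5,
         "c: Dff_Q-Carry_In": 0.4, "d: Carry_In-Sum-Dff_In": 1.2}
t_setup, t_pre = 0.2, 0.4

if __name__ == "__main__":
    name, t_d = pick_critical(paths)
    for sym in (True, False):
        T = min_period(t_d + t_setup, t_pre, symmetric=sym)
        label = "50% duty" if sym else "stretched low phase"
        print(f"{name}, {label}: T = {T:.2f} ns, f = {1e3 / T:.0f} MHz")
```

For comparison, the measured 280 MHz corresponds to a period of 1/(280 MHz) ≈ 3.57 ns.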


While iterating on the design: net36 is the output of the sum logic. It works fine, but the D-FF still malfunctions because of setup/hold time.


The waveform is not clean, but from this point the circuit works correctly. The maximum clock rate is 280 MHz.


6. Layout
(1) Layout image


     - Layout for D-FF and sum logic
 

     - Layout for carry logic, Q driver, and the inverter that generates Clk_bar



   * HSPICE result from layout (very similar to the schematic design)


   * I got the same result from IRSIM as well



* Layout was very painful work. I had several problems, described below:
    1) Almost the whole circuit was shorted - I placed the n-tab and p-tab the wrong way around.
    2) Label problem - There are 3 Vdd rails and 2 Gnd rails, and I tried to label each with the same name. In the end I connected every Vdd rail and every Gnd rail together for testing.
    3) Some misconnected nets - I fixed them using LVS.
    4) Capacitance problem: I got an LVS error because I did not draw the capacitor for the FF in the layout.
    5) Bad MOS problem: I used a very large MOS for the driver, which generated a warning.
    6) Vdd/Gnd fan-out and structural changes: Vdd and Gnd are connected differently, and I changed the structure for drawing convenience, which caused some errors.
    7) Space problem: It was very hard to meet the tiny spacing rules.
    8) Pin name difference: I used different pin names.

* Some of the LVS output
    The unmatched net is caused by an unused net (from the adder) and the MOS fingers.



* Sample error message I got from LVS:
        Device summary for layout
                   bad  total
        pmos         4     18
        nmos         1     23

I /I15/I6/M3
? Device does not cross-match.
N /vdd!
? Net does not cross-match. It has 18 connections.



* Most of these messages were caused by misconnected nets, wire naming, and structural changes (such as serial-serial-parallel to parallel-serial-serial).