Lab 1: Design of a CMOS NAND Gate - DuSung,Kim -


Objective: Design a CMOS 2-input NAND gate. You will follow the VLSI design flow through a complete design and verification of this gate. This exercise introduces both the design flow and the CAD tools we will use in this class.


Specifications:
The gate must be designed with minimum channel length in 0.25 um technology (L=240nm).
The gate must be able to drive 32 minimum size CMOS inverters (Wn=0.3um, Wp=0.9um) and 100fF of lumped wiring cap.
The propagation delay must be less than 300ps.
Assume the inputs have 50ps rise and fall times.
The layout must fit our standard cell library: total height = 140 lambda, with 20 lambda high rails in M1 for vdd and ground on the top and bottom edges of the cell (respectively). This leaves 100 lambda of vertical space between the rails.
You may only use M1 and M2 metal layers.
The inputs and output must be accessible from the top of the cell (VDD side) in metal 2.
You must minimize the width of the cell.
Here is an example NOR gate in Cadence. Note: Dimensions may vary from the spec for this lab.


(1) Truth table

Boolean equation : OUT0' = IN0 * IN1, that is, OUT0 = (IN0 * IN1)'

Behavior : OUT0 is 0 if every input is 1, otherwise 1.

OUT0   IN0   IN1
1      0     0
1      0     1
1      1     0
0      1     1
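
The truth table can be regenerated with a few lines of Python. This is only a small sketch; the function name `nand` and the row layout (OUT0, IN0, IN1) are illustrative choices, not part of the lab tools.

```python
from itertools import product

def nand(in0: int, in1: int) -> int:
    """2-input NAND: the output is 0 only when both inputs are 1."""
    return 0 if (in0 and in1) else 1

# Enumerate all input combinations; each row is (OUT0, IN0, IN1).
rows = [(nand(a, b), a, b) for a, b in product((0, 1), repeat=2)]

for out0, in0, in1 in rows:
    print(out0, in0, in1)
```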

 

(2) Schematic Image

 

(3) Functional Simulation (Logic Simulation)

 

(4) Hand calculations.

(a) Capacitance calculation

The worst case for a falling output is one of the transitions 00->11, 01->11 or 10->11; in each of them the output is pulled down through the two NMOS transistors in series. If the previous input vector was 00, all of the NMOS capacitance is already discharged, so this is not the worst case. If IN0 was 0 and IN1 was 1, Cpara of NM2 is already discharged, so this is not the worst case either. But when IN0 was 1 and IN1 was 0, and IN1 then goes high, Cpara of both NM1 and NM2 must be discharged, so this is the worst falling case.
Worst falling case : IN0 = 1, IN1 = 0 => IN0 = 1, IN1 = 1.
The PMOS transistors are connected in parallel. Req(IN0 = 0, IN1 = 0) is smaller than Req(IN0 = 1, IN1 = 0) because two PMOS transistors conduct instead of one, so (IN0 = 1, IN1 = 1) => (IN0 = 1, IN1 = 0) is the worst case. Worst rising case : IN0 = 1, IN1 = 1 => IN0 = 1, IN1 = 0

Cpara = CgdpA + CdbpA + CgdpB + CdbpB + CgdnB + CdbnB + CgsnB + CsbnB + CgdnA + CdbnA.

(Assume that the drain and source of each transistor is geometrically identical, the two PMOS transistors are identical, and the two NMOS transistors are identical.)

Cpara = 2*(Cgdp + Cdbp) + 3*(Cgdn+ Cdbn)

Cpara = 2*(Cgdo*Wp) + 2*[Cj*Wp*Ls + Cjsw(2*Ls+Wp)] + 3*(Cgdo*Wn) + 3*[Cj*Wn*Ls + Cjsw(2*Ls+Wn)]

Cpara = 2*(0.56 fF/um)(Wp) + 2*[(1.88 fF/um^2)(Wp)(0.72 um) + (0.37 fF/um)(2*(0.72 um) + Wp)] + 3*(0.63 fF/um)(Wn) + 3*[(1.92 fF/um^2)(Wn)(0.72 um) + (0.44 fF/um)(2*(0.72 um) + Wn)]

Cpara = 4.5672*Wp + 7.3572*Wn + 2.9664
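
As a cross-check of the coefficients in the last line, the expansion can be reproduced numerically. The sketch below uses only the process numbers quoted in the calculation (fF and um); the helper name `device_cap_terms` and the variable names are hypothetical.

```python
# Process parameters as quoted in the hand calculation (units: fF, um).
CGDO_P, CJ_P, CJSW_P = 0.56, 1.88, 0.37   # fF/um, fF/um^2, fF/um
CGDO_N, CJ_N, CJSW_N = 0.63, 1.92, 0.44
LS = 0.72                                  # source/drain diffusion length, um

def device_cap_terms(cgdo, cj, cjsw, ls):
    """Return (slope, offset) such that Cgd + Cdb = slope * W + offset (fF)."""
    slope = cgdo + cj * ls + cjsw   # terms proportional to the width W
    offset = cjsw * 2 * ls          # sidewall term that does not depend on W
    return slope, offset

p_slope, p_off = device_cap_terms(CGDO_P, CJ_P, CJSW_P, LS)
n_slope, n_off = device_cap_terms(CGDO_N, CJ_N, CJSW_N, LS)

# Two PMOS and three NMOS contributions: Cpara = 2*(...) + 3*(...).
k_wp = 2 * p_slope
k_wn = 3 * n_slope
k_const = 2 * p_off + 3 * n_off

def cpara(wn, wp):
    """Parasitic capacitance in fF for widths given in um."""
    return k_wn * wn + k_wp * wp + k_const

print(round(k_wp, 4), round(k_wn, 4), round(k_const, 4))
```

The three printed coefficients correspond to the Wp, Wn and constant terms of Cpara.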

 

Cinv = Cp + Cn

Cp = Cpox*Lp*Wp , Cn = Cnox*Ln*Wn

Cpox = Cnox = 6.03fF/um^2 , Lp = Ln = 0.24um

Thus, Cinv = 6.03*0.24*(0.3 + 0.9) = 1.736fF

Because 32 inverters are driven by the NAND gate,

Cinv_all = 32 * Cinv = 32 * 1.736 = 55.552 fF

Cload = Cinv_all + Clumped_wire = 55.552 + 100 = 155.552 fF

Ctotal = Cpara + Cload

Ctotal = 7.357Wn + 4.567Wp + 2.966+ 155.552 = 7.357Wn + 4.567Wp + 158.518 ------ eq.1
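
The load calculation can be written as a short script. This is a sketch under the same assumptions as the hand calculation (gate capacitance only, Cox = 6.03 fF/um^2); the names are illustrative. Note that without rounding Cinv to 1.736 fF the load comes out at about 155.57 fF rather than 155.552 fF.

```python
COX = 6.03       # gate oxide capacitance, fF/um^2 (same for NMOS and PMOS)
L_MIN = 0.24     # channel length, um
WN_INV, WP_INV = 0.3, 0.9   # minimum-size inverter widths, um
N_INV = 32       # number of inverters driven
C_WIRE = 100.0   # lumped wiring capacitance, fF

# Input capacitance of one inverter and the total external load (fF).
c_inv = COX * L_MIN * (WN_INV + WP_INV)
c_load = N_INV * c_inv + C_WIRE

def ctotal(wn, wp, c_load_ff):
    """eq.1: Cpara(Wn, Wp) plus the external load, in fF (widths in um)."""
    return 7.357 * wn + 4.567 * wp + 2.966 + c_load_ff

print(round(c_inv, 3), round(c_load, 3))
```

For example, `ctotal(1.66, 2.08, 155.552)` evaluates eq.1 with the rounded load used in this report.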

 

(b) Find Wn,Wp

* Method 1

Imax_delay = Ctotal * dV/dt

--- The output must reach Vm (the switching point) within 300 ps. Assume the switching point is Vdd/2 = 1.25 V.

Imax_delay = (7.357*Wn + 4.567*Wp + 158.518) fF * (1.25 V) / (300 ps)

Imax_delay = 3.066E-5*Wn + 1.903E-5*Wp + 6.605E-4 ---------------eq.a (in A, with Wn and Wp in um)

--- Assume that the transistors operate in the saturation region, and find the worst-case I_dsat.

In_dsat = kn'/2 * (Wn / (2*Ln)) * (Vgs - Vt)^2 ---------- the effective length is 2*Ln because the two NMOS transistors are in series

In_dsat = 275E-6/2 * (Wn / (2 * 0.24 um)) * (2.5 - 0.43)^2

In_dsat(worst falling case) = 1.23E-3*Wn --------------- eq.b

Ip_dsat = kp'/2 * (Wp / L) * (Vgs - Vt)^2

Ip_dsat = (96E-6)/2 * (Wp / 0.24 um) * (2.5 - 0.62)^2

Ip_dsat(worst rising case) = 7.069E-4 * Wp ------------ eq.c

 

To find minimum Width, let I_dsat = Imax_delay.

Thus, from eq.a, eq.b ,eq.c

1.903E-5*Wp + 3.0655E-5*Wn + 6.605E-4 = 7.069E-4*Wp

6.8787E-4*Wp - 3.0655E-5*Wn = 6.605E-4 --- eq.4

1.903E-5*Wp + 3.0655E-5*Wn + 6.605E-4 = 1.23E-3*Wn

1.19E-3*Wn - 1.903E-5*Wp = 6.605E-4 --- eq.5

From eq.4, eq.5

Wn = 0.630E-4 = 0.5692um

Wp = 0.931E-4 = 0.9856um
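
eq.4 and eq.5 form a 2x2 linear system in Wp and Wn, which can be solved with Cramer's rule. The sketch below takes the coefficients from eq.a, eq.b and eq.c as printed; the variable names are hypothetical. It gives Wn of about 0.57 um and Wp of about 0.99 um, close to the values above (small differences come from rounding of the coefficients).

```python
# Coefficients from eq.a (required current), eq.b and eq.c (available current).
A_WN, A_WP, A_CONST = 3.0655e-5, 1.903e-5, 6.605e-4   # eq.a, A per um and A
B_WN = 1.23e-3                                         # eq.b, A per um of Wn
C_WP = 7.069e-4                                        # eq.c, A per um of Wp

# eq.4:  (C_WP - A_WP) * Wp - A_WN * Wn          = A_CONST
# eq.5: -A_WP * Wp          + (B_WN - A_WN) * Wn = A_CONST
a11, a12 = C_WP - A_WP, -A_WN
a21, a22 = -A_WP, B_WN - A_WN

# Cramer's rule for the unknown vector (Wp, Wn).
det = a11 * a22 - a12 * a21
wp = (A_CONST * a22 - a12 * A_CONST) / det
wn = (a11 * A_CONST - a21 * A_CONST) / det

print(f"Wn = {wn:.4f} um, Wp = {wp:.4f} um")
```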

 

The following is the HSPICE result with these values of Wn and Wp.

HSpice Result
$DATA1 SOURCE='HSPICE' VERSION='W-2005.03-SP1 '
.TITLE '**************** nand gate ***************'
pavghl pavglh tphl tplh temper alter#
3.629e-04 1.856e-04 9.240e-10 1.639e-10 25.0000 1.0000

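From this result (tphl = 924 ps) I realized that Wp and Wn are too small. The problem with this method is that I assumed dV/dt is linear, and the transistors may suffer from velocity saturation, which the square-law current equations above do not model.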

 

* Method 2 ( more accurate )

Wn_min = channel length*0.5*3 = 0.24 * 0.5 * 3 = 0.36

Wp_min = Wn_min * 2.5 = 0.9

 

Idsat_n_min = k' * (W/L) * [ (Vdd - Vt) * Vdsat - Vdsat^2/2 ]

= 115E-6 * (0.36/0.24) * [2.5 * 0.63 - (0.63/2)^2]

= 2.5457E-4

Req_n_min = 3/4 * Vdd/Idsat * (1 - 5/6 * lambda * Vdd)

Req_n_min = 6459.15

TpHL_min = 0.69 * Req_n_min * Cload --------------- eq.2

TpHL_min = 693.27psec

6459.15 : 693.27 = Rnew : 300

Thus, Rnew=2795.07

This is a 2.31 times increase over the minimum NMOS.

Wn = 2.3109 * Wmin * 2 (because the 2 NMOS are in series) = 1.66

Wp = 2.3109 * Wmin * 2.5 = 2.08
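
The scaling step of Method 2 can be sketched as follows. It assumes, as the calculation above does, that the delay is proportional to Req and that Req is inversely proportional to the width. `REQ_MIN` is taken directly from the value above rather than recomputed, and the variable names are illustrative.

```python
REQ_MIN = 6459.15      # ohm, equivalent resistance of the minimum NMOS
C_LOAD = 155.552e-15   # F, external load only (Cpara ignored in this pass)
T_TARGET = 300e-12     # s, required propagation delay
W_MIN = 0.36           # um, minimum NMOS width

def tphl(req, c):
    """First-order RC delay estimate, eq.2."""
    return 0.69 * req * c

t_min = tphl(REQ_MIN, C_LOAD)   # delay with minimum-size devices
scale = t_min / T_TARGET        # required reduction in resistance
r_new = REQ_MIN / scale         # resistance that just meets the target

wn = scale * W_MIN * 2          # x2 for the two NMOS in series
wp = scale * W_MIN * 2.5        # x2.5 PMOS/NMOS ratio used above

print(f"t_min = {t_min * 1e12:.2f} ps, scale = {scale:.4f}")
print(f"Rnew = {r_new:.1f} ohm, Wn = {wn:.2f} um, Wp = {wp:.2f} um")
```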

HSpice Result
$DATA1 SOURCE='HSPICE' VERSION='W-2005.03-SP1 '
.TITLE '************************** nand gate *****************************'
pavghl pavglh tphl tplh temper alter#
3.483e-04 2.761e-04 3.695e-10 2.298e-10 25.0000 1.0000

This result shows that tphl (369.5 ps) is over 300 ps.

While finding Wn and Wp I ignored Cpara, because I assumed that it is very small compared to Cload.

To fix this, I repeat the calculation with Cpara included:

Ctotal = 7.357*1.66 + 4.567*2.08 + 158.518

Ctotal = 180.23

TpHL_min = 0.69 * Req_n_min * Ctotal = 803.2 psec

 

6459.15 : 803.2 = Rnew : 300

Thus, Rnew= 2412.53

2.68 times increase

Result.1

Wn = 2.68 * Wmin * 2 (because the 2 NMOS are in series) = 1.93

Wp = 2.68 * Wmin * 2.5 = 2.412
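
The second pass, with Cpara included through eq.1, can be sketched the same way. The helper names `ctotal_ff` and `size_for_delay` are hypothetical. The unrounded scale factor is about 2.677, which the calculation above rounds to 2.68.

```python
REQ_MIN = 6459.15     # ohm, equivalent resistance of the minimum NMOS
W_MIN = 0.36          # um, minimum NMOS width
T_TARGET = 300e-12    # s, required propagation delay

def ctotal_ff(wn, wp):
    """eq.1: total capacitance in fF for widths in um."""
    return 7.357 * wn + 4.567 * wp + 158.518

def size_for_delay(c_ff):
    """Return (scale, Wn, Wp) that meet T_TARGET for a capacitance in fF."""
    t_min = 0.69 * REQ_MIN * c_ff * 1e-15
    scale = t_min / T_TARGET
    return scale, 2 * scale * W_MIN, 2.5 * scale * W_MIN

# Use the first-pass widths to estimate Cpara, then resize.
scale, wn, wp = size_for_delay(ctotal_ff(1.66, 2.08))

print(f"scale = {scale:.3f}, Wn = {wn:.2f} um, Wp = {wp:.2f} um")
```

Calling `size_for_delay(ctotal_ff(wn, wp))` again with the new widths would give a further, smaller correction, since Cpara itself grows with the widths.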

HSpice Result
$DATA1 SOURCE='HSPICE' VERSION='W-2005.03-SP1 '
.TITLE '************************** nand gate *****************************'
pavghl pavglh tphl tplh temper alter#
3.623e-04 2.815e-04 3.256e-10 2.036e-10 25.0000 1.0000

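From this result, tphl (325.6 ps) is still a little over the limit, either because I used a simple model for the calculation or because the accurate model of the 0.24 um process contains more small parasitic capacitances.

tplh is only 203.6 ps, so with this stimulus the PMOS could be made smaller.

So, to optimize, I increased Wn from 1.93 to 2.62 and changed Wp from 2.412 to 3.72 (an increase, which was needed once the worst-case rising transition was simulated).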

Result.2

Wn and Wp are rounded automatically by the tool. This is the final result for the worst case. If we do not consider the worst case, the sizes can be reduced a lot, especially for the PMOS.

 

If we use the sample input stimulus for the NOR gate, Wn = 2.20 and Wp = 1.65 are enough to meet the 300 ps delay. But the sample file contains no worst-case rising transition, so to find the worst case the input stimulus has to be changed. I ran the simulation several times to arrive at the final sizes.

Wn = 2.62

Wp = 3.72

 

(5) A table comparing tPHL, tPLH, tP, PSTAT and PDYN for the analytical model and the simulation.

I used the hand-calculated values of Wn and Wp here.

(a) tpHL = 0.69 * (3/4) * (Cload * Vdd) / Idsat_n

If Vdd >> Vtn + Vdsat_n / 2, then tpHL = 0.52 * Cload / ((W/L)n * k'n * Vdsat_n) (textbook 202)

Thus, tpHL = 0.52 * 155.552 fF / ((1.93 / (0.24 * 2)) * (115E-6) * 0.63) = 277.667 ps (two NMOS in series, thus 2*L)

 

(b) tpLH = 0.69 * (3/4) * (Cload * Vdd) / Idsat_p

If Vdd >> |Vtp| + |Vdsat_p| / 2, then tpLH = 0.52 * Cload / ((W/L)p * k'p * Vdsat_p) (textbook 202)

Thus, tpLH = 0.52 * 155.552 fF / ((2.412 / 0.24) * (-30E-6) * (-1)) = 268.282 ps

 

(c) tP = (tpHL + tpLH) / 2 = 272.9745 ps
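
The three delay figures can be reproduced with one small helper. This is a sketch of the simplified formula tp = 0.52 * Cload / ((W/L) * k' * Vdsat) with the numbers used in (a) and (b); the function name `tp_approx` is hypothetical, and magnitudes are used for the PMOS parameters.

```python
C_LOAD = 155.552e-15   # F

def tp_approx(c_load, w, l, k_prime, vdsat):
    """Simplified propagation delay: 0.52 * C / ((W/L) * k' * Vdsat)."""
    return 0.52 * c_load / ((w / l) * k_prime * vdsat)

# Pull-down: two NMOS in series, so the effective length is 2 * 0.24 um.
tphl = tp_approx(C_LOAD, w=1.93, l=2 * 0.24, k_prime=115e-6, vdsat=0.63)

# Pull-up: a single PMOS (|k'p| = 30e-6, |Vdsat_p| = 1 V).
tplh = tp_approx(C_LOAD, w=2.412, l=0.24, k_prime=30e-6, vdsat=1.0)

tp = (tphl + tplh) / 2

for name, value in (("tpHL", tphl), ("tpLH", tplh), ("tP", tp)):
    print(f"{name} = {value * 1e12:.3f} ps")
```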

 

(d) Pdyn (out from 0 to 1) = Cload * Vdd^2 * f = 155.552 fF * (2.5 V)^2 * (1 / 268.282 ps) = 3.623 mW, where f is taken as 1 / tpLH

Pdyn (out from 1 to 0) = Cload * Vdd^2 * f = 155.552 fF * (2.5 V)^2 * (1 / 277.667 ps) = 3.50 mW, where f is taken as 1 / tpHL

Pdyn (out stays at 1) = affected only by Cpara, so the value is small.
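
The two switching-power numbers follow from the same expression, with the transition time standing in for the period. A minimal sketch, assuming the load of 155.552 fF and the hand-calculated delays; the function name `pdyn` is illustrative.

```python
C_LOAD = 155.552e-15   # F
VDD = 2.5              # V

def pdyn(c, vdd, t_transition):
    """Power during one transition: C * Vdd^2 / t_transition, in W."""
    return c * vdd ** 2 / t_transition

p_rise = pdyn(C_LOAD, VDD, 268.282e-12)   # output 0 -> 1, uses tpLH
p_fall = pdyn(C_LOAD, VDD, 277.667e-12)   # output 1 -> 0, uses tpHL

print(f"Pdyn(0->1) = {p_rise * 1e3:.3f} mW")
print(f"Pdyn(1->0) = {p_fall * 1e3:.3f} mW")
```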

 

(e) Pstat = (In_stat + Ip_stat) / 2 * Vdd = (1.23E-3*Wn + 7.069E-4*Wp) / 2 * 2.5

= (1.23E-3*2.412 + 7.069E-4*2.68)/2*2.5 = 5.335nW (almost 0)

         Analytical (Result.1)               Simulation (Result.2)
tPHL     277.667 ps                          270 ps (worst case)
tPLH     268.282 ps                          300 ps (worst case)
tP       272.9745 ps                         285 ps
Pstat    5.335 nW (almost 0)                 94.48 pW (almost 0)
Pdyn     3.623 mW (0->1), 3.50 mW (1->0)     3.213 mW (0->1), 3.551 mW (1->0)

 

(6) Simulation waveforms with performance metrics annotated.

* Simulation waveform showing the voltage change for in0, in1 and out0

* The input vector produces both worst cases; the rising and falling delays are both near 300 ps.

 

* Pdyn and Pstat

Static power consumption in CMOS is very small. As you can see below, the static power is almost zero.

Dynamic power peaks when the output switches from high to low or from low to high. When the output is not changing, there is no dynamic power consumption, and the static power consumption is almost 0.

 

(7) An image of your layout, with the total height and width annotated.

Total height = 0.24 um * (1/2) * 140 = 16.8 um

Vdd, Gnd Width = 0.24 um * (1/2) * 20 = 2.4 um
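
The conversions above use lambda = L/2 = 0.12 um. A small sketch of the same arithmetic, including the 100 lambda of space between the rails from the specification; the helper name `lambda_to_um` is hypothetical.

```python
L_MIN_UM = 0.24
LAMBDA_UM = L_MIN_UM / 2   # lambda is half the minimum channel length

def lambda_to_um(n_lambda):
    """Convert a dimension given in lambda to micrometers."""
    return n_lambda * LAMBDA_UM

cell_height = lambda_to_um(140)   # total cell height
rail_width = lambda_to_um(20)     # vdd / ground rail
inner_space = lambda_to_um(100)   # vertical space between the rails

print(f"{cell_height:.1f} um, {rail_width:.1f} um, {inner_space:.1f} um")
```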

(a) Layout Image

At this stage you have to determine more details manually (e.g. wire length, routing, etc.).

The layout is closer to the real hardware.

 

(b) Extracted

Extraction builds the actual hardware description from the layout objects. You can see where the MOSFETs are formed by the overlapping layers.

You can generate the netlist from this view.

 

(8) Image of simulator output with the layout.

(a) Functional Simulation

The result is the same as the result from the schematic.

 

(b) HSpice result

The result is very similar to that of the schematic. This is the worst-case simulation for each of the 0->1 and 1->0 transitions.

The power is also almost the same.

Result from extracted layout

         Analytical (Result.1)               Simulation (Result.2)
tPHL     277.667 ps                          260 ps (worst case)
tPLH     268.282 ps                          270 ps (worst case)
tP       272.9745 ps                         285 ps
Pstat    5.335 nW (almost 0)                 94.48 pW (almost 0)
Pdyn     3.623 mW (0->1), 3.50 mW (1->0)     3.103 mW (0->1), 4.299 mW (1->0)

 

* Layout simulation for a case that is not the worst case. The rising delay is very short compared to the worst case, so Wn and Wp have to be taken from the worst case.

 

* Power plot for layout.

 

Result : The result from the schematic and the result from the layout are very similar, so there is no problem with the DUT. My workflow was as follows.

(1) I calculated Wn, Wp and the other predictable results.

(2) I drew the schematic circuit.

(3) I extracted the netlist (for Cadence) from the schematic circuit.

(4) I converted the Cadence netlist format to the IRSIM netlist format with the Perl script given by the TA.

(5) I checked the functional behavior.

(6) I converted the Cadence netlist format to the HSPICE netlist format.

(7) I checked delay and power. I continued to modify Wn and Wp until I got a proper size for the given delay requirement.

(8) I drew the layout and extracted it.

(9) I extracted the netlist from the layout design.

(10) I checked the equivalence between the schematic and the layout with LVS and by re-simulation with IRSIM and HSPICE.

 

-- FINAL RESULT --