Lab 1: Design of a CMOS NAND Gate - DuSung,Kim -
Objective: Design a CMOS 2-input nand gate. You will follow the VLSI design flow in a complete design and verification of this gate. This exercise introduces both the design flow and the CAD tools we will use in this class.
Specifications:
The gate must be designed with minimum channel length in 0.25 um technology
(L=240nm).
The gate must be able to drive 32 minimum size CMOS inverters (Wn=0.3um, Wp=0.9um)
and 100fF of lumped wiring cap.
The propagation delay must be less than 300ps.
Assume the inputs have 50ps rise and fall times.
The layout must fit our standard cell library: total height = 140 lambda, with
20 lambda high rails in M1 for vdd and ground on the top and bottom edges of
the cell (respectively). This leaves 100 lambda of vertical space between the
rails.
You may only use M1 and M2 metal layers.
The inputs and output must be accessible from the top of the cell (VDD side)
in metal 2.
You must minimize the width of the cell.
Here is an example NOR gate in Cadence. Note: dimensions may vary from the spec
for this lab.
(1) Truth table
Boolean equation : OUT0' = IN0 · IN1
Behavior : OUT0 is 0 if every input is 1, otherwise 1.
OUT0 | IN0 | IN1
1 | 0 | 0
1 | 0 | 1
1 | 1 | 0
0 | 1 | 1
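As a quick cross-check of the truth table, the NAND behavior can be enumerated in a few lines of Python. This is only an illustrative sketch; the function name `nand` is not part of the lab tools.

```python
from itertools import product

def nand(in0: int, in1: int) -> int:
    """Return 0 only when both inputs are 1, otherwise 1."""
    return 0 if (in0 and in1) else 1

# Rows are ordered as (OUT0, IN0, IN1), matching the table columns.
rows = [(nand(a, b), a, b) for a, b in product((0, 1), repeat=2)]

print("OUT0 | IN0 | IN1")
for out0, a, b in rows:
    print(f"{out0} | {a} | {b}")
```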
(2) Schematic Image
(3) Functional Simulation (Logic Simulation)
(4) Hand calculations.
(a) Capacitance calculation
The worst case for a falling output is one of the transitions 00->11, 01->11, or 10->11, because the NMOS transistors are connected in series. If the input vector was 00, all of the NMOS capacitance is already discharged, so this is not the worst case. If IN0 was 0 and IN1 was 1, Cpara of NM2 is already discharged, so this is not the worst case either. But when IN0 was 1 and IN1 was 0 and IN1 then goes high, Cpara of both NM1 and NM2 must be discharged, so this is the worst falling case.
Worst falling case : IN0 = 1, IN1 = 0 => IN0 = 1, IN1 = 1.
The PMOS transistors are connected in parallel. Req(IN0 = 0, IN1 = 0) is smaller than Req(IN0 = 1, IN1 = 0), so (IN0 = 1, IN1 = 1) => (IN0 = 1, IN1 = 0) is the worst case.
Worst rising case : IN0 = 1, IN1 = 1 => IN0 = 1, IN1 = 0
Cpara = CgdpA + CdbpA + CgdpB + CdbpB + CgdnB + CdbnB + CgsnB + CsbnB + CgdnA + CdbnA.
(Assume that the drain and source of each transistor is geometrically identical, the two PMOS transistors are identical, and the two NMOS transistors are identical.)
Cpara = 2*(Cgdp + Cdbp) + 3*(Cgdn+ Cdbn)
Cpara = 2*(Cgdo*Wp) + 2*[Cj*Wp*Ls + Cjsw*(2*Ls+Wp)] + 3*(Cgdo*Wn) + 3*[Cj*Wn*Ls + Cjsw*(2*Ls+Wn)]
Cpara = 2*(0.56 fF/um)(Wp) + 2*[(1.88 fF/um^2)(Wp)(0.72 um) + (0.37 fF/um)(2*(0.72 um) + Wp)] + 3*(0.63 fF/um)(Wn) + 3*[(1.92 fF/um^2)(Wn)(0.72 um) + (0.44 fF/um)(2*(0.72 um) + Wn)]
Cpara = 4.5672*Wp + 7.3572*Wn + 2.9664
Cinv = Cp + Cn
Cp = Cpox*Lp*Wp , Cn = Cnox*Ln*Wn
Cpox = Cnox = 6.03fF/um^2 , Lp = Ln = 0.24um
Thus, Cinv = 6.03*0.24*(0.3 + 0.9) = 1.736fF
Because the NAND gate drives 32 inverters,
Cinv_all = 32 * Cinv = 32 * 1.736 = 55.552 fF
Cload = Cinv_all + Clumped_wire = 55.552 + 100 = 155.552 fF
Ctotal = Cpara + Cload
Ctotal = 7.357Wn + 4.567Wp + 2.966+ 155.552 = 7.357Wn + 4.567Wp + 158.518 ------ eq.1
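The capacitance bookkeeping that leads to eq.1 can be reproduced with a short script. This is a sketch using only the process numbers quoted above; the function and constant names are illustrative. Because it does not round Cinv to 1.736 fF, the load comes out near 155.57 fF rather than 155.552 fF.

```python
# Process parameters as quoted in the report (W and L in um, C in fF).
CGDO_P, CJ_P, CJSW_P = 0.56, 1.88, 0.37   # PMOS: fF/um, fF/um^2, fF/um
CGDO_N, CJ_N, CJSW_N = 0.63, 1.92, 0.44   # NMOS: fF/um, fF/um^2, fF/um
LS = 0.72                                  # source/drain length in um
COX = 6.03                                 # gate oxide capacitance in fF/um^2
L_MIN = 0.24                               # channel length in um

def cpara_coeffs():
    """Return (k_wp, k_wn, const) with Cpara = k_wp*Wp + k_wn*Wn + const."""
    k_wp = 2 * (CGDO_P + CJ_P * LS + CJSW_P)
    k_wn = 3 * (CGDO_N + CJ_N * LS + CJSW_N)
    const = 2 * CJSW_P * 2 * LS + 3 * CJSW_N * 2 * LS
    return k_wp, k_wn, const

def c_load(n_inv=32, wn_inv=0.3, wp_inv=0.9, c_wire=100.0):
    """Load of n_inv minimum inverters plus the lumped wire capacitance."""
    c_inv = COX * L_MIN * (wn_inv + wp_inv)
    return n_inv * c_inv + c_wire

def c_total(wn, wp):
    """eq.1: parasitic plus load capacitance in fF."""
    k_wp, k_wn, const = cpara_coeffs()
    return k_wp * wp + k_wn * wn + const + c_load()

print(cpara_coeffs(), c_load())
```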
(b) Find Wn,Wp
* Method 1
Imax_delay = Ctotal * dV/dt
--- The output must reach Vm (the switching point) within 300 ps. Assume the switching point is Vdd/2 = 1.25 V.
Imax_delay = (7.357*Wn + 4.567*Wp + 158.518) fF * (1.25 V) / (300 ps)
Imax_delay = 3.066E-5*Wn + 1.903E-5*Wp + 6.605E-4 ---------------eq.a (in A, with Wn and Wp in um)
--- Assume that the circuit operates in the saturation region, and find the worst-case I_dsat
In_dsat = kn'/2 * (Wn / (2*Ln)) * (Vgs - Vt)^2 ---------- Ln is doubled because the two NMOS transistors are in series
In_dsat = 275E-6/2 * (Wn / (2*0.24 um)) * (2.5 - 0.43)^2
In_dsat(worst falling case) = 1.23E-3*Wn --------------- eq.b
Ip_dsat = kp'/2 * (Wp / L) * (Vgs - Vt)^2
Ip_dsat = (96E-6)/2 * (Wp / 0.24 um) * (2.5 - 0.62)^2
Ip_dsat(worst rising case) = 7.069E-4 * Wp ------------ eq.c
To find minimum Width, let I_dsat = Imax_delay.
Thus, from eq.a, eq.b ,eq.c
1.903E-5*Wp + 3.0655E-5*Wn + 6.605E-4 = 7.069E-4*Wp
6.8787E-4*Wp - 3.0655E-5*Wn = 6.605E-4 --- eq.4
1.903E-5*Wp + 3.0655E-5*Wn + 6.605E-4 = 1.23E-3*Wn
1.19E-3*Wn - 1.903E-5*Wp = 6.605E-4 --- eq.5
From eq.4, eq.5
Wn = 0.630E-4 = 0.5692um
Wp = 0.931E-4 = 0.9856um
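The pair eq.4 and eq.5 is a 2x2 linear system in Wp and Wn, so it can be checked numerically. The sketch below uses Cramer's rule with the rounded coefficients printed above; `solve_2x2` is a hypothetical helper written for this check. With these coefficients the solution lands close to the widths quoted above (Wp near 0.99 um, Wn near 0.57 um).

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system is singular")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# Unknowns are ordered (Wp, Wn), both in um.
# eq.4:  6.8787e-4*Wp - 3.0655e-5*Wn = 6.605e-4
# eq.5: -1.903e-5*Wp  + 1.19e-3*Wn   = 6.605e-4
wp, wn = solve_2x2(6.8787e-4, -3.0655e-5, 6.605e-4,
                   -1.903e-5, 1.19e-3, 6.605e-4)
print(f"Wp = {wp:.4f} um, Wn = {wn:.4f} um")
```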
The following is the HSpice result with these Wn and Wp.
HSpice Result
$DATA1 SOURCE='HSPICE' VERSION='W-2005.03-SP1 '
.TITLE '**************** nand gate ***************'
pavghl pavglh tphl tplh temper alter#
3.629e-04 1.856e-04 9.240e-10 1.639e-10 25.0000 1.0000
From the tphl of 924 ps, I realized that Wp and Wn are too small. The problem with this method is that I assumed dV/dt is linear, and the devices may suffer from velocity saturation.
* Method 2 ( more accurate )
Wn_min = channel length*0.5*3 = 0.24 * 0.5 * 3 = 0.36
Wp_min = Wn_min * 2.5 = 0.9
Idsat_n_min = k' * (W/L) * [ (Vdd - Vt) * Vdsat - Vdsat^2/2 ]
= 115E-6 * (0.36/0.24) * [2.5 * 0.63 - (0.63/2)^2]
= 2.5457E-4 A
Req_n_min = 3/4 * Vdd/Idsat * (1 - 5/6 * lambda * Vdd)
Req_n_min = 6459.15 ohm
TpHL_min = 0.69 * Req_n_min * Cload --------------- eq.2
TpHL_min = 693.27psec
6459.15 : 693.27 = Rnew : 300
Thus, Rnew = 2795.07 ohm
The devices must be 2.31 times wider than the minimum-size NMOS.
Wn = 2.3109 * Wn_min * 2 (because the two NMOS transistors are in series) = 1.66 um
Wp = 2.3109 * Wn_min * 2.5 = 2.08 um
HSpice Result
$DATA1 SOURCE='HSPICE' VERSION='W-2005.03-SP1 '
.TITLE '************************** nand gate *****************************'
pavghl pavglh tphl tplh temper alter#
3.483e-04 2.761e-04 3.695e-10 2.298e-10 25.0000 1.0000
From this result, tphl (369.5 ps) is over 300 ps.
While finding Wn and Wp in this case, I ignored Cpara, assuming it is very small compared to Cload.
Next, I fix this problem by including Cpara.
Ctotal = 7.357*1.66 + 4.567*2.08 + 158.518
Ctotal = 180.23
TpHL_min = 0.69 * Req_n_min * Ctotal = 803.2 psec
6459.15 : 803.2 = Rnew : 300
Thus, Rnew= 2412.53
The widths must increase by a factor of 2.68.
Result.1
Wn = 2.68 * Wn_min * 2 (because the two NMOS transistors are in series) = 1.93 um
Wp = 2.68 * Wn_min * 2.5 = 2.412 um
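The two sizing passes of Method 2 (first with Cload only, then with Ctotal from eq.1) follow the same scaling rule, which can be sketched as below. The value of Req_n_min is taken as quoted in the report, and the helper names are illustrative. Because the script does not round the scale factor to 2.68, the second pass gives widths slightly below 1.93 um and 2.412 um.

```python
REQ_N_MIN_OHM = 6459.15   # Req of the minimum-size NMOS, as quoted in the report
WN_MIN_UM = 0.36          # minimum NMOS width from the report
C_LOAD_FF = 155.552       # external load from part (a)

def c_total_fF(wn_um, wp_um):
    """eq.1: total capacitance in fF for widths given in um."""
    return 7.357 * wn_um + 4.567 * wp_um + 158.518

def resize_for_delay(c_fF, t_target_ps=300.0):
    """Scale the minimum device so that 0.69*R*C meets the target delay."""
    tphl_min_ps = 0.69 * REQ_N_MIN_OHM * c_fF * 1e-3  # ohm * fF = 1e-3 ps
    scale = tphl_min_ps / t_target_ps
    wn_um = scale * WN_MIN_UM * 2      # two NMOS transistors in series
    wp_um = scale * WN_MIN_UM * 2.5    # PMOS/NMOS ratio used in the report
    return tphl_min_ps, scale, wn_um, wp_um

first = resize_for_delay(C_LOAD_FF)                 # Cpara ignored
second = resize_for_delay(c_total_fF(1.66, 2.08))   # Cpara included
print(first)
print(second)
```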
HSpice Result
$DATA1 SOURCE='HSPICE' VERSION='W-2005.03-SP1 '
.TITLE '************************** nand gate *****************************'
pavghl pavglh tphl tplh temper alter#
3.623e-04 2.815e-04 3.256e-10 2.036e-10 25.0000 1.0000
From this result, tphl (325.6 ps) is still a little over 300 ps, either because I used a simple model for the calculation or because the accurate 0.24 um model includes more small parasitic capacitances.
tplh is only 203.6 ps in this run, which suggests that the PMOS could be made smaller.
To optimize, I changed Wn from 1.93 to 2.62 and Wp from 2.412 to 3.72 (both larger, to cover the worst-case inputs discussed under Result.2).
Result.2
Wn and Wp are rounded automatically by the tool. This is the final result for the worst case. If we do not consider the worst case, the sizes can be reduced a lot, especially for the PMOS.
If we use the sample input stimulus for the NOR gate, Wn = 2.20 and Wp = 1.65 are enough to meet the 300 ps delay. But the sample file contains no worst-case rising transition, so to find the worst case we must change the input stimulus. I ran the simulation several times to arrive at these sizes.
Wn = 2.62 um
Wp = 3.72 um
(5) A table comparing tPHL, tPLH, tP, PSTAT and PDYN for the analytical model and the simulation.
I used the hand-calculated values of Wn and Wp (Result.1) here.
(a) tpHL = 0.69 * (3/4) * (Cload * Vdd) / Idsat_n
In the case Vdd >> Vtn + Vdsat_n / 2 => tpHL = 0.52 * Cload / ((W/L)n * k'n * Vdsat_n) (textbook 202)
Thus, tpHL = 0.52 * 155.552 fF / ((1.93/(0.24*2)) * (115E-6) * 0.63) = 277.667 ps (the two NMOS transistors are in series, thus 2*L)
(b) tpLH = 0.69 * (3/4) * (Cload * Vdd) / Idsat_p
In the case Vdd >> |Vtp| + |Vdsat_p| / 2 => tpLH = 0.52 * Cload / ((W/L)p * k'p * Vdsat_p) (textbook 202)
Thus, tpLH = 0.52 * 155.552 fF / ((2.412/0.24) * (-30E-6) * (-1)) = 268.282 ps
(c) tP = (tpHL + tpLH) / 2 = 272.9745 ps
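The three delay figures in (a) to (c) can be reproduced directly from the approximate formula. This is a sketch under the same assumptions as the hand calculation (Cload only, Result.1 widths); the function name `tp_approx_ps` is illustrative.

```python
def tp_approx_ps(c_load_fF, w_um, l_um, k_prime, v_dsat):
    """tp = 0.52 * Cload / ((W/L) * k' * Vdsat), returned in ps."""
    drive = (w_um / l_um) * abs(k_prime) * abs(v_dsat)   # in A/V units of k'
    return 0.52 * c_load_fF * 1e-15 / drive * 1e12

C_LOAD_FF = 155.552

# Two NMOS transistors in series are modelled as one device with length 2*L.
tphl = tp_approx_ps(C_LOAD_FF, 1.93, 2 * 0.24, 115e-6, 0.63)
tplh = tp_approx_ps(C_LOAD_FF, 2.412, 0.24, -30e-6, -1.0)
tp = (tphl + tplh) / 2

print(f"tpHL = {tphl:.3f} ps, tpLH = {tplh:.3f} ps, tP = {tp:.4f} ps")
```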
(d) Pdyn (out from 0 to 1) = Cload * Vdd^2 * f = 155.552 fF * (2.5)^2 * (1 / 268.282 ps) = 3.623 mW, where f = 1/tpLH
Pdyn (out from 1 to 0) = Cload * Vdd^2 * f = 155.552 fF * (2.5)^2 * (1 / 277.667 ps) = 3.50 mW, where f = 1/tpHL
Pdyn (out from 1 to 1) is affected only by Cpara, so it is small.
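The dynamic power estimate in (d) is a single product, shown here as a sketch that follows the report's choice of taking f as the reciprocal of the corresponding propagation delay. The function name `p_dyn_mW` is illustrative.

```python
def p_dyn_mW(c_fF, vdd_V, t_ps):
    """Pdyn = C * Vdd^2 * f with f = 1/t, returned in mW."""
    c_F = c_fF * 1e-15
    f_Hz = 1.0 / (t_ps * 1e-12)
    return c_F * vdd_V ** 2 * f_Hz * 1e3

p_rise = p_dyn_mW(155.552, 2.5, 268.282)   # output 0 -> 1, f = 1/tpLH
p_fall = p_dyn_mW(155.552, 2.5, 277.667)   # output 1 -> 0, f = 1/tpHL
print(f"Pdyn(0->1) = {p_rise:.3f} mW, Pdyn(1->0) = {p_fall:.3f} mW")
```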
(e) Pstat = (In_stat + Ip_stat) / 2 * Vdd = (1.23E-3*Wn + 7.069E-4*Wp) / 2 * 2.5
= (1.23E-3*2.412 + 7.069E-4*2.68)/2*2.5 = 5.335nW (almost 0)
Metric | Analytical (Result.1) | Simulation (Result.2)
tPHL | 277.667 ps | 270 ps (worst case)
tPLH | 268.282 ps | 300 ps (worst case)
tP | 272.9745 ps | 285 ps
Pstat | 5.335 nW (almost 0) | 94.48 pW (almost 0)
Pdyn | 3.623 mW (0->1), 3.50 mW (1->0) | 3.213 mW (0->1), 3.551 mW (1->0)
(6) Simulation waveforms with performance metrics annotated.
* Simulation waveform showing the voltage change for in0, in1, and out0
* The input vectors produce both worst cases; you can see that the rising and falling delays are near 300 ps
* Pdyn and Pstat
Static power consumption for CMOS is very small. As you can see below, static power is almost zero.
When the output switches from high to low or from low to high, dynamic power peaks. When the output is not changing, there is no dynamic power consumption, and static power consumption is almost 0.
(7) An image of your layout, with the total height and width annotated.
Total height = 0.24 um * (1/2) * 140 = 16.8 um (lambda = 0.24 um / 2 = 0.12 um)
Vdd, Gnd rail width = 0.24 um * (1/2) * 20 = 2.4 um
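The lambda-to-micron conversion used for the cell dimensions can be written as a one-line helper. This sketch assumes lambda is half the drawn channel length, as in the two lines above; the names are illustrative.

```python
LAMBDA_UM = 0.24 / 2   # lambda = L/2, with L = 0.24 um

def lambda_to_um(n_lambda):
    """Convert a dimension in lambda units to micrometers."""
    return n_lambda * LAMBDA_UM

cell_height_um = lambda_to_um(140)   # total cell height
rail_width_um = lambda_to_um(20)     # Vdd and Gnd rails
routing_um = lambda_to_um(100)       # vertical space between the rails
print(cell_height_um, rail_width_um, routing_um)
```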
(a) Layout Image
At this stage, you must determine more details manually (e.g. wire length, routing, etc.).
The layout is closer to the real hardware.
(b) Extracted
The extracted view builds the actual hardware description from the layout objects. You can see that the MOSFETs are overlapping.
You can generate the netlist from here.
(8) Image of simulator output with the layout.
(a) Functional Simulation
You can see the same result as with the schematic.
(b) HSpice result
The result is very similar to the schematic result. This is the worst-case simulation for each of the 0->1 and 1->0 transitions.
The power is also almost the same.
Result from extracted layout
Metric | Analytical (Result.1) | Simulation (Result.2)
tPHL | 277.667 ps | 260 ps (worst case)
tPLH | 268.282 ps | 270 ps (worst case)
tP | 272.9745 ps | 285 ps
Pstat | 5.335 nW (almost 0) | 94.48 pW (almost 0)
Pdyn | 3.623 mW (0->1), 3.50 mW (1->0) | 3.103 mW (0->1), 4.299 mW (1->0)
* Layout simulation for a non-worst case. The rising delay is very short compared to the worst case, so we have to take Wn and Wp from the worst case.
* Power plot for layout.
Result : The results from the schematic and from the layout are very similar, so there is no problem with the DUT. My workflow was as follows.
(1) I calculated Wn, Wp, and the other predictable results.
(2) I drew the schematic circuit.
(3) I extracted the netlist (for Cadence) from the schematic circuit.
(4) I converted the Cadence netlist format to the IRSIM netlist format with the Perl script given by the TA.
(5) I checked the functional behavior.
(6) I converted the Cadence netlist format to the HSpice netlist format.
(7) I checked delay and power. I continued to modify Wn and Wp until I got a proper size for the given delay requirement.
(8) I drew the layout and extracted it.
(9) I extracted the netlist from the layout design.
(10) I checked equivalence between the schematic and the layout with LVS and by resimulating with IRSIM and HSpice.
-- FINAL RESULT --