Variability in Microprocessor Logic Design: Trends, Sources, Consequences, & Solutions

## Keith A. Bowman

Circuit Research Lab, Intel

keith.a.bowman@intel.com

Acknowledgements: Jim Tschanz, Vivek De, Tanay Karnik, and Steve Duvall

June 7, 2010

**UPC Seminar** 

## **Problem Statement:**

- Variability is one of the primary challenges in the semiconductor industry
- Adversely impacts performance, power, yield, reliability, & time-to-market

## **Focus Areas:**

1) Impact of variations on logic design
2) Variation-tolerant circuits
3) Tomorrow: Resilient microprocessor design for dynamic variation tolerance

# Outline

- Motivation & Technology Trends
- Sources of Variability
- Static Variations:
  - Impact on Logic Design
  - Variation-Tolerant Circuits
- Dynamic Variations:
  - Impact on Logic Design
  - Variation-Tolerant Circuits
- Summary



- Microprocessors with multi-billion transistors...
- Trillion instructions per second performance...
- Constant power envelope...
- Lower costs…

## **Challenge: Variations**



Gate length control is one of the "grand challenges" in the semiconductor industry – ITRS, 2009

# **Cost of Variations**

#### **Overestimating Variations**

- Increases design time
- Larger power & die size
- Rejection of otherwise good design options
- Missed market windows

Impacts design

#### **Underestimating Variations**

- Functional yield loss
- Performance reduction
- Increases silicon debug time

Impacts manufacturing

## **Impact of Variations on Revenue**








Revenue exponentially increases across FMAX bins

# **Technology Outlook**

| Year                    | 2008             | 2010             | 2012             | 2014             | 2016             | 2018             |
|-------------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| Technology Node (nm)    | 45               | 32               | 22               | 16               | 11               | 8                |
| Bulk Planar CMOS        | High Probability | High Probability | High Probability | Low Probability  | Low Probability  | Low Probability  |
| Alternative Device (3G) | Low Probability  | Low Probability  | Low Probability  | High Probability | High Probability | High Probability |
| Variability             | Medium           | Medium           | High             | High             | Very High        | Very High        |

## **Gate Overdrive Degradation**



Gate overdrive reduction amplifies impact of V<sub>CC</sub>, V<sub>T</sub>, & L variations on drive current
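
As a rough illustration of this amplification, the sketch below uses the alpha-power-law approximation, in which drive current is proportional to (V<sub>CC</sub> − V<sub>T</sub>)<sup>α</sup>. The model choice, the exponent `ALPHA`, and every voltage value are assumptions chosen for illustration, not data from this talk. Only a V<sub>T</sub> shift is modeled; V<sub>CC</sub> and L variations are not covered.

```python
# Alpha-power-law sketch: I_ON ~ k * (V_CC - V_T) ** alpha.
# ALPHA and all voltages below are illustrative assumptions.
ALPHA = 1.3


def drive_current(vcc, vt, k=1.0, alpha=ALPHA):
    """Relative saturation drive current for gate overdrive (vcc - vt)."""
    overdrive = vcc - vt
    if overdrive <= 0:
        raise ValueError("model only covers vcc > vt")
    return k * overdrive ** alpha


def current_loss_from_vt_shift(vcc, vt, delta_vt):
    """Fractional drive-current loss when V_T rises by delta_vt volts."""
    nominal = drive_current(vcc, vt)
    shifted = drive_current(vcc, vt + delta_vt)
    return (nominal - shifted) / nominal


# The same 30 mV V_T shift applied at a large and at a small gate overdrive.
high_overdrive = current_loss_from_vt_shift(vcc=1.8, vt=0.40, delta_vt=0.03)
low_overdrive = current_loss_from_vt_shift(vcc=1.0, vt=0.35, delta_vt=0.03)

print(f"loss at 1.40 V overdrive: {high_overdrive:.1%}")
print(f"loss at 0.65 V overdrive: {low_overdrive:.1%}")
```

Under these assumptions, the identical V<sub>T</sub> shift costs a larger fraction of drive current at the smaller overdrive, because the relative sensitivity scales roughly as α / (V<sub>CC</sub> − V<sub>T</sub>).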

## **Strategic Research Objective**

# Design Reliable Systems with Unreliable Components


# **Sources of Variability**

## 1) Static Process Variations

## 2) Dynamic Operational Variations

## 3) Simulation Tool Uncertainty

## **Scale of Variations**

### Die-to-Die (D2D) Variations

### Within-Die (WID) Variations



[Figure panels: Wafer Scale, Die Scale, Feature Scale]

## **Static Variations**

# **Gate Length Variation**

## **Die-to-Die Variation**

- Examples: Processing temperatures, equipment properties, polishing, die placement, resist thickness

### **Systematic Within-Die Variation**

- Examples: Lens aberrations, mid-range flare, stepper non-uniformities, scanner overlay control, multiple dies per reticle, wafer topography

### **Random Within-Die Variation**

- Examples: Patterning limitations, short-range flare, line edge roughness





Source: Nagib Hakim

## **Systematic-WID Variation**



Source: H. Masuda, et al., CICC, 2005.

From a circuit design perspective, systematic-WID variation behaves as a correlated random-WID variation

# **Gate Length Variation Trends**



- Total CD control is approximately a fixed percentage of nominal gate length
- Random-WID variations increase with scaling

## **Systematic-WID Correlation Length**



Correlation length scales by ~$1/\sqrt{2}$

## **Random Dopant Fluctuation**



## **Interconnect Variations**

### 1) Depth of Focus Variation

Depends on neighboring interconnects

### 2) Chemical Mechanical Polishing (CMP)

Depends on metal density

### 3) Etching

Smaller than depth of focus & CMP variations

## Impact of OPC on Isolated Lines

### **Bossung Plot Example (Isolated Drawn Lines)**



## **Dynamic Variations**

# **Supply Voltage Variations**



## **Temperature Variations**

#### Single Core



#### **Dual Core**



Hamann, et al., ITHERM, 2006.

- Processor activity & ambient change
- Dynamic: 100 – 1000 μs

## **Transistor Aging**



- NMOS & PMOS threshold voltages degrade from bias & temperature stress

J. Tschanz, et al., Symp. VLSI Circuits, 2009.

## **Additional Dynamic Variations**

Cross-Coupling Capacitance



Multiple-Input Switching






K. Bowman, et al., *JSSC*, 2002.

# Impact of WID Variations on Delay



# Impact of Variations on Delay



## **FMAX Distribution Model Validation**

[Figure: normalized probability density (log scale, 1.0E-08 to 1.0E+00) versus number of FMAX standard deviations, comparing measured data with the model]

Model agrees closely with measured data from a 0.25μm microprocessor in mean, variance, & shape

- No fitting parameters used in the comparison

K. Bowman, et al., JSSC, 2002.

# Impact of Variations on FMAX



WID variations impact FMAX mean

D2D variations impact FMAX variance
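
The two statements above can be reproduced with a small Monte Carlo sketch. It is a simplified illustration, not the published FMAX model: each die is assumed to have `n_paths` independent critical paths with nominal delay 1.0, one D2D delay shift shared by all paths on the die, and an independent WID shift per path, all Gaussian. The slowest path sets the cycle time. The function name, path count, and sigma values are assumptions.

```python
import random
import statistics


def simulate_fmax(n_dies, n_paths, sigma_d2d, sigma_wid, seed=0):
    """Return normalized FMAX samples, one per simulated die.

    Each die gets one shared D2D delay shift and an independent WID
    shift per critical path. The slowest path limits the clock period.
    """
    rng = random.Random(seed)
    fmax = []
    for _ in range(n_dies):
        d2d = rng.gauss(0.0, sigma_d2d)
        worst_delay = max(
            1.0 + d2d + rng.gauss(0.0, sigma_wid) for _ in range(n_paths)
        )
        fmax.append(1.0 / worst_delay)
    return fmax


# Same total sigma per path, assigned entirely to one component at a time.
wid_only = simulate_fmax(2000, 100, sigma_d2d=0.0, sigma_wid=0.03)
d2d_only = simulate_fmax(2000, 100, sigma_d2d=0.03, sigma_wid=0.0)

for label, samples in (("WID only", wid_only), ("D2D only", d2d_only)):
    print(
        f"{label}: mean={statistics.mean(samples):.3f} "
        f"stdev={statistics.stdev(samples):.4f}"
    )
```

In this sketch, WID variation lowers the mean FMAX because the maximum over many independent path delays sits well above nominal, while D2D variation leaves the mean near nominal and instead widens the distribution.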

## Impact of Variations on Logic Depth Critical Path



**Systematic-WID Variations ($\rho = 1$)**

$$\frac{\sigma_{\mathsf{T}_{\mathsf{CP}}}}{\mathsf{T}_{\mathsf{CP}}} = \frac{\mathsf{N}\sigma_{\mathsf{T}_{\mathsf{GATE}}}}{\mathsf{N}\mathsf{T}_{\mathsf{GATE}}} = \frac{\sigma_{\mathsf{T}_{\mathsf{GATE}}}}{\mathsf{T}_{\mathsf{GATE}}}$$

#### **Random-WID Variations**

$$\frac{\sigma_{\mathsf{T}_{\mathsf{CP}}}}{\mathsf{T}_{\mathsf{CP}}} = \frac{\sqrt{\mathsf{N}}\,\sigma_{\mathsf{T}_{\mathsf{GATE}}}}{\mathsf{N}\mathsf{T}_{\mathsf{GATE}}} = \frac{1}{\sqrt{\mathsf{N}}}\,\frac{\sigma_{\mathsf{T}_{\mathsf{GATE}}}}{\mathsf{T}_{\mathsf{GATE}}}$$

#### Random-WID variations average across N stages

K. Bowman, et al., JSSC, 2002.
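
The two ratios above can be checked numerically. The sketch below sums N gate delays with nominal value 1.0 and Gaussian deviations, once with a single shared deviation (ρ = 1) and once with independent deviations per gate. The function name, `sigma_gate`, and the trial count are assumptions for illustration.

```python
import random
import statistics


def path_delay_spread(n_stages, correlated, sigma_gate=0.05, trials=4000, seed=1):
    """Estimate sigma/mean of the delay of an n_stages critical path.

    correlated=True:  every gate shares one deviation (systematic-WID, rho = 1).
    correlated=False: each gate has an independent deviation (random-WID).
    """
    rng = random.Random(seed)
    delays = []
    for _ in range(trials):
        if correlated:
            shared = rng.gauss(0.0, sigma_gate)
            total = n_stages * (1.0 + shared)
        else:
            total = sum(1.0 + rng.gauss(0.0, sigma_gate) for _ in range(n_stages))
        delays.append(total)
    return statistics.stdev(delays) / statistics.mean(delays)


for n in (4, 16, 64):
    systematic = path_delay_spread(n, correlated=True)
    random_wid = path_delay_spread(n, correlated=False)
    print(f"N={n:3d}  systematic={systematic:.4f}  random={random_wid:.4f}")
```

The systematic ratio should stay near `sigma_gate` for every N, while the random ratio should fall roughly as `sigma_gate / sqrt(N)`.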

# Impact of Variations on Logic Depth



Impact of random-WID variation increases with deeper pipelining

Impact of systematic-WID variation is insensitive to pipelining

# Impact of Variations on Leakage



# **Static Variation Compensation**

### Measure:

- Clock Frequency
- Power

### Control Knobs:

- Supply Voltage
- Body Bias

## **Body Bias Review**



### Reverse body bias (RBB)

- PMOS: V<sub>BP</sub> > V<sub>DD</sub>
- NMOS: V<sub>BN</sub> < V<sub>SS</sub>
- V<sub>T</sub> increases (I<sub>ON</sub>, I<sub>OFF</sub> reduce)



### Forward body bias (FBB)

- PMOS: V<sub>BP</sub> < V<sub>DD</sub>
- NMOS: V<sub>BN</sub> > V<sub>SS</sub>
- V<sub>T</sub> reduces (I<sub>ON</sub>, I<sub>OFF</sub> increase)
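
The direction of the V<sub>T</sub> shift under reverse and forward body bias can be illustrated for the NMOS case with the textbook long-channel body-effect expression, V<sub>T</sub> = V<sub>T0</sub> + γ(√(2φ<sub>F</sub> + V<sub>SB</sub>) − √(2φ<sub>F</sub>)), where V<sub>SB</sub> = V<sub>SS</sub> − V<sub>BN</sub>. This is a sketch only: the expression is a first-order approximation that scaled devices follow loosely, and the parameter values (`vt0`, `gamma`, `two_phi_f`) are assumptions, not values from the talk.

```python
import math


def nmos_vt(v_sb, vt0=0.35, gamma=0.4, two_phi_f=0.8):
    """NMOS threshold voltage (V) from the long-channel body-effect formula.

    v_sb > 0 is reverse body bias (body below source),
    v_sb < 0 is forward body bias (body above source).
    All default parameters are illustrative assumptions.
    """
    term = two_phi_f + v_sb
    if term <= 0:
        raise ValueError("forward bias too large for this simple model")
    return vt0 + gamma * (math.sqrt(term) - math.sqrt(two_phi_f))


for label, v_sb in (("RBB", 0.5), ("zero bias", 0.0), ("FBB", -0.4)):
    print(f"{label:9s} V_SB={v_sb:+.1f} V  ->  V_T={nmos_vt(v_sb):.3f} V")
```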

# Adaptive V<sub>CC</sub> & Body Bias

**Reduce Impact of D2D Variations** 



## **Adaptive Supply Voltage**



# **Effectiveness of Adaptive Biasing**

![](_page_41_Figure_1.jpeg)

![](_page_41_Figure_2.jpeg)

# **Effectiveness of Adaptive Biasing**

#### Slower Parts (Lower Power)

- Leakage is a small percentage of total power
- Trade-off leakage increase for performance gain
- More effective to apply a forward body bias (FBB)

#### Faster Parts (Higher Power)

- Active & leakage contribute significantly to total power
- V<sub>CC</sub> reduction lowers both active and leakage power
- More effective to reduce V<sub>CC</sub>
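
The guidance above can be written as a simple selection rule. The sketch below is a hypothetical policy invented for illustration: the function name, the thresholds, and the returned labels are all assumptions, and a real flow would also decide how much bias or voltage change to apply.

```python
def choose_knob(fmax_ghz, target_ghz, power_w, power_limit_w):
    """Pick a compensation knob for one die (hypothetical policy).

    Slow dies with power headroom trade leakage for speed with forward
    body bias. Fast dies that exceed the power limit lower V_CC, which
    cuts both active and leakage power.
    """
    is_slow = fmax_ghz < target_ghz
    over_power = power_w > power_limit_w
    if is_slow and not over_power:
        return "forward_body_bias"
    if not is_slow and over_power:
        return "reduce_vcc"
    return "no_adjustment"


# Illustrative dies: (measured FMAX in GHz, measured power in W).
dies = {"slow_die": (2.8, 60.0), "fast_die": (3.4, 105.0), "typical_die": (3.1, 80.0)}
for name, (fmax, power) in dies.items():
    print(name, choose_knob(fmax, target_ghz=3.0, power_w=power, power_limit_w=95.0))
```

Dies that are both slow and over the power limit fall through to `no_adjustment` here, since neither knob in this sketch fixes both problems at once.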

## **Static ABB for WID Variations**

#### **WID ABB Concept**

#### **WID ABB Effectiveness**

![](_page_43_Picture_3.jpeg)

**150nm Technology Test-Chip** 

J. Tschanz, et al., JSSC, 2003.

![](_page_43_Figure_6.jpeg)

- No body bias for clock
- Requires triple-well process


# Impact of Dynamic Variations on Conventional Design

![](_page_45_Figure_1.jpeg)

Guardbands are required to ensure correct operation in the presence of dynamic variations

## Sensors with Dynamic Voltage & Frequency (DVF) Control

![](_page_46_Figure_1.jpeg)

Detect temperature, V<sub>CC</sub>, & aging variations

#### Adapt F<sub>CLK</sub> & V<sub>CC</sub> to avoid timing violations

T. Fischer, et al., *JSSC*, 2006. R. McGowen, et al., *JSSC*, 2006. J. Tschanz, et al., ISSCC, 2007.
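
One way to picture sensor-driven frequency adaptation is a lookup from a sensor reading to a pre-calibrated safe clock frequency. The sketch below is a hypothetical illustration and does not describe the cited designs: the table values, the names `TEMP_STEPS_C`, `FCLK_GHZ`, and `THROTTLE_GHZ`, and the temperature-only input are all assumptions.

```python
from bisect import bisect_left

# Hypothetical post-silicon calibration table: for die temperatures up to
# TEMP_STEPS_C[i] (degrees C), FCLK_GHZ[i] is assumed to be timing-safe.
TEMP_STEPS_C = [50, 70, 90, 110]
FCLK_GHZ = [3.0, 2.9, 2.8, 2.6]
THROTTLE_GHZ = 1.0  # assumed fallback above the calibrated range


def select_fclk(temp_c):
    """Return the calibrated clock frequency for a temperature reading."""
    idx = bisect_left(TEMP_STEPS_C, temp_c)
    if idx == len(TEMP_STEPS_C):
        return THROTTLE_GHZ
    return FCLK_GHZ[idx]


for reading in (45, 70, 71, 130):
    print(f"{reading:3d} C -> {select_fclk(reading)} GHz")
```

A table like this only helps with variations slow enough for the sensor and control loop to track, and its entries would come from per-part calibration.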

![](_page_47_Figure_0.jpeg)

Adaptive F<sub>CLK</sub> & body bias ensure correct operation & lower leakage at higher temperatures

J. Tschanz, et al., ISSCC, 2007.

# **Power Sensor with DVF**

![](_page_48_Figure_1.jpeg)

Power management scheme increases performance within a power & thermal envelope

T. Fischer, et al., *JSSC*, 2006. R. McGowen, et al., *JSSC*, 2006.

## **Power Sensor with DVF**

![](_page_49_Figure_1.jpeg)

#### Dynamic adaptation to reduce power variation

T. Fischer, et al., *JSSC*, 2006. R. McGowen, et al., *JSSC*, 2006.

## Sensors with Dynamic Voltage & Frequency (DVF) Control

### Advantages:

- Reduces guardbands for slow-changing global dynamic variations
- Low design overhead

### **Disadvantages:**

- Cannot detect fast-changing or local dynamic variations
- Requires post-silicon calibration

# Summary

- Technology trends amplify microprocessor performance & power variability
- Static Variations:
  - Within-die impacts F<sub>MAX</sub> mean & leakage median
  - Die-to-die impacts F<sub>MAX</sub> & leakage variances
- Dynamic Variations:
  - Impact F<sub>CLK</sub> guardbands
- Variation-tolerant circuits mitigate the impact of variations on performance & power

# **References (1)**

- [1] S. G. Duvall, "Statistical Circuit Modeling and Optimization," in 5th Intl. Workshop Statistical Metrology, June 2000, pp. 56-63.
- [2] K. A. Bowman, S. G. Duvall, and J. D. Meindl, "Impact of Die-to-Die and Within-Die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration," IEEE J. Solid-State Circuits, pp. 183-190, Feb. 2002.
- [3] S. Borkar, et al., "Parameter Variations and Impact on Circuits and Microarchitecture," in Proc. 2003 Design Automation Conf. (DAC), June 2003, pp. 338-342.
- [4] H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, "Challenge: Variability Characterization and Modeling for 65- to 90-nm Processes," in IEEE Custom Integrated Circuits Conf. (CICC), Sept. 2005, pp. 593-600.
- [5] Y. Abulafia and A. Kornfeld, "Estimation of FMAX and ISB in Microprocessors," IEEE Trans. VLSI Syst., pp. 1205-1209, Oct. 2005.
- [6] S. M. Burns, M. Ketkar, N. Menezes, K. A. Bowman, J. W. Tschanz, and V. De, "Comparative Analysis of Conventional and Statistical Design Techniques," in Proceedings of the 44th ACM/IEEE Design Automation Conference (DAC), June 2007, pp. 238-243.
- [7] K. A. Bowman, A. R. Alameldeen, S. T. Srinivasan, and C. B. Wilkerson, "Impact of Die-to-Die and Within-Die Parameter Variations on the Clock Frequency and Throughput of Multi-Core Processors," IEEE Trans. VLSI Syst., pp. 1679-1690, Dec. 2009.
- [8] S. Herbert and D. Marculescu, "Characterizing Chip-Multiprocessor Variability-Tolerance," in Proc. 2008 Design Automation Conf. (DAC), June 2008, pp. 313-318.
- [9] J. Tschanz, et al., "Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage," IEEE J. Solid-State Circuits, pp. 1396-1402, Nov. 2002.
- [10] J. Tschanz, S. Narendra, R. Nair, and V. De, "Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing Impact of Parameter Variations in Low Power and High Performance Microprocessors," IEEE J. Solid-State Circuits, pp. 826-829, May 2003.
- [11] K. Bowman, J. Tschanz, M. Khellah, M. Ghoneima, Y. Ismail, and V. De, "Time-Borrowing Multi-Cycle On-Chip Interconnects for Delay Variation Tolerance," in Proceedings of the 2006 International Symposium on Low Power Electronics and Design (ISLPED), Oct. 2006, pp. 79-84.

## **References (2)**

- [12] S. Dighe, et al., "Within-Die Variation-Aware Dynamic-Voltage-Frequency Scaling Core Mapping and Thread Hopping for an 80-Core Processor," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2010, pp. 174-175.
- [13] A. Muhtaroglu, G. Taylor, and T. R. Arabi, "On-Die Droop Detector for Analog Sensing of Power Supply Noise," *IEEE J. Solid-State Circuits*, pp. 651-660, Apr. 2004.
- [14] T. Fischer, J. Desai, B. Doyle, S. Naffziger, and B. Patella, "A 90-nm Variable Frequency Clock System for a Power-Managed Itanium Architecture Processor," *IEEE J. Solid-State Circuits*, pp. 218-228, Jan. 2006.
- [15] R. McGowen, et al., "Power and Temperature Control on a 90-nm Itanium Family Processor," *IEEE J. Solid-State Circuits*, pp. 229-237, Jan. 2006.
- [16] J. Tschanz, et al., "Adaptive Frequency and Biasing Techniques for Tolerance to Dynamic Temperature-Voltage Variations and Aging," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 292-293.

![](_page_54_Picture_0.jpeg)