#### Multi-Core Microprocessor Chips: Motivation & Challenges

#### **Dileep Bhandarkar, Ph. D.**

Architect at Large, Digital Enterprise Group, Intel Corporation

May 2006



Copyright © 2006 Intel Corporation.

Intel<sup>®</sup> Higher Education Program 2006 Intel Distinguished Lecture

# Agenda

- Semiconductor Technology Evolution
- Design Challenges
- Why Multi-Core Processor Chips?
- Power/Performance Trade-Offs
- CMP Directions
- Beyond CMP
- Summary



©2006, Intel Corporation. Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. \*Other names and brands may be claimed as the property of others. www.intel.com/education

### Intel only: On-time "2-year-cycle"

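|                                        | 180 nm             | 130 nm             | 90 nm              | 65 nm              | 45 nm           |
|----------------------------------------|--------------------|--------------------|--------------------|--------------------|-----------------|
| 1<sup>st</sup> production              | 1999               | 2001               | 2003               | 2005               | 2007            |
| Wafer size (mm)                        | 200                | 200/300            | 300                | 300                | 300             |
| Transistors: gate length (L<sub>G</sub>) | 100 nm           | 70 nm              | 50 nm              | 35 nm              | Details coming! |
| Transistors: silicide                  | CoSi<sub>2</sub>   | CoSi<sub>2</sub>   | NiSi               | NiSi               |                 |
| Transistors: strain                    |                    |                    | Strained Si (SiGe) | Strained Si (SiGe) |                 |
| Interconnects                          | 6 Al, SiOF         | 6 Cu, SiOF         | 7 Cu, Low-k        | 8 Cu, Low-k        |                 |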

### 45 nm Logic Process on Track for Delivery in 2007



#### Moore's Law continues!

Intel continues to develop a new technology generation every 2 years



#### Intel 11th EMEA Academic Forum

#### **Historical Driving Forces**




#### **The Challenges**



Power = Capacitance × Voltage<sup>2</sup> × Frequency; since frequency scales roughly linearly with voltage, Power ∝ Voltage<sup>3</sup>




# **Design Challenges**

- Memory latency not scaling as fast as processor speed
- Power growing non-linearly with single thread performance
- Designer productivity lagging design complexity
- Ability to validate and test complex designs
- Keeping up with new process technology every two years




#### Long-Latency DRAM Accesses Need Latency-Tolerant Techniques



# **DRAM Latency Tolerance**

- Continue building even larger caches
  - Every semiconductor process generation provides an opportunity to double cache size
  - Cache becomes a larger part of the die
- Hide multiple threads of execution behind memory latency
  - Intel implemented simultaneous multithreading in 2000
- Implement multi-core products as Moore's Law allows
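A minimal sketch of why hiding multiple threads behind memory latency helps, assuming an idealized core that switches to another ready thread at no cost whenever one stalls on DRAM. The function name `core_utilization` and the cycle counts are hypothetical, chosen only for illustration; real SMT cores share pipeline resources and do not switch for free.

```python
def core_utilization(threads: int, compute_cycles: float, latency_cycles: float) -> float:
    """Fraction of cycles doing useful work in an idealized latency-hiding model.

    Each thread computes for `compute_cycles`, then stalls for `latency_cycles`
    waiting on DRAM. While one thread stalls, the core runs another ready
    thread, so utilization grows with thread count until the core saturates.
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    busy_fraction_per_thread = compute_cycles / (compute_cycles + latency_cycles)
    return min(1.0, threads * busy_fraction_per_thread)


if __name__ == "__main__":
    # Hypothetical workload: 50 cycles of work per 200-cycle DRAM miss.
    for n in (1, 2, 4, 8):
        print(n, round(core_utilization(n, compute_cycles=50, latency_cycles=200), 2))
```

With these numbers a single thread keeps the core busy only 20% of the time, and about five threads are needed to cover the latency completely; the same argument motivates both multithreading and multiple cores.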


# Situational Analysis

- With each process generation, transistor density doubles
  - Frequency has increased by ~1.5x; ~1.3x in future
  - Vcc has scaled by ~0.8x; ~0.9x in future
  - Capacitance has scaled by 0.7x
  - Total power may not scale down due to increased leakage
- Instruction-level parallelism is harder to find
- Increasing single-stream performance often requires a non-linear increase in design complexity
- Many server applications are inherently parallel
- Parallelism exists in multimedia applications
- Multi-tasking usage models are becoming popular
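The per-generation scaling factors above can be combined with Power = Capacitance × Voltage² × Frequency to estimate how chip power changes when the die area is held constant. A rough sketch, assuming every transistor keeps the same switching activity and ignoring leakage (which the slide notes only makes matters worse); `relative_dynamic_power` is a hypothetical helper name:

```python
def relative_dynamic_power(density: float, freq: float, vcc: float, cap: float) -> float:
    """Chip dynamic power after one process generation, relative to the previous one.

    Per-transistor dynamic power scales as C * V^2 * f; multiplying by the
    density factor gives chip power for the same die area. Leakage is ignored,
    and every transistor is assumed to keep the same switching activity.
    """
    per_transistor = cap * vcc ** 2 * freq
    return density * per_transistor


past = relative_dynamic_power(density=2.0, freq=1.5, vcc=0.8, cap=0.7)
future = relative_dynamic_power(density=2.0, freq=1.3, vcc=0.9, cap=0.7)
print(f"past generations:   {past:.2f}x power per generation")
print(f"future generations: {future:.2f}x power per generation")
```

Even before leakage, filling the same die with twice as many transistors raises dynamic power by roughly 1.34x per generation with the historical factors and about 1.47x with the projected ones, because the weaker voltage scaling (0.9x rather than 0.8x) outweighs the slower frequency growth.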


# **Processor** Power




#### **Design Complexity and Productivity factors**

- Huge transistor budgets stress ability to design and verify complex chips
- Multi-core fits well with increasing transistor budgets
- Multi-core design addresses the gap between transistor density and designer productivity



Figure 2. Design complexity and designer productivity. Since 1980, the design gap between growth in chip complexity and productivity growth in logic design tools has widened each year.


# Iron Law of Performance

- Execution time is the product of
  - Path length
  - Cycles per instruction (CPI)
  - Cycle time
- CPI is the sum of
  - Infinite-cache core CPI
  - Miss rate × effective memory latency
- Bad (good) news: performance does not scale up (down) linearly with frequency
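The two relations above can be written out directly. A minimal sketch, with the miss rate expressed as misses per instruction and every workload number hypothetical; `execution_time_s` is an illustrative name, not an existing tool:

```python
def execution_time_s(path_length: float, freq_ghz: float, core_cpi: float,
                     misses_per_instr: float, mem_latency_ns: float) -> float:
    """Iron Law: execution time = path length * CPI * cycle time."""
    cycle_time_ns = 1.0 / freq_ghz
    # A fixed DRAM latency in nanoseconds costs more cycles at a higher clock.
    mem_latency_cycles = mem_latency_ns * freq_ghz
    cpi = core_cpi + misses_per_instr * mem_latency_cycles
    return path_length * cpi * cycle_time_ns * 1e-9


# Hypothetical workload: 1e9 instructions, infinite-cache CPI of 1.0,
# 1 miss per 100 instructions, 100 ns effective memory latency.
slow = execution_time_s(1e9, 2.0, 1.0, 0.01, 100.0)
fast = execution_time_s(1e9, 3.0, 1.0, 0.01, 100.0)
print(f"1.5x clock gives {slow / fast:.3f}x speedup")
```

Raising the clock by 50% improves performance by only 12.5% here, because the memory term grows in cycles as frequency rises; run in reverse, the same effect means a lower clock costs less performance than the frequency ratio suggests.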




### The Magic of Voltage Scaling

- Power = Capacitance × Voltage<sup>2</sup> × Frequency
- Frequency ∝ Voltage in the region of interest
- So power increases as the cube of frequency
- Good news: voltage scaling works
- A 10% reduction in voltage yields
  - 10% reduction in frequency
  - ~30% reduction in power
  - Less than 10% reduction in performance
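The voltage-scaling arithmetic can be checked with a few lines of code. This sketch assumes frequency tracks voltage exactly and fixed capacitance, and uses a hypothetical 30% memory-bound share of execution time to model why performance drops less than frequency; `scale_voltage` is an illustrative name:

```python
def scale_voltage(v_scale: float, memory_bound_fraction: float):
    """Relative frequency, power, and performance after scaling supply voltage.

    Assumes frequency is proportional to voltage, fixed capacitance, and that
    only the non-memory-bound share of execution time stretches when the
    clock slows down.
    """
    freq = v_scale
    power = v_scale ** 2 * freq          # P ~ C * V^2 * f with C fixed
    time = (1.0 - memory_bound_fraction) / freq + memory_bound_fraction
    perf = 1.0 / time
    return freq, power, perf


# Hypothetical: 30% of execution time is spent waiting on memory.
freq, power, perf = scale_voltage(0.9, memory_bound_fraction=0.3)
print(f"frequency {freq:.2f}x, power {power:.3f}x, performance {perf:.3f}x")
```

The cube law gives 0.9³ ≈ 0.73, i.e. about 27% less power, which rounds to the ~30% above; under this model performance falls by only about 7% because memory stall time does not stretch with the clock.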




# Simple Dual Core Example

- Assume a single-core processor at 100W
  - 80W for core, 20W for cache and I/O
  - 50% of die area is core
- Dual core within the same power envelope
  - 20W for I/O and cache
  - 40W per core
  - Die size increases by 50%
  - Reduce voltage by 21% to reduce core power to 40W
  - Frequency reduces by ~20%
  - Single-thread performance reduces by ~15%
  - Throughput increases by 70-80%
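The numbers in this example follow from power increasing as the cube of voltage. A sketch that re-derives them, assuming P ∝ V³ and f ∝ V, and taking the ~15% single-thread loss from the slide as an input rather than computing it; the function and parameter names are illustrative only:

```python
def dual_core_estimate(core_w=80.0, uncore_w=20.0, core_area_share=0.5,
                       single_thread_perf=0.85):
    """Re-derive the dual-core example under the cube law P ~ V^3.

    `single_thread_perf` is taken from the slide (~15% loss), not derived:
    performance falls less than frequency because memory latency does not scale.
    """
    total_w = core_w + uncore_w                 # keep the 100 W envelope
    per_core_w = (total_w - uncore_w) / 2       # 40 W for each of two cores
    v_scale = (per_core_w / core_w) ** (1 / 3)  # P ~ V^3, so V ~ P^(1/3)
    return {
        "per_core_w": per_core_w,
        "voltage_reduction": 1 - v_scale,            # ~21%
        "frequency_scale": v_scale,                  # f ~ V, so ~20% lower
        "die_scale": 1 + core_area_share,            # second core adds 50% die area
        "throughput_scale": 2 * single_thread_perf,  # ~1.7x on parallel work
    }


for key, value in dual_core_estimate().items():
    print(f"{key}: {value:.2f}")
```

Halving core power needs only a ~21% voltage cut because power falls with the cube of voltage; with two cores each at 85% of the original single-thread speed, parallel throughput is about 1.7x, the low end of the 70-80% range.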


# Possible Improvements

- Develop new power efficient core
  - e.g., extensive clock gating
  - Big power savings with little or no performance loss
- Design a smaller core with lower performance
  - Area and power savings much greater than performance loss
  - Use larger number of cores
- Adjust frequency and power of each core with load factor
  - Inactive cores can be put in sleep mode
  - Maintain overall die power constant
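One way to picture adjusting each core with its load while holding die power constant is a proportional power-budget split. This is a hypothetical policy sketch, not a description of any Intel power-management mechanism; the function name, the 0.5 W sleep power, and the 80 W budget are all assumptions:

```python
def allocate_power(loads, die_budget_w, sleep_w=0.5):
    """Split a fixed die power budget across cores in proportion to their load.

    Cores with zero load are put to sleep and draw `sleep_w` watts; the rest
    of the budget goes to active cores, so the total equals the budget
    whenever at least one core is active. Per-core limits are ignored.
    """
    active = [i for i, load in enumerate(loads) if load > 0]
    sleeping = len(loads) - len(active)
    available_w = die_budget_w - sleeping * sleep_w
    total_load = sum(loads[i] for i in active)
    alloc = [sleep_w] * len(loads)
    for i in active:
        alloc[i] = available_w * loads[i] / total_load
    return alloc


# Hypothetical 4-core, 80 W part: two busy cores, two idle.
budget = allocate_power([1.0, 0.5, 0.0, 0.0], die_budget_w=80.0)
print([round(w, 1) for w in budget])
```

Each active core would then set its voltage and frequency to fit its share; a real policy would also cap every core at its maximum safe voltage and frequency, which this sketch omits.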


# A New Era...

#### The Old

- Performance equals frequency
- Unconstrained power
- Voltage scaling

#### The New

- Performance equals IPC
- Multi-core
- Power efficiency
- Microarchitecture advancements



### Intel Core Micro-architecture Five Key Innovations

Intel<sup>®</sup> Wide Dynamic Execution

Intel<sup>®</sup> Advanced Digital Media Boost



Intel<sup>®</sup> Intelligent Power Capability

Intel<sup>®</sup> Smart Memory Access

Intel<sup>®</sup> Advanced Smart Cache

# **Multi-Core Trajectory**



#### 2H 2006

1H 2007

#### **Quad-Core**



#### **Microprocessor Design Model**



**OBJECTIVE: Sustained Technology Leadership** 




# **Possible Evolution**

- Transistor density doubles with each process generation
- New generation enables complex new core
- Possible alternative design point
  - Double the cache capacity in same area
  - Double the number of processor cores
  - Frequency improves with process technology

| Process | 90 nm | 65 nm     | 45 nm     |
|---------|-------|-----------|-----------|
| Cores   | 1     | 2         | 4         |
| Cache   | Cache | 2 x Cache | 4 x Cache |
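The alternative design point reduces to simple doubling arithmetic. A sketch, assuming the die area and the core design stay fixed so that each doubling of density doubles both core count and cache capacity; it ignores frequency gains and any area spent on interconnect, and `design_points` is a hypothetical name:

```python
def design_points(nodes, base_cores=1, base_cache=1):
    """Hold die area and core design fixed while density doubles per generation.

    Returns (node, cores, cache multiple) tuples; cache is expressed as a
    multiple of the first generation's capacity.
    """
    return [(node, base_cores * 2 ** i, base_cache * 2 ** i)
            for i, node in enumerate(nodes)]


for node, cores, cache in design_points(["90 nm", "65 nm", "45 nm"]):
    print(f"{node}: {cores} core(s), {cache}x cache")
```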


#### **Ramping Multi-core Everywhere**

| Segment                                 | 2005     | 2006\* | 2007\* |
|-----------------------------------------|----------|--------|--------|
| Desktop client (mainstream/performance) | Shipping | >70%   | >90%   |
| Mobile client (mainstream/performance)  | Shipping | >70%   | >90%   |
| Server & workstation                    | Shipping | >85%   | ~100%  |

\* Data is projected run rate exiting the year. Source: Intel

Expect to ship >60 million multi-core processors by end of 2006.

All products and dates are preliminary and subject to change without notice.

# **CMP** Challenges

- How much Thread Level Parallelism is there in most workloads?
- Ability to generate code with lots of threads & performance scaling
- Thread synchronization
- Operating systems for parallel machines
- Single thread performance tradeoff
- Power limitations
- On-chip interconnect/cache infrastructure
- Memory and I/O bandwidth required
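Thread synchronization is the point where parallel threads must coordinate. A minimal Python sketch of the usual pattern, assuming a simple parallel sum: each thread works on a private slice and takes a lock only to merge its partial result. In CPython the global interpreter lock means this shows the synchronization pattern rather than a real multi-core speedup; `parallel_sum` is a hypothetical name.

```python
import threading


def parallel_sum(values, n_threads=4):
    """Sum `values` with several threads, merging partial sums under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)       # private work: no shared state touched
        with lock:                 # critical section: one thread at a time
            total += partial

    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # wait for every thread before reading total
    return total


print(parallel_sum(list(range(1000))))  # 499500
```

Keeping the critical section to a single addition per thread is the design choice that matters: the more often threads contend for the lock, the less extra cores help.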


### Intel's Software Tools and Support




# How Many Cores?

- Where does the doubling stop?
  - Driven by software issues
  - Today Microsoft Windows supports only 64 logical processors (hardware threads)!
  - How many applications scale to 64 threads?
- How well does performance scale with thread count?
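One common way to reason about how performance scales with thread count is Amdahl's law. It is not stated on the slide, but it captures the same concern: the serial fraction of a program caps the speedup no matter how many cores are added. A sketch, with the 95% parallel fraction chosen only for illustration:

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads)


for n in (2, 8, 64):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Even with 95% of the work parallel, 64 threads yield only about 15.4x, and the limit as the thread count grows is 1/0.05 = 20x.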




# Looking Beyond CMP

- How far do we push the number of general-purpose cores?
- Is there a role for application-specific engines?
- Programming model for heterogeneous cores




### Improving Power Efficiency




# **Application Specific Engines**

- Can achieve better power efficiency than general purpose cores
  - Simpler design due to the targeted application and no need to support a full operating system
- Challenge
  - Needs to support high volume application
  - Reconfigurable?
- Graphics and Multimedia engines are good candidates




# Summary

- One billion transistors are here already!
- Chip-level multiprocessing and large caches can exploit Moore's Law
- The amount of parallelism in future microprocessor systems will increase
- Heterogeneous cores may emerge eventually
- Need applications and tools that can exploit parallelism
- Design challenges and software issues remain

#### **Collaborate, Innovate, Lead!**


# **Closing Thought**

"Don't be encumbered by past history, go off and do something wonderful."

Robert Noyce, Intel Co-founder


