### Last updated 2/25/21

- Millions to hundreds of millions of gates
  - Physical Space
  - Sheer numbers of gates
  - Keeping shared resources close to multiple users (memory)
  - I/O pin access and placement
  - Interconnect: getting all the wires connected
    - Typical processes have 6 to 10 layers of interconnect
    - Cell, local, global, power
  - Performance
  - Power / Heat Dissipation

- Performance Drivers
  - Process Technology
    - Transistor performance
    - Short channel vs. long channel devices
    - High Vt and Low Vt devices
  - Clock Frequency
    - Maximum is set by the longest unit delay
    - Very complex timing tools used to ensure max frequency
  - Interconnect
    - RC delays
    - Capacitive coupling
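
The Clock Frequency point above (the maximum is set by the longest unit delay) can be sketched numerically. This is a minimal illustration, not a timing tool: the unit names and delay values are hypothetical, and real timing analysis also accounts for setup time, clock skew, and interconnect delay.

```python
def max_clock_frequency_hz(unit_delays_s):
    """Return the highest clock frequency allowed by the slowest unit.

    unit_delays_s: iterable of per-unit delays in seconds.
    """
    delays = list(unit_delays_s)
    if not delays:
        raise ValueError("need at least one unit delay")
    # The clock period can be no shorter than the longest unit delay.
    return 1.0 / max(delays)


# Hypothetical unit delays in seconds (illustrative values only).
stage_delays = {
    "fetch": 0.8e-9,
    "decode": 0.6e-9,
    "execute": 1.0e-9,
    "writeback": 0.5e-9,
}

f_max = max_clock_frequency_hz(stage_delays.values())
critical_unit = max(stage_delays, key=stage_delays.get)
print(f"critical unit: {critical_unit}, f_max = {f_max / 1e9:.2f} GHz")
```

Speeding up any unit other than the critical one leaves the maximum frequency unchanged, which is why timing effort concentrates on the longest path.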

- Power Drivers
  - Process Technology: Dynamic, Static, Short circuit (D/S/SC)
  - Number of gates: D/S/SC
  - Clock Frequency: D/SC
    - Dynamic power is CV<sup>2</sup>f, where f is clock frequency
    - Short circuit power also scales with f
  - Supply Voltage: D/S/SC
  - Routing Efficiency: D/SC
    - Minimizing capacitance is critical
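
The CV<sup>2</sup>f relationship above is easy to check with a few numbers. The sketch below uses hypothetical values for the switched capacitance, supply voltage, and clock frequency. It models only the dynamic component and leaves out static and short circuit power.

```python
def dynamic_power_w(c_farads, vdd_volts, f_hz):
    """Dynamic power P = C * V^2 * f, as given in the outline above."""
    return c_farads * vdd_volts ** 2 * f_hz


# Hypothetical baseline: 1 nF switched capacitance, 1.0 V supply, 1 GHz clock.
baseline = dynamic_power_w(1e-9, 1.0, 1e9)

# Doubling the clock frequency doubles dynamic power.
double_f = dynamic_power_w(1e-9, 1.0, 2e9)

# Halving the supply voltage cuts dynamic power to one quarter.
half_v = dynamic_power_w(1e-9, 0.5, 1e9)

# Halving the switched capacitance (for example, through better routing) halves it.
half_c = dynamic_power_w(0.5e-9, 1.0, 1e9)

print(f"baseline: {baseline:.3f} W")
print(f"2x frequency: {double_f / baseline:.2f}x baseline")
print(f"0.5x voltage: {half_v / baseline:.2f}x baseline")
print(f"0.5x capacitance: {half_c / baseline:.2f}x baseline")
```

The quadratic dependence on voltage is why supply voltage is usually the most effective lever on dynamic power.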

- Power / Performance Balance
  - Device Level Solutions
    - Multiple T<sub>ox</sub> (V<sub>T</sub>) devices
      - Thin T<sub>ox</sub> devices for high performance paths
      - Thick T<sub>ox</sub> devices
        - Lower leakage
        - Higher voltages
  - Chip Level Solutions
    - Reduced interconnect R and C
    - Power islands: gating the power to circuits not in use
    - Clock Gating (module level)
      - Turn off the clocks to circuits not in use
      - Requires synchronization

- Power / Performance Balance
  - System Level Solutions
    - Dynamic Voltage Scaling: changing VDD as needed
    - Dynamic Frequency Scaling: changing the clock frequency as needed
    - Together these are referred to as DVFS
  - Architectural Solutions
    - Pipelining
    - Multi-core processors
      - Homogeneous dual/quad core
      - Heterogeneous big/little/GPU
    - Memory Hierarchy
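
To see why voltage and frequency scaling are combined as DVFS, the sketch below applies the CV<sup>2</sup>f dynamic power relation from earlier in these notes to a few hypothetical operating points. It assumes a fixed switched capacitance and a task with a fixed cycle count, and it ignores static and short circuit power, so it is an idealized comparison rather than a model of any real processor.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Dynamic power relative to nominal, from P = C * V^2 * f with C fixed."""
    return v_scale ** 2 * f_scale


def relative_dynamic_energy(v_scale):
    """Dynamic energy for a fixed number of cycles, relative to nominal.

    Runtime scales as 1/f, so f cancels: E = P * t is proportional to V^2.
    """
    return v_scale ** 2


# Hypothetical operating points: (name, voltage scale, frequency scale).
operating_points = [
    ("nominal", 1.0, 1.0),
    ("frequency scaling only", 1.0, 0.8),
    ("voltage and frequency scaling", 0.8, 0.8),
]

for name, v_scale, f_scale in operating_points:
    power = relative_dynamic_power(v_scale, f_scale)
    energy = relative_dynamic_energy(v_scale)
    print(f"{name}: power {power:.3f}x, energy per task {energy:.3f}x")
```

Under these assumptions, lowering frequency alone reduces power but not the energy needed to finish the task. The energy saving comes from the lower voltage that the reduced frequency permits.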

#### Power – Performance Tradeoffs




#### Technology

### Processor Performance (IEEE IRDS)




Source: Computer Architecture, Hennessy and Patterson, 2017 Elsevier Inc.

#### Processor Performance



Source: Computer Architecture, Hennessy and Patterson, 2017 Elsevier Inc.

#### Transistors per chip

Chart: "Moore's Law: The number of transistors on microchips doubles every two years" (Our World in Data). Axes: transistor count (logarithmic scale) versus the year in which the microchip was first introduced.

Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important for other aspects of technological progress in computing, such as processing speed or the price of computers.

Data source: Wikipedia (wikipedia.org/wiki/Transistor\_count). Chart from OurWorldinData.org, licensed under CC-BY by the authors Hannah Ritchie and Max Roser.
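
The doubling rule described above can be written as a one-line projection. The sketch below is an idealized calculation: the two-year doubling period comes from the text, while the starting count is a hypothetical value chosen only for illustration. Real chips scatter around this trend.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count assuming one doubling per doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Growth factor over 50 years with a 2-year doubling period: 2**25.
growth_50_years = projected_transistors(1, 50)
print(f"growth over 50 years: {growth_50_years:,.0f}x")

# Hypothetical chip with 1,000 transistors, projected 20 years ahead (2**10 growth).
print(f"1,000 transistors after 20 years: {projected_transistors(1_000, 20):,.0f}")
```

An idealized 50 years of doubling gives a factor of about 33.5 million, which is why the chart needs a logarithmic vertical axis.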

#### SPECint Performance – single CPU

42 Years of Microprocessor Trend Data



Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2017 by K. Rupp.

#### SPECint Performance – Including multi-core



#### What about Cost \$\$\$



The minimum gate pitch of Intel's 10 nm process shrinks from 70 nm to 54 nm and the minimum metal pitch shrinks from 52 nm to 36 nm. These smaller dimensions enable a logic transistor density of 100.8 mega transistors per mm<sup>2</sup>, which is 2.7x higher than Intel's previous 14 nm technology and is approximately 2x higher than other industry 10 nm technologies.
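
The figures quoted above can be cross-checked with some quick arithmetic. The sketch below makes one simplifying assumption that is not in the quoted text: that cell area scales with the product of gate pitch and metal pitch. Under that assumption it estimates how much of the 2.7x density gain the pitch shrink alone would explain, and it backs out the 14 nm density implied by the quoted ratio.

```python
# Values quoted in the text above.
gate_pitch_14nm, gate_pitch_10nm = 70.0, 54.0    # nm
metal_pitch_14nm, metal_pitch_10nm = 52.0, 36.0  # nm
density_10nm = 100.8                             # mega transistors per mm^2
quoted_density_gain = 2.7                        # 10 nm versus 14 nm

# Assumption: area per cell is proportional to gate pitch * metal pitch.
area_scale = (gate_pitch_10nm * metal_pitch_10nm) / (gate_pitch_14nm * metal_pitch_14nm)
pitch_only_gain = 1.0 / area_scale

# Density of the 14 nm process implied by the quoted 2.7x ratio.
implied_density_14nm = density_10nm / quoted_density_gain

print(f"area scale from pitches: {area_scale:.3f}")
print(f"density gain from pitches alone: {pitch_only_gain:.2f}x")
print(f"implied 14 nm density: {implied_density_14nm:.1f} MTr/mm^2")
```

Under this assumption the pitch shrink accounts for roughly 1.9x, so the rest of the quoted 2.7x would have to come from improvements the passage does not itemize.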

- Design Processes
  - Tick-Tock
    - 2-year technology cycle
    - Tick: existing processor design migrated to a new technology node
      - Higher speeds
      - Better power characteristics
    - Tock: new processor design introduced on an existing technology node
      - New features
      - Higher performance (IPC, ...)



- Design Processes
  - 3-Phase
    - 3-year technology cycle
    - Process
      - Slight modifications to existing architecture on a new technology node
    - Architecture
      - New architecture on the technology node
    - Optimization
      - Optimized architecture on the technology node

