This course uses industrial and academic research papers published in the IEEE journal and conference literature. Learn how to access these papers using this tutorial.
Week | Topics | Reading | Homework |
---|---|---|---|
1 | Trends in Microarchitecture | Microarchitecture is always changing as the industry works to meet the demands of consumers. Recently, highly advanced superscalar processors have been joined by reconfigurable devices that support both hard and soft processor cores. Warehouse-scale computing has been driving memory architecture in new directions as the traditional memory bandwidth barrier limits performance. Finally, multicore architecture has matured and dominates a significant range of consumer products. However, multicore processors present challenges in both hardware design and programming, and these challenges continue to drive research into algorithms and techniques that use cores most effectively. These papers from the IEEE literature provide insight into the recent historical developments that led to the current state of computer architecture. | none |
2 | Superscalar Processor Basics | Superscalar processors exploit instruction-level parallelism by dispatching multiple instructions per clock cycle to multiple pipelines. Once the pipelines are full, the processor achieves an instructions-per-clock (IPC) value greater than 1, provided that hazards are appropriately managed. These papers from the IEEE literature document two classic architectures from the 1990s: the Intel Pentium and the Motorola MC68060. The final paper allows you to compare and contrast the superscalar designs of the 1990s with the advanced circuitry of one of the most recent Intel processors. | none |
3 | Superscalar Processor Techniques | Superscalar processors use many techniques to coordinate multiple pipelines, detect and prevent hazards, and manage the write-back of data to the memory hierarchy. This paper set from the IEEE literature begins with a classic paper describing the theory of superscalar processors. This well-respected summary paper also illustrates the state of the industry in the 1990s, when silicon density allowed superscalar techniques to make the jump to integrated circuit processors. The final two papers allow you to compare and contrast the approaches used in the 1990s with those used in a very recent processor from NVIDIA. | HW1 |
4 | Superscalar Processor Techniques | Branch prediction is critical in advanced microarchitecture because a control hazard, such as a mispredicted branch, forces multiple pipelines to flush. Because out-of-order execution is usually occurring as well, these control hazards further complicate the roll-back of processor and memory state if they are not appropriately managed. This paper set from the IEEE literature first presents the classic Yeh algorithm for branch prediction. Yet research in branch prediction continues 25 years after this critically important work! The second paper provides a glimpse into some of the latest research in branch prediction. | HW2 |
5 | Superscalar Processor Techniques | Tomasulo's algorithm is fundamental to superscalar out-of-order execution. It introduced the concepts of reservation stations for the execution units and register renaming. This paper set includes the original paper written by Tomasulo. | none |
6 | Superscalar Processor Techniques | Superscalar processors have grown from 2-way designs to today's highly advanced processors, which are often 8-way or wider. This paper serves as another case study of an early superscalar processor with a large superscalar degree. | HW3 |
7 | Vector Processors | | none |
8 | Vector Processors | Vector processors perform operations across a large array of simple computing elements. This allows massively multithreaded programs to accelerate matrix manipulations, for example. This type of computing element is foundational to the multimedia-rich environment of modern computer graphics; the GPU is a form of vector processor. This paper presents the NVIDIA Tesla GPU architecture. | none |
9 | Multiprocessors and multiple cores | | none |
10 | Multiprocessors and multiple cores | TBA | |
11 | Comprehensive Final Exam | none |
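
To make the Week 4 branch-prediction topic concrete, here is a minimal Python sketch of a two-level adaptive predictor in the spirit of the Yeh algorithm. A global history register holding the last k branch outcomes indexes a pattern history table of 2-bit saturating counters. The class name, the 4-bit history length, and the single global history (GAg-style) organization are illustrative assumptions, not the exact configuration studied in the assigned paper.

```python
class TwoLevelPredictor:
    """Hypothetical GAg-style two-level adaptive predictor (illustrative only)."""

    def __init__(self, history_bits: int = 4) -> None:
        self.history_bits = history_bits
        # Global history register: last k outcomes, newest in bit 0 (1 = taken).
        self.history = 0
        # Pattern history table: one 2-bit saturating counter per history pattern.
        # Values 0-1 predict not taken, 2-3 predict taken; start weakly not taken.
        self.pht = [1] * (1 << history_bits)

    def predict(self) -> bool:
        return self.pht[self.history] >= 2

    def update(self, taken: bool) -> None:
        counter = self.pht[self.history]
        # Saturating increment or decrement toward the actual outcome.
        self.pht[self.history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the outcome into the history register, keeping only k bits.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(outcomes: list[bool], history_bits: int = 4) -> float:
    """Fraction of branches predicted correctly over an outcome trace."""
    predictor = TwoLevelPredictor(history_bits)
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)


if __name__ == "__main__":
    # A loop branch taken 3 times, then not taken once, repeated 100 times.
    pattern = [True, True, True, False] * 100
    print(f"accuracy: {accuracy(pattern):.3f}")
```

For this periodic loop pattern, each 4-bit history maps to a unique next outcome, so the sketch mispredicts only while its counters warm up and prints `accuracy: 0.985`. A simple per-branch 2-bit counter would instead mispredict the loop exit on every iteration, which is the kind of weakness that history-based prediction addresses.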
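
As an illustration of the register renaming that Week 5 attributes to Tomasulo's algorithm, the following Python sketch models only the issue stage. A register status table records, for each register, the tag of the reservation station that will produce its next value; each issued instruction captures either a ready register value or a producer tag. The `Instr` format and the `RS0`, `RS1`, ... tags are hypothetical, and execution, the common data bus, and tag release are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str
    dest: str
    src1: str
    src2: str


def issue(program: list[Instr]) -> list[tuple[str, str, str, str]]:
    """Rename registers Tomasulo-style at issue time (illustrative sketch).

    Each instruction gets a hypothetical reservation-station tag RS0, RS1, ...
    A source operand becomes either the register name (meaning its value is
    read from the register file at issue) or the tag of the in-flight station
    that will produce it.
    """
    producer: dict[str, str] = {}  # register status table: reg -> pending tag
    issued = []
    for i, ins in enumerate(program):
        tag = f"RS{i}"
        # Read sources BEFORE renaming dest, so "r1 = r1 + r2" sees the old r1.
        q1 = producer.get(ins.src1, ins.src1)
        q2 = producer.get(ins.src2, ins.src2)
        # Later readers of dest now wait on this tag, so an older write or read
        # of the same register no longer creates a WAW or WAR hazard.
        producer[ins.dest] = tag
        issued.append((tag, ins.op, q1, q2))
    return issued


if __name__ == "__main__":
    program = [
        Instr("div", "f0", "f2", "f4"),   # RS0
        Instr("add", "f6", "f0", "f8"),   # true RAW dependence on RS0
        Instr("mul", "f8", "f2", "f4"),   # WAR on f8 with RS1 is removed
        Instr("sub", "f0", "f2", "f4"),   # WAW on f0 with RS0 is removed
        Instr("add", "f10", "f0", "f8"),  # waits on RS3 and RS2
    ]
    for entry in issue(program):
        print(entry)
```

The last instruction issues as `('RS4', 'add', 'RS3', 'RS2')`: it waits for the newest producers of `f0` and `f8`, while the true dependence of `RS1` on `RS0` is preserved. In Tomasulo's design, a completing station broadcasts its result on the common data bus and clears any register status entry that still names its tag, a step this sketch does not model.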