TEXTBOOK

This course uses industrial and academic research papers published in the IEEE journal and conference literature. Learn how to access these papers using this tutorial.

WEEKLY SCHEDULE

Week Topics Reading Homework
1 Trends in Microarchitecture Microarchitecture is always changing as the industry works to meet the demands of consumers. Recently, highly advanced superscalar processors have been joined by reconfigurable devices supporting both hard and soft processor cores. Warehouse-scale computing has been driving memory architecture in exciting ways as the traditional bandwidth barrier limits performance. Finally, core-based architecture has matured and now dominates a significant range of consumer products. However, core processors present challenges in both hardware and programmatic design. These challenges continue to drive research into algorithms and techniques to utilize cores most effectively.

These papers from the IEEE literature provide some recent insight into the historical development that led to the current state in computer architecture.

  • Tessier, R.; Pocek, K.; DeHon, A., "Reconfigurable Computing Architectures," in Proceedings of the IEEE, vol. 103, no. 3, pp. 332-354, March 2015
  • Moore, S., "Top 11 Technologies of the Decade: #5 Multicore CPUs," IEEE Spectrum, vol. 48, no. 1, pp. 40-42, Jan. 2011
  • Nair, R., "Evolution of Memory Architecture," in Proceedings of the IEEE, vol. 103, no. 8, pp. 1331-1345, Aug. 2015
none
2 Superscalar Processor Basics Superscalar processors exploit instruction-level parallelism by dispatching multiple instructions to multiple pipelines. Once the pipelines are full, the processor achieves an instructions-per-clock rate greater than 1 (IPC > 1), provided that hazards are appropriately managed.

These papers from the IEEE literature document two classic architectures from the 1990s -- the Intel Pentium and the Motorola MC68060. The final paper allows you to compare and contrast the superscalarism of the 1990s with the advanced circuitry of one of the most recent Intel processors.

  • Alpert, D.; Avnon, D., "Architecture of the Pentium Microprocessor," Micro, IEEE, vol. 13, no. 3, pp. 11-21, June 1993.
  • Circello, J.; Edgington, G.; McCarthy, D.; Gay, J.; Schimke, D.; Sullivan, S.; Duerden, R.; Hinds, C.; Marquette, D.; Sood, L.; Couch, A.; Chow, D., "The superscalar architecture of the MC68060," in Micro, IEEE , vol.15, no.2, pp.10-21, Apr 1995
  • Hammarlund, P.; Martinez, A.J.; Bajwa, A.A.; Hill, D.L.; Hallnor, E.; Hong Jiang; Dixon, M.; Derr, M.; Hunsaker, M.; Kumar, R.; Osborne, R.B.; Rajwar, R.; Singhal, R.; D'Sa, R.; Chappell, R.; Kaushik, S.; Chennupaty, S.; Jourdan, S.; Gunther, S.; Piazza, T.; Burton, T., "Haswell: The Fourth-Generation Intel Core Processor," in Micro, IEEE , vol.34, no.2, pp.6-20, Mar.-Apr. 2014
none
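The IPC > 1 claim for Week 2 can be illustrated with a toy model. The following is a minimal sketch, not a model of any processor in the reading list: it assumes in-order issue, single-cycle latency, and a hypothetical instruction format of (destination, sources). It ignores pipeline fill time and all hazards other than read-after-write within an issue group. The name simulate_ipc is invented for this illustration.

```python
def simulate_ipc(instrs, width=2):
    """Estimate IPC for an in-order machine that issues up to `width`
    instructions per cycle (toy model, single-cycle latency assumed).

    instrs: list of (dest, srcs) tuples, where srcs is a tuple of
    register names read by the instruction.
    """
    cycles = 0
    i = 0
    while i < len(instrs):
        written = set()   # registers produced earlier in this same cycle
        issued = 0
        while i < len(instrs) and issued < width:
            dest, srcs = instrs[i]
            # RAW hazard: a source is produced in this cycle, so this
            # instruction must wait for the next cycle.
            if any(s in written for s in srcs):
                break
            written.add(dest)
            i += 1
            issued += 1
        cycles += 1
    return len(instrs) / cycles


# Four independent instructions versus a four-instruction dependency chain.
independent = [("r1", ("r0",)), ("r2", ("r0",)), ("r3", ("r0",)), ("r4", ("r0",))]
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]

print(simulate_ipc(independent))  # dual issue is sustained
print(simulate_ipc(chain))        # every instruction waits on the previous one
```

Under these assumptions the independent stream sustains dual issue, while the dependency chain falls back to one instruction per cycle, which is why hazard management determines whether IPC > 1 is reached in practice.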
3 Superscalar Processor Techniques Superscalar processors use many techniques to coordinate multiple pipelines, detect and prevent hazards, and manage the write back of data to the memory pyramid.

These papers from the IEEE literature begin with a classic paper describing the theory of superscalar processors. This well-respected summary paper also illustrates the state of the industry in the 1990s, as silicon density allowed superscalar techniques to make the jump to integrated circuit processors. The final two papers allow you to compare and contrast the approaches used in the 1990s with the approaches used in a very recent processor from NVIDIA.

  • Smith, J.E.; Sohi, G.S., "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, vol. 83, no. 12, pp. 1609-1624, Dec. 1995.
  • Blank, G.; Krueger, S., "The SuperSPARC Microprocessor," Compcon Spring '92, Digest of Papers, pp. 136-141, Feb. 1992.
  • Boggs, D.; Brown, G.; Tuck, N.; Venkatraman, K.S., "Denver: Nvidia's First 64-bit ARM Processor," in Micro, IEEE, vol. 35, no. 2, pp. 46-55, Mar.-Apr. 2015
HW1
4 Superscalar Processor Techniques Branch prediction is critical in advanced microarchitecture because control hazards cause multiple pipelines to flush. Because out-of-order execution is usually occurring, these control hazards further complicate the roll-back of processor and memory state if they are not appropriately managed.

This paper set from the IEEE literature first presents the classic Yeh algorithm for branch prediction. Yet, research in branch prediction continues 25 years after this critically important work! The second paper provides a glimpse into some of the latest research in branch prediction.

  • Yeh, T.; Patt, Y., "Two-Level Adaptive Training Branch Prediction", Proceedings of the 24th International Symposium on Microarchitecture, pp. 51-61, 1991 (ACM)
  • Mohammadi, M.; Song Han; Aamodt, T.M.; Dally, W.J., "On-Demand Dynamic Branch Prediction," in Computer Architecture Letters, vol. 14, no. 1, pp. 50-53, Jan.-June 2015
HW2
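As a companion to the Week 4 reading, the general idea of two-level adaptive prediction can be sketched in a few lines. This is a simplified illustration only: it assumes a single global history register indexing one table of 2-bit saturating counters, which is just one of the configurations in this family and not necessarily the one evaluated in the paper. The class name TwoLevelPredictor, the helper accuracy, and the default of 4 history bits are assumptions made for this example.

```python
class TwoLevelPredictor:
    """Toy two-level predictor: a global branch history register (level 1)
    selects a 2-bit saturating counter in a pattern table (level 2)."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0                              # last N outcomes as bits
        self.counters = [1] * (1 << history_bits)     # start weakly not-taken

    def predict(self):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current history...
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # ...then shift the actual outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, outcomes):
    """Fraction of outcomes predicted correctly, training as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop-like pattern: taken, taken, not-taken, repeated.
pattern = [True, True, False] * 100
print(accuracy(TwoLevelPredictor(history_bits=4), pattern))
```

After a short warm-up, each history value seen in the repeating pattern maps to a single outcome, so the counters settle and mispredictions stop. A single 2-bit counter without history would keep mispredicting the not-taken branch in every repetition.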
5 Superscalar Processor Techniques Tomasulo's algorithm is fundamental to superscalar out-of-order execution. This algorithm introduced the concepts of execution unit reservation and the renaming of registers.

This paper set includes the original paper written by Tomasulo.

  • Tomasulo, R., "An Efficient Algorithm for Exploiting Multiple Arithmetic Units," IBM Journal, vol. 11, no. 1, pp. 25-33, 1967 (academic use permission granted: provided by instructor)
none
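The register renaming idea from Week 5 can be sketched in isolation. This is a simplification under stated assumptions: in Tomasulo's scheme the tags identify reservation stations, whereas here each destination simply receives a fresh tag name (t0, t1, ...). The function rename and the three-operand instruction format are hypothetical and chosen only for illustration.

```python
from itertools import count


def rename(instrs):
    """Rename destination registers to fresh tags.

    instrs: list of (dest, src1, src2) architectural register names.
    Returns the same instructions with every destination replaced by a
    unique tag and every source replaced by the tag of its most recent
    producer (or left unchanged if no earlier instruction wrote it).
    """
    fresh = count()
    mapping = {}      # architectural register -> current tag
    renamed = []
    for dest, *srcs in instrs:
        # Look up sources first, so "r1 = r1 + r2" reads the old r1.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        tag = f"t{next(fresh)}"
        mapping[dest] = tag
        renamed.append((tag, *new_srcs))
    return renamed


program = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r1", "r5"),   # r4 = r1 op r5   (true dependence on r1)
    ("r1", "r6", "r7"),   # r1 = r6 op r7   (reuses the name r1)
]
for instr in rename(program):
    print(instr)
```

In the renamed program the third instruction writes a new tag instead of r1, so it no longer conflicts with the first instruction's write or the second instruction's read. Only the true dependence between the first two instructions remains, which is what allows the third to execute out of order.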
6 Superscalar Processor Techniques Superscalar processors have grown from 2-way superscalar to today's highly advanced processors which often are 8-way or more in superscalar degree.

This paper serves as another case study of an early superscalar processor with a large superscalar degree.

  • Diefendorff, K.; Allen, M., "Organization of the Motorola 88110 superscalar RISC microprocessor," Micro, IEEE, vol. 12, no. 2, pp. 40-63, Apr. 1992.
HW3
7 Vector Processors
  • Sell, J.; O'Connor, P., "The Xbox One System on a Chip and Kinect Sensor," in Micro, IEEE , vol.34, no.2, pp.44-53, Mar.-Apr. 2014
none
8 Vector Processors Vector processors implement operations across a large array of simple computing elements. This allows massively multithreaded programming to optimize the speed of matrix manipulations, for example. This type of computing element is foundational to the multimedia rich environment of modern computer graphics. The GPU is a form of vector processor.

This paper presents the NVIDIA Tesla GPU architecture.

  • Lindholm, E.; Nickolls, J.; Oberman, S.; Montrym, J., "NVIDIA Tesla: A Unified Graphics and Computing Architecture," in Micro, IEEE , vol.28, no.2, pp.39-55, March-April 2008
none
9 Multiprocessors and multiple cores
  • Aingaran, K.; Jairath, S.; Konstadinidis, G.; Leung, S.; Loewenstein, P.; McAllister, C.; Phillips, S.; Radovic, Z.; Sivaramakrishnan, R.; Smentek, D.; Wicki, T., "M7: Oracle's Next-Generation Sparc Processor," in Micro, IEEE , vol.35, no.2, pp.36-45, Mar.-Apr. 2015
none
10 Multiprocessors and multiple cores TBA
11 Comprehensive Final Exam none