CE4930 LEARNING OBJECTIVES

BASIC SYSTEM PRINCIPLES

List the three categories of computer systems.
Define computer architecture.
List the three subcategories of computer architecture.
Compare and contrast the three subcategories of computer architecture: instruction set architecture, microarchitecture, and system architecture.
List the five classic components of a computer.
Define processor in the context of the five classic components of a computer.
Compare and contrast microprocessors and microcontrollers.
Define the stored-program concept.
Compare and contrast primary and secondary memory.
Draw the classic computer architecture memory hierarchy.
List the classic memory and storage size prefixes (KB, MB, GB, etc.) and describe how their meanings differ between memory and storage.
Draw the Princeton (von Neumann) system architecture.
Draw the Harvard system architecture.
Describe the Princeton bottleneck.
List the advantages and disadvantages of the Princeton and Harvard system architectures.
Compare and contrast execution time, throughput, CPU time, user CPU time, and system CPU time.
Calculate the average clocks per instruction (CPI) given instruction timing information.
Calculate CPU execution time using IC * CPI * period.
Compare the performance of different computers by applying the performance equations to system specifications.
Comment on the SPEC benchmarks and their importance in performance analysis.
Define instruction set architecture.
List the four categories of instructions.
Describe the difference between instruction set architecture and microarchitecture.
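
The CPI and CPU-time objectives above can be worked through with a short calculation. The following is a minimal Python sketch, assuming the usual textbook relations: average CPI is the frequency-weighted sum of per-class cycle counts, and CPU time = IC * CPI * clock period. The instruction mix, instruction count, and clock rates are hypothetical values chosen for illustration, and the function names are not taken from the course material.

```python
def average_cpi(mix):
    """Weighted-average clocks per instruction.

    mix is a list of (fraction, cycles) pairs, one per instruction class;
    the fractions are assumed to sum to 1.
    """
    return sum(fraction * cycles for fraction, cycles in mix)


def cpu_time(instruction_count, cpi, clock_period_s):
    """CPU execution time in seconds: IC * CPI * clock period."""
    return instruction_count * cpi * clock_period_s


# Hypothetical mix: 50% ALU (1 cycle), 30% load/store (2 cycles), 20% branch (3 cycles).
mix = [(0.5, 1), (0.3, 2), (0.2, 3)]
cpi = average_cpi(mix)                      # about 1.7 cycles per instruction

# Same program (1,000,000 instructions) on two hypothetical machines.
time_a = cpu_time(1_000_000, cpi, 1e-9)     # 1 GHz clock: about 1.7 ms
time_b = cpu_time(1_000_000, cpi, 0.5e-9)   # 2 GHz clock: about 0.85 ms

print(f"CPI = {cpi:.2f}, A = {time_a * 1e3:.2f} ms, B = {time_b * 1e3:.2f} ms")
print(f"B is {time_a / time_b:.2f}x faster than A")
```

Comparing two machines this way only holds when both run the same program with the same instruction count; a different ISA or compiler changes IC and CPI as well as the period.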

MIPS ARCHITECTURE REVIEW

State the MIPS magic number.
State the number of MIPS registers.
Categorize the MIPS registers into assembly language groups.
Compare and contrast saved temporary and temporary registers.
List some core MIPS assembly language instructions.
Write MIPS assembly language instructions.
List the three MIPS instruction formats.
Write the MIPS R-format ALU architectural equation.
Write the MIPS I-format ALU architectural equation.
Write the MIPS I-format lw architectural equation.
Write the MIPS I-format sw architectural equation.
Write the MIPS I-format beq architectural equation.
Write the MIPS I-format bne architectural equation.
Draw the R-format instruction binary number showing the fields and associated bit positions.
Draw the I-format instruction binary number showing the fields and associated bit positions.
Draw the J-format instruction binary number showing the fields and associated bit positions.
Compare and contrast the single-phase and double-phase clocking strategies (using one edge versus using two edges) for calculation and register update.
Describe the use of multiplexers to select data flow in the single-cycle MIPS processor designed in lecture.
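
The R-, I-, and J-format objectives above can be checked by packing the fields into a 32-bit word. This is a minimal Python sketch based on the standard MIPS32 field layout; the helper names (encode_r, encode_i, encode_j) are hypothetical and exist only for this illustration.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """R-format: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-format: opcode[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target):
    """J-format: opcode[31:26] address[25:0], where address is a word address."""
    return (opcode << 26) | (target & 0x3FFFFFF)


# Register numbers: $t0 = 8, $s1 = 17, $s2 = 18, $s3 = 19.
add_word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=0x20)   # add $t0, $s1, $s2
lw_word = encode_i(opcode=0x23, rs=19, rt=8, imm=32)           # lw  $t0, 32($s3)
j_word = encode_j(opcode=0x02, target=0x0040_0000 >> 2)        # j   0x00400000

print(f"{add_word:08x} {lw_word:08x} {j_word:08x}")
# prints: 02324020 8e680020 08100000
```

Note that in the assembly syntax the destination register comes first, while in the R-format word rd sits after rs and rt.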

PIPELINING REVIEW

Describe how pipelined microarchitectures exploit instruction level parallelism to improve throughput.
State the number of instructions in flight every clock cycle when a basic MIPS pipeline is full.
Describe how pipelining improves throughput.
State the theoretical pipeline speedup (compared to single-cycle) for a pipeline of length n.
State why the theoretical pipeline speedup cannot be achieved.
Use the pipeline speedup equation to calculate real pipeline speedup based on a stall-cycle mix.
Describe the use of interstage registers in pipeline microarchitecture.
Justify the statement "Pipeline microarchitectures make very efficient time usage of components."
List the three types of pipeline hazards.
List examples of structural hazards.
State how structural hazards are eliminated when implementing pipelined microarchitectures.
Justify the Harvard organization in pipelined implementations.
List the two broad categories of data hazards.
Describe the hazard window for MIPS pipeline microarchitectures if hazard-protection is not implemented.
Identify load-use and register-use data hazards in code segments.
List the three principal techniques used to remove data hazards. Describe the advantages and disadvantages of each technique.
Describe how the hazard window suggests forwarding paths for data hazard prevention.
State the two causes of control hazards.
Describe the effect of unconditional and conditional branches on pipeline performance.
List the two principal techniques used to handle control hazards.
Justify advancing jump circuitry into earlier pipeline stages such as IF or ID.
Draw pipeline flight plans showing instructions stalling through branch decision and then flushing after a taken branch.
Compare and contrast the simple (predict branch not taken) and complex (statistical prediction) branch prediction techniques.
Describe the Lee and Smith study on branch predictors. Summarize the key result.
Describe how the Nair study differs from the Lee and Smith study. Summarize the key result.
Comment on the success of state-of-the-art branch predictors.
Describe the use of branch target buffers and branch history buffers as more advanced branch prediction techniques.
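
The speedup objectives above can be illustrated with a short calculation. This is a minimal Python sketch, assuming one common textbook form of the pipeline speedup equation: speedup = depth / (1 + stall cycles per instruction), with an ideal CPI of 1 and perfectly balanced stages. The equation used in lecture may be stated differently, and the stall mix below is hypothetical.

```python
def stall_cycles_per_instruction(stall_mix):
    """Average stall cycles per instruction.

    stall_mix is a list of (fraction of instructions, stall cycles) pairs.
    Instructions that never stall can simply be left out.
    """
    return sum(fraction * stalls for fraction, stalls in stall_mix)


def pipeline_speedup(depth, stalls_per_instruction=0.0):
    """Speedup over single-cycle, assuming ideal CPI = 1 and balanced stages."""
    return depth / (1 + stalls_per_instruction)


# Hypothetical mix: 20% of instructions stall 1 cycle (load-use),
# 15% stall 2 cycles (branches resolved late).
stall_mix = [(0.20, 1), (0.15, 2)]
stalls = stall_cycles_per_instruction(stall_mix)    # about 0.5 stall cycles per instruction

ideal = pipeline_speedup(depth=5)                   # theoretical speedup equals the depth
real = pipeline_speedup(depth=5, stalls_per_instruction=stalls)

print(f"ideal = {ideal:.2f}, real = {real:.2f}")
```

Under these assumptions the stalls alone pull a five-stage pipeline from a theoretical speedup of 5 down to roughly 3.3, before accounting for interstage register overhead or unbalanced stages.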

SUPERPIPELINING BASICS

Justify deepening a pipeline.
State the type of parallelism (spatial, temporal, or both) exploited by deepening a pipeline.
State how deepening a pipeline affects the pipeline speedup equation.
Explain how deeper pipelines are affected by hazards.
Explain how forwarding complexity increases in deeper pipelines.
List challenges faced by deeper pipelines.

MIPS R4000 SUPERPIPELINED PROCESSOR

Draw an organizational sketch of the MIPS R4000 processor.
Draw a pipeline flightplan that shows the stages of the MIPS R4000 processor using the correct stage names.
Describe the hazard response of the MIPS R4000 processor.

SUPERSCALAR BASICS

Justify extending a microarchitecture to superscalar.
State the type of parallelism (spatial, temporal, or both) exploited by superscalar processors.
Describe the challenges that superscalarism introduces that are not challenges in simple pipelines.
List and describe compile-time techniques, such as loop unrolling and predicated instructions, that can assist in superscalar dispatch.
List and describe run-time techniques, such as reservation stations and register renaming, that can assist in superscalar dispatch.
State how superscalarism improves on the IPC when compared to a pipelined processor.

INTEL PENTIUM PROCESSOR

Draw a basic organizational sketch of the Intel Pentium processor.
Draw a pipeline flightplan that shows the stages of the Intel Pentium processor using the correct stage names.
Describe the hazard response of the Intel Pentium processor.
List the four requirements for Intel Pentium dual-issue.
Describe the forwarding mechanism of the Intel Pentium processor.
Draw the Intel Pentium dynamic branch prediction state machine.
Determine the prediction made by the Intel Pentium dynamic branch prediction state machine for a given code sequence.
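
Tracing a prediction state machine over a branch outcome sequence can be practiced with a small simulation. The sketch below models a generic textbook 2-bit saturating counter; it is an assumption for illustration and is not claimed to match the exact transitions of the Intel Pentium state machine presented in lecture. The state names and the outcome sequence are hypothetical.

```python
# States 0..3 of a generic 2-bit saturating counter (not Pentium-specific).
STATE_NAMES = ["strongly not taken", "weakly not taken", "weakly taken", "strongly taken"]


def predict(state):
    """Predict taken when the counter is in one of the two upper states."""
    return state >= 2


def update(state, taken):
    """Move one step toward the actual outcome, saturating at 0 and 3."""
    return min(state + 1, 3) if taken else max(state - 1, 0)


def run(outcomes, state):
    """Return the list of predictions and the final state for a branch history."""
    predictions = []
    for taken in outcomes:
        predictions.append(predict(state))
        state = update(state, taken)
    return predictions, state


# Hypothetical loop branch: taken three times, then falls through, run twice.
outcomes = [True, True, True, False, True, True, True, False]
predictions, final_state = run(outcomes, state=3)
correct = sum(p == o for p, o in zip(predictions, outcomes))

print(f"{correct} of {len(outcomes)} correct, final state: {STATE_NAMES[final_state]}")
# prints: 6 of 8 correct, final state: weakly taken
```

With two bits of history, a single loop exit moves the counter only from "strongly taken" to "weakly taken", so the first iteration of the next loop pass is still predicted correctly.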

MOTOROLA 88110 PROCESSOR

Draw a basic organizational sketch of the Motorola 88110 processor.
Draw a pipeline flightplan that shows the stages of the Motorola 88110 processor using the correct stage names.
Describe the hazard response of the Motorola 88110 processor.
Describe the Motorola 88110 dispatch algorithm.
State why the Motorola 88110 includes two integer units.
Summarize the types of calculations completed by each of the ten functional pipes in the Motorola 88110 microprocessor.
State the cycle delay for each of the ten functional pipes in the Motorola 88110 microprocessor.
Describe the use of the history buffer in the Motorola 88110 microprocessor. Hint: Have you searched for an 88110 paper in IEEE Xplore?
State how the Motorola 88110 handles temporal ordering of writes to the register files.

PENTIUM PRO CASE STUDY

Place the Pentium Pro into historical context by describing the state of the industry and its competition.
Describe why the Pentium Pro was revolutionary rather than simply evolutionary.
Describe architectural key points for the Pentium Pro. What makes it interesting to advanced superscalar architects?
Sketch the basic organizational diagram for the Pentium Pro.
Draw the Pentium Pro flightplan diagram.
Describe the Pentium Pro fetch-decode process.
Describe the basic instruction format for a u-op.
Summarize the u-op translation process.
Translate reg-mem or mem-mem x86 instructions to example u-ops.
State the average number of u-ops per x86 instruction.
State the primary purpose of the reorder buffer.
Describe what gets allocated for each instruction in the reorder buffer.
Summarize the register renaming process implemented inside the reorder buffer.
State the primary purpose of the reservation station.
Describe the dispatch process out of the reservation station.
Compare and contrast the Pentium Pro main and secondary arithmetic units.
Describe the dual-issue limitation imposed by the Pentium Pro secondary arithmetic unit.
Summarize the Pentium Pro integer unit performance. Compare it to the P5 architecture.
List the three result types that can potentially return to the reorder buffer on each clock cycle.
Describe the instruction events that occur as instruction results enter the reorder buffer.
Summarize the Pentium Pro cache behavior.
State the best-case Pentium Pro instruction flight time.
State the average-case Pentium Pro instruction flight time.
Describe why the Pentium Pro has an average-case flight time rather than a fixed-case flight time.
Justify the use of advanced control hazard strategies on the Pentium Pro.
Summarize the conditional move instruction category.
Describe how conditional moves help eliminate specific types of control hazards.
Summarize example static and dynamic branch prediction strategies.
State the success rate of the APNT strategy for static branch prediction.
State the success rate of the BTFNT strategy for static branch prediction.
Compare and contrast the Lee and Yeh approaches to dynamic branch prediction.
Justify the statement "The Yeh approach tracks patterns of branches and not just a specific branch."
Summarize the Pentium Pro branch prediction approach and the success rate.
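
The static branch prediction objectives above can be illustrated with a small comparison. This is a minimal Python sketch, assuming APNT means "always predict not taken" and BTFNT means "backward taken, forward not taken". The branch trace is hypothetical, so the resulting accuracies illustrate the mechanics only and are not the success rates reported in lecture or in the literature.

```python
def predict_apnt(branch_pc, target_pc):
    """Assumed APNT strategy: always predict not taken."""
    return False


def predict_btfnt(branch_pc, target_pc):
    """BTFNT: predict taken only when the target lies before the branch."""
    return target_pc < branch_pc


def accuracy(predictor, trace):
    """Fraction of branches in the trace that the predictor gets right."""
    correct = sum(predictor(pc, target) == taken for pc, target, taken in trace)
    return correct / len(trace)


# Hypothetical trace of (branch address, target address, actually taken).
trace = [
    (0x40, 0x20, True),    # loop-closing backward branch, taken
    (0x40, 0x20, True),
    (0x40, 0x20, True),
    (0x40, 0x20, False),   # loop exit
    (0x50, 0x80, False),   # forward branch, not taken
    (0x60, 0x90, True),    # forward branch, taken
]

apnt_accuracy = accuracy(predict_apnt, trace)
btfnt_accuracy = accuracy(predict_btfnt, trace)
print(f"APNT = {apnt_accuracy:.2f}, BTFNT = {btfnt_accuracy:.2f}")
```

Both predictors are static: they use only information available at decode time (here, the sign of the branch displacement) and keep no run-time history, which is the key contrast with the dynamic strategies.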

MULTIPROCESSING AND MULTICORE PROCESSORS

State reasons why drop-in cores began emerging as a strong design theme in 21st-century processors.
Compare and contrast symmetric and asymmetric (distributed memory) multiprocessors.
Describe cluster computing.
List the four categories in the Flynn taxonomy for multiprocessor systems.
Describe each of the four categories in the Flynn taxonomy for multiprocessor systems.
Give examples of each of the four categories in the Flynn taxonomy for multiprocessor systems. For example, the classic PC of the 1990s is which type?
Sketch a basic organizational diagram for each of the four categories in the Flynn taxonomy for multiprocessor systems.
Describe the two subcategories of MIMD systems.
State the design goal set by the company consortium that developed the multicore processor case study examined in lecture (Cell).
Describe the basic organizational features of the Cell multiprocessor.
Sketch a basic organizational diagram of the Cell multiprocessor.
List key architectural characteristics and strategies implemented by the Cell multiprocessor Power Processing Element (PPE).
List key architectural characteristics and strategies implemented by the Cell multiprocessor Synergistic Processor Elements (SPEs).
Describe how the Cell multiprocessor SPEs implement SIMD mathematics by varying bitwidths.
Describe vector processors including their history, Flynn taxonomy, and places where they have been most successful in the marketplace.
Describe graphics processing units (GPUs) including their history, Flynn taxonomy, and their success in today's marketplace.
Describe VLIW processors including their history, Flynn taxonomy, and their success rate in the marketplace.
Describe current trends in microarchitecture including technology greening, the return of hyperthreading, the asymptotic leveling of core count, and licensable intellectual property (IP) cores.
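
The objective on SIMD mathematics with varying bit widths can be illustrated by treating one wide register as a set of independent lanes. This is a minimal Python sketch, assuming a 128-bit register (as in the Cell SPE register file) and wrap-around lane arithmetic; the function names are hypothetical and the code models the idea only, not the actual SPE instruction set.

```python
REGISTER_BITS = 128  # assumed register width


def lanes(width):
    """Number of independent elements a register holds at a given lane width."""
    return REGISTER_BITS // width


def simd_add(a, b, width):
    """Add two registers lane by lane; carries never cross a lane boundary."""
    mask = (1 << width) - 1
    result = 0
    for lane in range(lanes(width)):
        shift = lane * width
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        result |= ((x + y) & mask) << shift   # wrap around inside the lane
    return result


for width in (8, 16, 32, 64):
    print(f"{width:2d}-bit lanes: {lanes(width)} elements per register")

# The same bit patterns give different results at different lane widths.
a, b = 0x01FF, 0x0101
print(hex(simd_add(a, b, 8)))    # prints: 0x200 (low lane wraps, no carry out)
print(hex(simd_add(a, b, 16)))   # prints: 0x300 (one 16-bit lane, carry propagates)
```

A single instruction therefore performs 16, 8, 4, or 2 additions depending on the element width, which is the single-instruction, multiple-data behavior in the Flynn taxonomy.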