## ELE 455/555 Spring 2016

Homework 9

Due 4/26/2016

**Beginning of class** 

1) Starting from  $T_{parallelizable}$  and  $T_{sequential}$ , derive the general equation for the speedup associated with N processors in terms of the parallelizable fraction  $F_{parallelizable}$ . - 15 pts

2) Plot a series of curves showing speedup vs. number of processors from 1 to 1024 for  $F_{parallelizable}$  from 0.9 to 0.99999 (90% to 99.999%; log values). - 15 pts
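A minimal sketch for tabulating the data behind these curves before plotting. It assumes the derivation in problem 1 arrives at the standard Amdahl's law form, speedup = 1 / ((1 - F) + F / N); the function name `amdahl_speedup` and the specific processor counts and fractions are illustrative choices, not part of the assignment.

```python
def amdahl_speedup(f_parallelizable, n_processors):
    """Speedup on n_processors, ASSUMING the standard Amdahl's law form."""
    sequential_share = 1.0 - f_parallelizable        # part that cannot be sped up
    parallel_share = f_parallelizable / n_processors  # part divided among processors
    return 1.0 / (sequential_share + parallel_share)


# Assumed sample points: powers of two from 1 to 1024,
# and fractions stepping by one "nine" at a time.
processor_counts = [2 ** k for k in range(11)]
fractions = [0.9, 0.99, 0.999, 0.9999, 0.99999]

# One row of speedups per parallelizable fraction.
table = {f: [amdahl_speedup(f, n) for n in processor_counts] for f in fractions}

if __name__ == "__main__":
    for f, row in table.items():
        print(f, [round(s, 2) for s in row])
```

Each row of `table` can be passed to any plotting tool as one curve, with `processor_counts` on a logarithmic horizontal axis.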

3) Your 100-core multi-core chip can enable from 1 to 100 cores. Each core has a "power cost" of \$0.20 when enabled, and each unit of time "t" used in processing has an application cost of \$0.05. If your application uses 500 scalar additions (sequential) and 10,000 matrix additions (parallelizable), with each addition requiring 1 "t" of time, how many cores would you enable? No guess and check; use calculus. - 15 pts
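The cost model in problem 3 can be written down as a small function, which is useful for checking a calculus result after the fact (it is not a substitute for the derivation). This is a sketch under one reading of the problem: the power cost is assumed to be a flat charge per enabled core, the sequential additions are assumed to run on a single core, and the parallelizable additions are assumed to divide evenly across the enabled cores. All names below are hypothetical.

```python
POWER_COST_PER_CORE = 0.20   # dollars per enabled core (assumed flat, not per unit time)
COST_PER_TIME_UNIT = 0.05    # dollars per unit of time "t"
SEQUENTIAL_ADDS = 500        # scalar additions, assumed to run on one core
PARALLEL_ADDS = 10_000       # matrix additions, assumed to split evenly across cores


def total_cost(n_cores):
    """Total dollar cost of running the application on n_cores enabled cores."""
    time_units = SEQUENTIAL_ADDS + PARALLEL_ADDS / n_cores
    return POWER_COST_PER_CORE * n_cores + COST_PER_TIME_UNIT * time_units


# Numerical scan over the allowed range, only to confirm a calculus answer.
best_cores = min(range(1, 101), key=total_cost)

if __name__ == "__main__":
    print(best_cores, round(total_cost(best_cores), 2))
```

Setting the derivative of `total_cost` with respect to `n_cores` to zero should agree with the value the scan reports; if it does not, one of the assumptions above probably differs from the intended reading.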

4) Your 100-core multi-core chip can enable 1 to 50 Core A processors and 1 to 50 Core B processors. Core A has a "power cost" of \$0.20 when enabled; Core B has a "power cost" of \$0.50 when enabled. Core B operates twice as fast as Core A. Each unit of time "t" used in processing has an application cost of \$0.04, where each addition on Core A requires 1 "t" of time. If your application uses 500 scalar additions (sequential) and 10,000 matrix additions (parallelizable), how many cores of which type would you enable? No guess and check; use calculus. - 20 pts

5) Find the ratio of clock cycles required using coarse-grained threading and fine-grained threading for the following instruction sequence:

Use round-robin selection, and assume a 1-cycle stall for thread switching in the coarse-grained process and no penalty for switching in the fine-grained process. - 15 pts



6) Map the following threads into an SMT multi-threading approach. Assume a 4-slot superscalar pipeline with HW multi-threading. Fill each slot with the letter for the thread; leave unused slots empty. - 20 pts

