Problem 1: Pipelining (25 points)

    1. Conventional Clocking

    2.  

    3. Wave Pipelining

Problem 2: Cache Cost and Performance Models (25 points)

2.1 Cost of the microprocessor

2.2 Performance in MIPS:

Execution time per inst = 1.5 + cache miss penalty

Icache miss rate: From table A.50 miss rate = 0.0515 for fully associative.
From table A.5 multiply by 1.14 to adjust for assoc = 2.

Icache miss rate = 0.0581.

Icache miss penalty = 1.0 ref/inst * 0.0581 miss/ref * 20 clock/miss

= 1.162 CPI

Dcache miss rate: From table A.38 miss rate = 0.0699 for fully associative.
From table A.5 multiply by 1.14 to adjust for assoc = 2.

Dcache miss rate = 0.0797.

Dcache miss penalty = 0.3 ref/inst * 0.0797 miss/ref * 20 clock/miss

= 0.478 CPI

Execution time = 1.5 + 1.162 + 0.478 = 3.14 CPI

At 150 MHz the performance = 47.8 MIPS
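The arithmetic in 2.2 can be checked with a short script. This is only a sketch and not part of the original solution: the function and variable names are illustrative, and the miss rates are the adjusted values quoted above (0.0581 for the Icache, 0.0797 for the Dcache), not values recomputed from the tables.

```python
# Sketch of the section 2.2 arithmetic. All names here are illustrative.

def miss_cpi(refs_per_inst, miss_rate, penalty_clocks):
    """CPI added by misses: (refs/inst) * (misses/ref) * (clocks/miss)."""
    return refs_per_inst * miss_rate * penalty_clocks

BASE_CPI = 1.5       # execution time per instruction without cache misses
MISS_PENALTY = 20    # clocks per L1 miss
CLOCK_MHZ = 150

icache_cpi = miss_cpi(1.0, 0.0581, MISS_PENALTY)  # 1.0 instruction ref per inst
dcache_cpi = miss_cpi(0.3, 0.0797, MISS_PENALTY)  # 0.3 data refs per inst
total_cpi = BASE_CPI + icache_cpi + dcache_cpi
mips = CLOCK_MHZ / total_cpi                      # MHz / CPI = MIPS

print(f"Icache penalty: {icache_cpi:.3f} CPI")
print(f"Dcache penalty: {dcache_cpi:.3f} CPI")
print(f"Total: {total_cpi:.2f} CPI, {mips:.1f} MIPS")
```

The printed values agree with the figures above to the number of digits shown.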

2.3 L2 Cache controller area and cost

Also accepted to include area for valid/dirty bits.

2.4 Performance with L2 Cache

Modifying the miss calculation above to use a 4-clock miss delay for an L2 hit
gives 0.328 CPI for references that miss in L1 and hit in L2

We need to add the L2 miss penalty. From Table A.26 the L2 cache miss rate is 0.0292 adjusted by a factor 1.04 for 4-way associative = 0.0304.

The penalty for L2 misses = 1.3 ref/inst * 0.0304 miss/ref * 20 clock/miss

= 0.790 CPI

The total CPI = 1.5 + 0.328 + 0.790 = 2.62 CPI

At 150 MHz the performance = 57.3 MIPS
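The 2.4 calculation can be sketched the same way. This is an illustrative check, not the original solution: the variable names are made up, the L1 miss rates are the adjusted values from 2.2 (0.0581 and 0.0797), and the L2 miss rate is applied per reference, as in the penalty line above.

```python
# Sketch of the section 2.4 arithmetic. All names here are illustrative.

BASE_CPI = 1.5
CLOCK_MHZ = 150
L2_HIT_CLOCKS = 4     # delay for an L1 miss that is served by L2
MEM_CLOCKS = 20       # additional delay for an L2 miss

INST_REFS, DATA_REFS = 1.0, 0.3            # references per instruction
L1_I_MISS, L1_D_MISS = 0.0581, 0.0797      # adjusted L1 miss rates from 2.2
l2_miss_rate = 0.0292 * 1.04               # table value adjusted for 4-way

# Every L1 miss pays the L2 access delay.
l1_miss_cpi = (INST_REFS * L1_I_MISS + DATA_REFS * L1_D_MISS) * L2_HIT_CLOCKS
# L2 misses pay the main-memory delay on top of that.
l2_miss_cpi = (INST_REFS + DATA_REFS) * l2_miss_rate * MEM_CLOCKS

total_cpi = BASE_CPI + l1_miss_cpi + l2_miss_cpi
mips = CLOCK_MHZ / total_cpi

print(f"L1 miss penalty: {l1_miss_cpi:.3f} CPI")
print(f"L2 miss penalty: {l2_miss_cpi:.3f} CPI")
print(f"Total: {total_cpi:.2f} CPI, {mips:.1f} MIPS")
```

Because the script keeps the unrounded L2 miss rate (0.030368 instead of 0.0304), intermediate values differ slightly in the fourth decimal place, but the rounded results are the same.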

 

Problem 3: Mean Buffer Performance Modeling (25 points)

3.1 The writes are "fully buffered" so use an open queue model. We know enough about the pipeline to use delta-binomial. The pipeline runs at 10 clocks per memory cycle. Normalize time to the memory cycle:

n = 0.1 write/inst * 1.0 inst/clock * 10 clocks/mem-cycle = 1

d = n/z = 0.1 since there are 10 "sources" (z) per mem-cycle from the
pipeline

So we have an open queue B(4,1,0.1) model with r = 0.25 and p = 0.1/4
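The parameter derivation in 3.1 can be written out as a small script. This is a sketch under stated assumptions: the names are illustrative, "d" is read as the delta of the delta-binomial model, and "r" is assumed to be the occupancy n/m with m = 4 memory modules, which matches the quoted value of 0.25.

```python
# Sketch of the section 3.1 parameters, normalized to one memory cycle.
# All names are illustrative; r = n/m is an assumed reading of the text.

WRITES_PER_INST = 0.1
INST_PER_CLOCK = 1.0
CLOCKS_PER_MEM_CYCLE = 10
M_MODULES = 4                      # the "4" in B(4, 1, 0.1)

n = WRITES_PER_INST * INST_PER_CLOCK * CLOCKS_PER_MEM_CYCLE  # requests per memory cycle
z = CLOCKS_PER_MEM_CYCLE           # request "sources" per memory cycle
d = n / z                          # delta: request probability per source
r = n / M_MODULES                  # occupancy (assumed definition)
p = d / M_MODULES                  # per-module request probability per source

print(f"n = {n}, d = {d}, r = {r}, p = {p:.3f}")
```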

3.2 Selecting buffer size

We want the probability of overflow on a write to be <1%

By Markov's inequality BF >= 0.15/.01 = 15

By Chebyshev's inequality BF >= 0.15 + 1.07/.1 = 10.85, rounded up to 11

Taking the minimum value, the processor's write buffer should be 11 entries.
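The two bounds in 3.2 can be sketched as follows. This is an illustrative check, not the original solution: the function names are made up, and the mean queue size (0.15) and standard deviation (1.07) are taken as given from the lines above, not derived here.

```python
import math

# Sketch of the section 3.2 buffer sizing. Function names are illustrative;
# the mean (0.15) and standard deviation (1.07) are taken from the text.

def markov_size(mean_q, p_overflow):
    """Markov: P(q >= BF) <= mean_q / BF, so BF >= mean_q / p_overflow."""
    return mean_q / p_overflow

def chebyshev_size(mean_q, sigma_q, p_overflow):
    """Chebyshev: P(|q - mean| >= k*sigma) <= 1/k^2 with 1/k^2 = p_overflow."""
    return mean_q + sigma_q / math.sqrt(p_overflow)

MEAN_Q, SIGMA_Q, P_OVERFLOW = 0.15, 1.07, 0.01

markov = markov_size(MEAN_Q, P_OVERFLOW)
chebyshev = chebyshev_size(MEAN_Q, SIGMA_Q, P_OVERFLOW)

# Either bound is sufficient, so take the smaller one and round up.
# round() guards against floating-point noise before taking the ceiling.
buffer_entries = math.ceil(round(min(markov, chebyshev), 9))

print(f"Markov: {markov:.2f}, Chebyshev: {chebyshev:.2f}, buffer: {buffer_entries}")
```

Note that the Chebyshev bound uses the standard deviation, not the variance, which is the mix-up mentioned in the grading notes.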

Grading Summary

Using simple binomial -5 points; using M/D/1 -3 points.

Forgetting to multiply queue size by 4 for 4 modules -1 point

Mistakes in part 1 were carried over to part 2 with no lost credit.

General demonstration of applying Markov's/Chebyshev's bounds +10 points

Wrong standard deviation calculation -3 points

Just a careless calculation error -1 point

Note: A number of students confused variance and standard deviation.

Some students also confused the coefficient of variation [c] we used for service time distribution with the variance of queue size required for this problem.

Some students used simple binomial and calculated a queue length of 0. If something like this occurs in the future, please still follow through on part 2 to demonstrate your knowledge and understanding. Solve symbolically or pick a possible value (e.g., 1) to carry over from part 1.

Problem 4: Memory System Performance Modeling (25 points)

We use the closed queue model with delta-binomial distribution since the caches are blocking.

 

Grading summary

I accepted a wide range of solutions for d as long as some reasoning about the pipeline was demonstrated.

Calculated n or offered bandwidth without considering cache misses -5 points. No additional penalty for carrying incorrect result through other parts.

Solved bandwidth equation with n>m -2 points.

In part 2, did not scale write bandwidth for achieved performance -1 point.