Advanced Computer Architecture Homework: Cache and Memory Analysis


Added on  2020/04/13

Homework Assignment
AI Summary
This document provides comprehensive solutions to a set of advanced computer architecture problems. The solutions cover various aspects of computer architecture, including the analysis of cache misses with and without critical word first and early restart, calculating the Cycles Per Instruction (CPI) for a multiprocessor system considering remote request costs, determining the idle time of a system due to DRAM memory access, and demonstrating the execution of snoopy bus and directory protocols. The document meticulously walks through each problem, explaining the steps and calculations involved, making it a valuable resource for students studying computer architecture. The solutions are well-organized and easy to follow, providing a clear understanding of the concepts.
Running Head: ADVANCED COMPUTER ARCHITECTURE 1
Advanced Computer Architecture
Institution
Name
Date
PROBLEM 1
(i) Cycles needed to service a cache miss without “critical word first and early restart”
Time (in cycles) = cycles for first chunk + cycles for remaining three chunks.
It requires 208 cycles for the first 32 bytes and 2 cycles for each of the next 3 chunks.
Number of cycles = (1 × 208) + (3 × 2) = 214
(ii) Cycles needed to service a cache miss with “critical word first and early restart”
Wait only for the first 32 bytes to be transferred.
Time in cycles = time to transfer 32 bytes
Number of cycles = (1 × 208) = 208
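The two cases can be checked with a short sketch. It uses only the figures stated in the solution (208 cycles for the first 32-byte chunk, 2 cycles for each of the three remaining chunks); the function name `miss_cycles` and its parameters are illustrative assumptions, not part of the original problem.

```python
def miss_cycles(first_chunk, later_chunk, chunks, early_restart):
    """Cycles the processor waits on one cache miss.

    first_chunk   -- cycles until the first chunk arrives
    later_chunk   -- cycles for each additional chunk
    chunks        -- total chunks in the block
    early_restart -- True if the processor resumes once the critical
                     (first) chunk has arrived
    """
    if early_restart:
        return first_chunk
    return first_chunk + later_chunk * (chunks - 1)

# Values as stated in the solution text (assumed to come from the problem).
without_cwf = miss_cycles(208, 2, 4, early_restart=False)
with_cwf = miss_cycles(208, 2, 4, early_restart=True)
print(without_cwf, with_cwf)
```

With critical word first and early restart, the remaining chunks still arrive, but the processor no longer waits for them.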
PROBLEM 2
We first find the clock cycles per instruction.
CPI = base CPI + Remote request rate × Remote request cost
= 1.0 + 5% × Remote request cost
And Remote request cost = Remote access cost/ cycle time
= 300ns/1ns = 300.
Thus CPI = 1.0 + (0.05*300) = 16
Now, the multiprocessor with all local references is 16/1.0 = 16.0 times faster.
In practice, the performance analysis is more complex because a fraction of the
non-communication references will miss in the local hierarchy and the remote access time
does not have a single constant value.
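The CPI calculation above can be reproduced in a few lines. This is a minimal sketch using the numbers from the solution (base CPI 1.0, 5% remote requests, 300 ns remote access, 1 ns cycle time); the variable names are assumptions chosen for illustration.

```python
base_cpi = 1.0            # CPI when every reference is local
remote_rate = 0.05        # fraction of instructions making a remote request
remote_access_ns = 300.0  # remote access time
cycle_time_ns = 1.0       # processor cycle time

# Remote request cost expressed in clock cycles.
remote_cost_cycles = remote_access_ns / cycle_time_ns

# Effective CPI including remote requests.
cpi = base_cpi + remote_rate * remote_cost_cycles

# How much faster the all-local machine is.
speedup_all_local = cpi / base_cpi

print(f"CPI = {cpi:.1f}, all-local speedup = {speedup_all_local:.1f}x")
```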
PROBLEM 3
The total DRAM memory = 2.5 × 12 × 10^9 bytes J/seconds
= 30 × 10^9 J/s
The total energy required to read or write cache-line to flash and DRAM =
2 × 32 × (4 × 10^-6 + 1.5 × 10^-6)
= 3.52 × 10 ^-4 J
Now the time the system will be idle is
(3.52 × 10^-4 J) / (30 × 10^9 J/s) = 1.17 × 10^-14 seconds
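The arithmetic for this problem can be checked with the sketch below. It takes the quantities exactly as they appear in the solution; the variable names, and the reading of the 30 × 10^9 figure as a rate in J/s, are assumptions made for illustration.

```python
# Rate term as written in the solution: 2.5 x 12 x 10^9 (J/s).
rate_j_per_s = 2.5 * 12e9

# Energy to read or write the cache line to flash and DRAM:
# 2 x 32 x (4e-6 + 1.5e-6) J.
energy_j = 2 * 32 * (4e-6 + 1.5e-6)

# Idle time is the energy divided by the rate.
idle_time_s = energy_j / rate_j_per_s

print(f"rate   = {rate_j_per_s:.3e} J/s")
print(f"energy = {energy_j:.3e} J")
print(f"idle   = {idle_time_s:.3e} s")
```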
PROBLEM 5
(a) Using the snoopy bus protocol
Step | P1 (State Addr Value) | P2 (State Addr Value) | Bus (Action Proc Addr Value) | Memory (Addr State {Procs} Value)
P1: Write A1=3 | Ex A1 3 | | Write Miss P1 A1 | A1 Ex {P1} 4; 5
P1: Read A2 | | | Read Miss P1 A2 | {P1}
 | | | Write Back P1 A1 3 | A1 {} 3; A2 5
 | Shared A2 5 | | DaRp P1 A2 5 | {P1}
P2: Write A2=4 | | Ex A2 5 | Write Miss P2 A2 4 | Ex P2 5
P1: Read A1 | | | P1 A1 | P1
P2: Read A1 | | Shar A1 | P2 A1 | P2
 | | Shar A1 4 | |
P1: Write A2=5 | | | P1 A2 | P1
P2: Read A2 | | | P2 A2 | P2
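The access sequence in the step column can also be traced with a small simulation. This is a sketch under stated assumptions, not a reproduction of the table: each processor is assumed to have a single-block write-back cache (so A1 and A2 conflict), the protocol is assumed to be a three-state write-invalidate scheme (Invalid, Shared, Exclusive), and memory starts with A1 = 4 and A2 = 5 as in the first table row. The names `Cache` and `access` are hypothetical.

```python
class Cache:
    """One-block cache: a state plus the single address and value it holds."""
    def __init__(self):
        self.state, self.addr, self.value = "Invalid", None, None

def access(caches, memory, proc, op, addr, value=None):
    """Apply one read or write and return the bus actions it generates."""
    bus = []
    me = caches[proc]
    hit = me.addr == addr and me.state != "Invalid"
    if op == "read" and hit:
        return bus                      # read hit: no bus traffic
    if op == "write" and hit and me.state == "Exclusive":
        me.value = value                # write hit on an exclusive block
        return bus
    bus.append(("Read Miss" if op == "read" else "Write Miss", proc, addr))
    if not hit and me.state == "Exclusive":
        memory[me.addr] = me.value      # evict the dirty victim block
        bus.append(("Write Back", proc, me.addr))
    for name, other in caches.items():  # other caches snoop the miss
        if name == proc or other.addr != addr or other.state == "Invalid":
            continue
        if other.state == "Exclusive":
            memory[addr] = other.value  # owner supplies the latest data
            bus.append(("Write Back", name, addr))
        other.state = "Shared" if op == "read" else "Invalid"
    if op == "read":
        me.state, me.addr, me.value = "Shared", addr, memory[addr]
    else:
        me.state, me.addr, me.value = "Exclusive", addr, value
    return bus

memory = {"A1": 4, "A2": 5}
caches = {"P1": Cache(), "P2": Cache()}
steps = [("P1", "write", "A1", 3), ("P1", "read", "A2", None),
         ("P2", "write", "A2", 4), ("P1", "read", "A1", None),
         ("P2", "read", "A1", None), ("P1", "write", "A2", 5),
         ("P2", "read", "A2", None)]
log = []
for proc, op, addr, value in steps:
    bus = access(caches, memory, proc, op, addr, value)
    log.append(bus)
    print(proc, op, addr, value, "->", bus)
```

A write hit on a Shared block is treated here as a write miss on the bus, which is one common way to handle upgrades; a real protocol might use a separate invalidate message instead.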
(b) Using the directory protocol
Step | P1 (State Addr Value) | P2 (State Addr Value) | Bus (Action Proc Addr Value) | Memory (Addr Value)
P1: Write A1=3 | Ex A1 3 | | Write Miss P1 A1 | A1 4; A2 5
P1: Read A2 | | | Read Miss P1 A2 |
 | | | Write Back P1 A2 5 | A1 3; A2 5
 | Shared A2 5 | | Read Data P1 A2 5 | A2
P2: Write A2=4 | Inv | Excl A2 5 | Write Miss P2 A2 | A2 5
P1: Read A1 | | | Read Miss P1 A1 | A1 4
P2: Read A1 | | Excl A2 4 | Read Miss P2 A1 | A1 4
P1: Write A2=5 | Excl A2 5 | | Write Back P1 A2 | A2 5
P2: Read A2 | | Shar A2 5 | Ftch P2 A2 | A2 5