Parallel Computing: Speedup, Efficiency, and Load Imbalance

Verified

Added on  2019/09/30

|2
|464
|136
Homework Assignment
AI Summary
This assignment delves into parallel computing, comparing shared and distributed memory architectures. It explores speedup and efficiency calculations, addressing a scenario with two execution phases, one sequential and one parallelizable. The solution calculates the maximum speedup achievable and analyzes the speedup and parallel efficiency when the parallel phase is executed on multiple processors. The assignment also explains load imbalance and the benefits of weak scaling. It examines a scenario with seven time units, explaining that the execution time indicates imperfect speedup. Furthermore, the assignment explores the concepts of fat-tree and mesh networks, and the difference between load balancing and load imbalance.
tabler-icon-diamond-filled.svg

Contribute Materials

Your contribution can guide someone’s learning journey. Share your documents today.
Document Page
Difference between distributed memory machine and shared memory machine
Both are low-level programming abstractions. On one hand where multiple processing is allowed
in the case of shared memory, it is not with the case of distributed memory. Also, in the former,
multiple processes can share the same memory location. However, in the latter, explicit
commands are required for the transfer of data from one another element.
What is a fat-tree network? How does it differ from a mesh network?
Fat-free network was brought in the picture by Charles E. Leiserson. There are hierarchies in this
network and the branches are thicker at near to the top of the hierarchy. It is different from mesh
network in the sense that it allows technology specific usage however the mesh network are
based on rigid algorithm can be molded to specific technology requirement.
Assuming that we have a program where we have two execution phases. One phase takes 1
time unit to execute and can only execute sequentially. The other phase takes 32 time units
to execute sequentially but can be made parallel. What is the maximum speedup you can
possibly achieve? Present your calculations.
1/((1−P)+P/n))
P is 96.9% as 32/33 is the total time.
lim
n ( 1
( 10.969 ) +0.969/n )

= 1
0.031 =32.25
Here, 1/n is reaches to 0 as n reaches to infinity.
You attack the phase which can be made parallel. You rewrite it and run on eight (8)
processors. You notice that the execution time of your rewritten program is seven (7) time
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
units. What is the speedup compared to the original program? What is the parallel
efficiency? Present your calculations.
Speed up, Sn = T1/TN = 32/7 = 4.57
The parallel efficiency, Ep = Sp/p = Ts/pTp = 32/8*7 = 0.57
The execution time you notices, i.e., seven (7) time units means that you do not see perfect
speedup. You suspect load imbalance. In your own words, what does load imbalance
mean?
The load imbalance is the opposite of load balance. Load balancing refers to the distribution of
workloads among the processors by equally distributing the work. On the other hand, the load
imbalance leads to the unequal distribution of the work load.
You think weak scaling would give you better performance. Why would weak scaling give
you better performance?
I think weak scaling would give better performance as the goal of placing fixed problem size per
processor will ensure that the processor runs at their optimal level instead of uneven distribution
of load.
chevron_up_icon
1 out of 2
circle_padding
hide_on_mobile
zoom_out_icon
logo.png

Your All-in-One AI-Powered Toolkit for Academic Success.

Available 24*7 on WhatsApp / Email

[object Object]