Performance Evaluation for Processors: SYST27198 Assignment 1


Added on 2023/04/25

AI Summary

This assignment focuses on evaluating processor performance using various benchmarks and supercomputers. The student analyzes OpenMP benchmarks, specifically the NPB benchmark set, to measure the performance of Intel's Quad Core processors, defining speedup and its calculation. The assignment also involves analyzing supercomputers at the University of Toronto using the NERSC Excel spreadsheet, detailing the Niagara system's specifications. Furthermore, the student describes how to evaluate the performance of the selected supercomputer using SNAP MiniApplication and STREAM microbenchmark, outlining the benchmark test procedure, required runs, and expected results, including recording bandwidth and the number of cores/threads used. The assignment utilizes resources from Compute Canada and NERSC to provide a comprehensive analysis of processor performance evaluation methodologies.

Title: Performance Evaluation for Processors Using Specific Benchmark Tools
Student Full Name:
Student ID:


1. Regarding the OpenMP benchmarks:
a. Describe in 200 words one of the OpenMP benchmarks that were used to measure
the performance of Intel’s Quad Core processors. [2 marks]
Solution
One of the OpenMP benchmarks used is the NPB benchmark set. NPB stands for
NAS Parallel Benchmarks, where NAS originally stood for Numerical Aerodynamic
Simulation. The suite is used to evaluate the performance of highly parallel
computers. Its kernels mimic the computation and data movement of large-scale
computational fluid dynamics applications. It is written in two languages, C and
Fortran, and uses OpenMP to achieve parallelism. The problems solved by the
benchmarks are grouped into classes according to their size: S, W, A, B, C, D
and E, where S is the smallest class and E is the largest.
The NPB benchmarks are developed and maintained by the NASA Advanced
Supercomputing Division, based at the NASA Ames Research Center. The first
version of NPB required the benchmarks to feature parallel algorithms and
software methods, to have easily verifiable results and performance figures, to
be capable of accommodating new systems, to be generic and architecture-neutral,
and to be readily distributable. Later versions of NPB are incremental
improvements on the first. NPB was created because traditional benchmarks were
specialized for vector computers; they suffered from tuning restrictions that
impeded parallelism and from insufficient problem sizes.
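To make the class and thread-count description concrete, here is a minimal sketch of how a set of NPB runs might be planned. It assumes the executables follow a `<benchmark>.<class>.x` naming scheme inside a `bin` directory, which is an assumption about the local build rather than something stated in the answer; `OMP_NUM_THREADS` is the standard OpenMP environment variable, and the helper names are hypothetical.

```python
import os

# Problem classes from smallest to largest, as listed in the answer above.
NPB_CLASSES = ["S", "W", "A", "B", "C", "D", "E"]


def npb_command(benchmark, problem_class, bin_dir="bin"):
    """Return the path of an NPB executable.

    Assumes the <benchmark>.<class>.x naming scheme (an assumption
    about the local build, not taken from the assignment).
    """
    if problem_class not in NPB_CLASSES:
        raise ValueError(f"unknown NPB class: {problem_class}")
    return os.path.join(bin_dir, f"{benchmark.lower()}.{problem_class}.x")


def thread_sweep(benchmark, problem_class, thread_counts):
    """Build one (command, environment) pair per OpenMP thread count."""
    plan = []
    for n in thread_counts:
        # Copy the current environment and set the OpenMP thread count.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        plan.append((npb_command(benchmark, problem_class), env))
    return plan


for command, env in thread_sweep("BT", "A", [1, 2, 4, 8]):
    print(command, "with OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
```

Each pair could then be handed to `subprocess.run(command, env=env)` on a machine where the benchmark has been built.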
b. What is the Speedup performance factor? How is it calculated? How can you
compare some of Intel’s processors according to their Speedup? [2 marks]
Solution
The speedup performance factor is a number that measures the relative
performance of two systems, or of two configurations of one system, solving the
same problem. It quantifies how much faster a task executes, for example on two
similar architectures with different capabilities or on one processor with
different thread counts.
The speedup is calculated as the ratio of the execution time using one thread to
the execution time using a given number (say N) of threads:
$$\text{Speedup} = \frac{T_1}{T_N}$$
where T_1 is the execution time when using one thread and T_N is the execution
time when using N threads.
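As a minimal sketch of this calculation, the snippet below computes the speedup from a table of execution times. The timings are hypothetical values chosen for illustration only; they are not measurements from any processor discussed here.

```python
def speedup(t1, tn):
    """Speedup = T1 / TN, with both times in the same unit."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("execution times must be positive")
    return t1 / tn


# Hypothetical execution times in seconds, keyed by thread count.
timings = {1: 120.0, 2: 63.0, 4: 34.0, 8: 30.0}

for threads, seconds in timings.items():
    print(f"{threads} thread(s): speedup = {speedup(timings[1], seconds):.2f}")
```

A speedup equal to the number of threads would be ideal (linear) scaling; values below that indicate parallel overhead or contention for shared resources.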

Processors can be compared by running the same benchmark on each and comparing
the speedups obtained at the same thread count. Some reported speedup values for
Intel processors are as follows.
The Core i7 processor obtains a maximum speedup of 4 when running 8 threads
with Hyper-Threading (HT) technology enabled. It achieves a speedup of 3.7 with
Hyper-Threading disabled and only four threads. Similarly, the Core 2 Quad
processor achieves a speedup of 3.1 using four threads. For the Core i7 with
Hyper-Threading enabled, the speedup is 7 using eight or more threads. This
speedup drops to 4 when HT is disabled, with 4 or more threads.
For the BT and ST benchmarks, the Core i7 attains its highest speedup with 4
threads. The speedup attained on the BT benchmark is approximately 3.81 when
Hyper-Threading is enabled and 3.79 when it is disabled.
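One way to make such a comparison explicit is to rank the configurations by speedup and also divide each speedup by its thread count. The second quantity, usually called parallel efficiency, is not part of the answer above and is added here only as an illustrative assumption; the three data points are the first three figures quoted in the text.

```python
# (configuration, threads, speedup) taken from the figures quoted above.
results = [
    ("Core i7, HT enabled", 8, 4.0),
    ("Core i7, HT disabled", 4, 3.7),
    ("Core 2 Quad", 4, 3.1),
]


def efficiency(speedup, threads):
    """Parallel efficiency: speedup obtained per thread used."""
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    return speedup / threads


def rank_by_speedup(rows):
    """Sort configurations from highest to lowest speedup."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, threads, s in rank_by_speedup(results):
    print(f"{name}: speedup {s}, efficiency {efficiency(s, threads):.3f}")
```

Ranking by raw speedup favours the configuration with the most threads, while efficiency shows how well each thread is used, so the two orderings need not agree.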
2. Use the NERSC Excel spreadsheet available here. Fill in the blanks of the first sheet
“System Descriptions” to analyze the Supercomputers at one of the four new sites
belonging to Compute Canada. (follow this link). Copy and paste only the cells B10 to
C26 here to describe it. [2 marks]
Solution
The selected site is the University of Toronto, which hosts the Niagara supercomputer.
System Identifier # = 1
Node Count = 1500
Type of Interconnect = EDR InfiniBand network, high-performance interconnect
Interconnect Topology = Dragonfly+ topology
Peak Interconnect P2P (GB/s) = 100 Gb/s
Memory Type/Speed/#Channels = 188 GiB
Memory/Node (GB) = 202 GB
Processors/Node = 40 Intel Skylake cores
Clock Speed (GHz) = 2.4 GHz
Cores/Processor = 40
HW Threads/Core = two Intel E5-2683 v4 "Broadwell"
L1 Cache Size = 200 TB
L2 Cache Size = 7 PB
L3 Cache Size = 3 PB
OS Name & Version = Linux CentOS 7
Compiler Name & Version = Intel Skylake, version 1
MPI Name & Version = MPI + OpenMP 4.0
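Several entries above mix binary and decimal units or bits and bytes. The sketch below shows the conversions involved, using only figures from the table (188 GiB, 100 Gb/s, 1500 nodes, 40 cores per node); the function names are hypothetical.

```python
def gib_to_gb(gib):
    """Convert gibibytes (2**30 bytes) to gigabytes (10**9 bytes)."""
    return gib * 2**30 / 1e9


def gbit_per_s_to_gbyte_per_s(gbit_per_s):
    """Convert a link rate from gigabits per second to gigabytes per second."""
    return gbit_per_s / 8


def total_cores(node_count, cores_per_node):
    """Total number of cores in the system."""
    return node_count * cores_per_node


print(f"188 GiB = {gib_to_gb(188):.1f} GB")
print(f"100 Gb/s = {gbit_per_s_to_gbyte_per_s(100)} GB/s")
print(f"total cores = {total_cores(1500, 40)}")
```

This shows that the 188 GiB and 202 GB entries describe the same amount of memory per node, and that a 100 Gb/s link corresponds to 12.5 GB/s in the unit the spreadsheet column asks for.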
3. Use one of the NERSC Trinity MiniApplications and one of the MicroBenchmarks
available here to describe how the performance of the supercomputer selected in
the previous question can be evaluated. Describe:

a. how to perform the benchmark test [2 marks]
Solution
A benchmark test with the SNAP MiniApplication is performed as follows.
The first step is to download the software archive, SNAP.tar.gz, and install it.
The second step is to set up a time-dependent problem. This requires two copies
of the angular flux, one for the incoming and one for the outgoing flux of a
time step.
The third step is to compile the program. SNAP requires a Fortran compiler with
OpenMP support, and OpenMP must be enabled when building. This requires no
changes to the source code and no macros to be defined.
The fourth step is to run the benchmark. Runs are launched using scripts, with
different scripts for small problems and for large problems.
The fifth step is to verify that SNAP runs to completion. If it does, the last
step is to report the results.
Two results are reported in a procurement spreadsheet: the solve time and the
total number of iterations.
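The verification and reporting steps can be sketched as a small record-keeping helper. Everything here is hypothetical: the `SnapRun` fields, the run labels, and the numbers are placeholders, and the values would have to be read from the actual SNAP output by hand or by a parser not shown here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class SnapRun:
    label: str             # e.g. "small" or "large" problem
    completed: bool        # did SNAP run to completion?
    solve_time_s: float    # reported solve time in seconds
    total_iterations: int  # reported total number of iterations


def report_rows(runs):
    """Keep only runs that completed; those are the ones to report."""
    return [(r.label, r.solve_time_s, r.total_iterations)
            for r in runs if r.completed]


def to_csv(runs):
    """Render the two reported results per run as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["run", "solve_time_s", "total_iterations"])
    writer.writerows(report_rows(runs))
    return buffer.getvalue()


# Placeholder values for illustration only.
runs = [SnapRun("small", True, 12.5, 40), SnapRun("large", False, 0.0, 0)]
print(to_csv(runs))
```

The resulting rows can be pasted into the procurement spreadsheet; runs that did not complete are left out rather than reported with placeholder values.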
b. what are the required runs and their corresponding expected results? [2 marks]
Solution
The required runs and their expected results when benchmarking with the
STREAM microbenchmark are as follows.
1. The parameter N should be adjusted so that the memory used is
approximately 0.6*M.
2. The following two results should be recorded:
a. The highest bandwidth obtained with enough OpenMP threads to use
all hardware cores and threads.
b. The bandwidth for the vendor-determined lowest number of cores and
threads that achieves the highest bandwidth.
If the two runs give similar results, only a single result is reported. The
number of cores and threads used is also reported where applicable.
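The choice of N in step 1 can be sketched as follows. The sketch assumes that M is the memory of one node (the answer does not define M) and relies on the fact that STREAM works on three arrays of N double-precision values, so the memory used is about 3 * 8 * N bytes. The 202 GB figure is the per-node memory quoted for Niagara earlier.

```python
BYTES_PER_ELEMENT = 8  # STREAM's default element type is a 64-bit double
NUM_ARRAYS = 3         # STREAM operates on three arrays: a, b and c


def stream_array_size(node_memory_bytes, fraction=0.6):
    """Largest N such that the three arrays use at most fraction * M bytes.

    Assumes M is the memory of one node, which the answer leaves undefined.
    """
    if node_memory_bytes <= 0 or not 0 < fraction <= 1:
        raise ValueError("memory must be positive and fraction in (0, 1]")
    target_bytes = fraction * node_memory_bytes
    return int(target_bytes // (NUM_ARRAYS * BYTES_PER_ELEMENT))


n = stream_array_size(202e9)  # 202 GB per node, from the table above
used_gb = n * NUM_ARRAYS * BYTES_PER_ELEMENT / 1e9
print(f"N = {n} elements per array, about {used_gb:.1f} GB in total")
```

In recent versions of `stream.c` the array length is set at compile time through the `STREAM_ARRAY_SIZE` macro, so the computed value would be passed to the compiler rather than given at run time.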
References


Boxwell Jr, Robert J. (1994). Benchmarking for Competitive Advantage. New York:
McGraw-Hill.
Baer, Jean-Loup (2010). Microprocessor Architecture: From Simple Pipelines to Chip
Multiprocessors. New York: Cambridge University Press.
Hennessy, John L.; Patterson, David A. (2012). Computer Architecture: A Quantitative
Approach. Waltham, MA: Morgan Kaufmann.
Speckenmeyer, Ewald (2005). Superlinear Speedup for Parallel Backtracking. Lecture
Notes in Computer Science, 297.
https://docs.computecanada.ca/wiki/Cedar
https://www.wseas.us
