Assignment - Girvan-Newman Algorithm | Spark Framework

Added on  2019-09-30

Assignment Overview
In this assignment you are asked to implement the Girvan-Newman algorithm
using the Spark framework in order to detect communities in a graph. You
will use only the video_small_num.csv dataset to find users who have
similar product tastes. The goal of this assignment is to help you understand
how to use the Girvan-Newman algorithm to detect communities efficiently
by programming it within a distributed environment.
Environment Requirements
Python: 2.7 Scala: 2.11 Spark: 2.2.1
IMPORTANT: We will use these versions to compile and test your code. If you
use other versions, there will be a 20% penalty since we will not be able to
grade it automatically.
You may use only the Spark RDD API.
Submission Details
For this assignment you will need to turn in a Python, Java, or Scala program
depending on your language of preference.
Your submission must be a .zip file with the name
<Firstname>_<Lastname>_hw4.zip. The structure of your submission should
be identical to the one shown below. The Firstname_Lastname_Description.pdf file
contains instructions on how to run your code, along with other necessary
information as described in the following sections. The OutputFiles directory
contains the deliverable output files for each problem, and the Solution directory
contains your source code.
Datasets
We continue to use the Amazon Review data. This time we use a subset of the
Amazon Instant Video category. We have already converted the string IDs of
users and products to integers for your convenience. You should download one
file from Blackboard:
1. video_small_num.csv
Construct Graph
Each node represents a user. Each edge is generated in the following way:
In video_small_num.csv, count the number of times that two users rated the
same product. If that count is greater than or equal to 7,
there is an edge between the two users.
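
The edge rule above can be sketched in plain Python as follows. This is only an illustration of the counting logic, not the required Spark RDD solution. It assumes each record has already been reduced to a (user_id, product_id) pair and that repeated ratings of the same product by the same user count once; the function name build_edges and its threshold parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(ratings, threshold=7):
    """Return the set of undirected edges (u, v) with u < v.

    ratings: iterable of (user_id, product_id) pairs (assumed layout).
    """
    # Group the distinct users that rated each product.
    users_by_product = defaultdict(set)
    for user, product in ratings:
        users_by_product[product].add(user)

    # Count, for every user pair, how many products both users rated.
    co_rated = defaultdict(int)
    for users in users_by_product.values():
        for u, v in combinations(sorted(users), 2):
            co_rated[(u, v)] += 1

    # Keep only the pairs whose count meets the threshold.
    return {pair for pair, count in co_rated.items() if count >= threshold}
```

In an RDD version, the same three steps would roughly correspond to grouping users by product, emitting user pairs, and reducing by pair before filtering on the count.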
Task1: Betweenness (50%)
You are required to implement the Girvan-Newman algorithm to find the
betweenness of each edge in the graph. The betweenness values
should be calculated only once, from the original graph.
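
As a rough sketch of the computation this task asks for, the plain-Python function below runs a breadth-first search from every node, counts shortest paths, and pushes credit back up toward the root. It is not a Spark RDD implementation. The function name edge_betweenness and the adjacency-dictionary input are hypothetical, and halving the totals at the end (because each shortest path is seen from both of its endpoints) is a common convention that the handout does not state explicitly.

```python
from collections import defaultdict, deque


def edge_betweenness(adjacency):
    """adjacency: dict mapping node -> set of neighbours (undirected graph)."""
    betweenness = defaultdict(float)
    for root in adjacency:
        # Step 1: BFS from root, recording depth, shortest-path counts
        # and the parents of each node in the shortest-path DAG.
        depth = {root: 0}
        paths = defaultdict(float)
        paths[root] = 1.0
        parents = defaultdict(list)
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adjacency[node]:
                if nb not in depth:
                    depth[nb] = depth[node] + 1
                    queue.append(nb)
                if depth[nb] == depth[node] + 1:
                    paths[nb] += paths[node]
                    parents[nb].append(node)

        # Step 2: walk back from the deepest nodes; each node starts with
        # credit 1 and splits its credit among its parents in proportion
        # to the number of shortest paths through each parent.
        credit = {node: 1.0 for node in order}
        for node in reversed(order):
            for parent in parents[node]:
                share = credit[node] * paths[parent] / paths[node]
                credit[parent] += share
                edge = (min(parent, node), max(parent, node))
                betweenness[edge] += share

    # Assumed convention: halve the totals, since every shortest path
    # was counted once from each of its two endpoints.
    return {edge: value / 2.0 for edge, value in betweenness.items()}
```

For example, on the path graph 1 - 2 - 3, passed as {1: {2}, 2: {1, 3}, 3: {2}}, both edges receive a betweenness of 2.0 under this convention.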
Execution Example
The first argument passed to your program (in the execution below) is the
path of the video_small_num.csv file (e.g.
“spark-2.2.1-bin-hadoop2.7/HW4/video_small_num.csv”). The second argument is
the output path (the directory of your output file, not including the
file name, e.g. “spark-2.2.1-bin-hadoop2.7/HW4/”). Below are
examples of how to run your program with spark-submit, both when your
application is a Java/Scala program and when it is a Python script.
A. Example of running a Java/Scala application with spark-submit:
Note that the --class argument of spark-submit specifies the
main class of your application, and it is followed by the jar file of
the application.
You should use Betweenness as your class name for this task.
B. Example of running a Python application with spark-submit:
Result format:
Each line is a tuple in the format (userId1, userId2, betweenness value). The
file is ordered by the first element in ascending order and if the first element is