Friendship Network Updates and Logging


Added on  2019-09-22

6 Pages1676 Words286 Views
In this assignment you will complete the following tasks1.Answer 5 questions that exercise material we covered in classGeneral instructions:This assignment does not have a coding section. Please answer the following question. Please be concise and clear in your answers.Important: Make sure you view this file while logged in as your nyu.edu account.Step 1: Make a copy of this file and rename it <your full name> ex4.Step 2: In the new document answer the following questions:Part 1: Consistency, sharding, MapReduce, FileSystem, APIsQ1: ShardingThink back to the data model you created in ex2 for the subletting site. Assume that there are only 3 models:User: A registered userApartmentListing: An open or closed listing. A closed listing always has a booker field pointing to a user.The time has come to scale-out your site and shard your data model.Q1a) Describe how you would shard your data: What field would you shard by? If there is any hierarchical structure - describe it. (Example in Scalica, Posts are nested under and co-located with the user who posted them). Your answer goes here:Q1b) Which of the following three views will be costly to produce under your sharding scheme? why?1)Display all bookings for a user (past, current and future)2)Display the apartments a user owns3)Display all listings for an apartment (open and closed)Your answer goes here:Q1 bonus for 5 points) How do you propose to solve the problem in Q1b)?
Friendship Network Updates and Logging_1

Q2: Fan in/Fan outScalica has 3 models. All are sharded by user id: ScalicaUser (information about a user), Post(stores the posts the user authored), and Following (the user ids of users this user is following).In scalica, a user - Alice’s timeline is constructed as follows: The list of users Alice follows is read. These users are “bucketed” into the different shards they are in. For each shard a single read gets all the posts for the people Alice follows in that shard. The results from all shards are combined and sorted by time of posting, and then displayed. This is done every time Alice loadsher home page. This technique is called fan-in (You fan-in the data on demand).To improve the home page’s load time, you decide to switch to fan-out. This means that you will construct beforehand and store the timeline for each user, and then simply read it when the user loads their home page(single read) and display it. Give a general description of how you would implement fan-out. In your description includeAny changes to the data model (if any are needed) Storage systems you propose to useWhat the system does when a user posts a new postWhat the system does to read the timelineKeep your answer shorter than half a page.Your answer goes here:Q3: MapReduceYour service has a social graph. That is, you store friendship relationships. Friendship is symmetric, so if Alice is Bob’s friend, then Bob is Alice’s friend. In your database, you store for each user a list of her friends. So if Alice and Bob are friends, Bob’s user id should be stored in Alice’s friends list and vice-versa.
Friendship Network Updates and Logging_2

Your user base is very large, so your database is sharded by user ids. As a result, when your system marks two user (Alice and Bob) as friends, it adds each to the other’s friends list. But because the two users’ data can be in different shards, the updates are NOT done inside a transaction (because distributed transactions are expensive and complicated).Due to machine failures and other issues, sometimes the process only updates the friends list ofthe first user. This introduces inconsistency. People can see Alice as Bob’s friend, but Bob is not Alice’s friend. To mitigate this problem, you decide to run a daily MapReduce job that finds these inconsistencies, and advises how to correct them.A few things to note:For simplicity, assume two users can become friends only once and ‘un-friend’ only once. Assume befriending and un-friending are mutual decisions.When two users are un-friended, this event is marked as a ‘tombstone’ in the friendship list of both users.A friendship list for user id x is a list of pairs <Integer,Boolean> where for each pair the Integer is a user id (e.g. y) and the boolean is True if x and y are friends and False if x and y were friends but unfriended (tombstone).The input to each Map function is a user id and her friend listThe output of the Reduce function should be statements of the form “Add|Delete use id Z To|From the friends list of user id W”. Examples: “Add user id i To the friends list of user id j” or “Delete user id m from the friends list of user id n”. Assume that these directives are fed to another system that knows how to execute them. Also note that Delete x from y” can mean changing <x, True> to <x, False> in y’s friend list or adding <x, False> to y’s friends list (if <x, True> does not exist there). Verify that you understand why the latter case can occur. In any event, you don’t need to worry about how these two cases are executed. You just need to produce the “Delete ...” directive.Write the Map and Reduce function for this job, in pseudo-code. You can use any syntax, but make it easy to understand what you are doing.
Friendship Network Updates and Logging_3

End of preview

Want to access all the pages? Upload your documents or become a member.