In this assignment you will complete the following tasks.

Added on - 22 Sep 2019

  • 6

    Pages

  • 1676

    Words

  • 130

    Views

  • 0

    Downloads

Trusted by +2 million users,
1000+ happy students everyday
Showing pages 1 to 3 of 6 pages
In this assignment you will complete the following tasks1.Answer 5 questions that exercise material we covered in classGeneral instructions:This assignment does not have a coding section. Please answer the following question. Pleasebe concise and clear in your answers.Important: Make sure you view this file while logged in as your nyu.edu account.Step 1: Make a copy of this file and rename it <your full name> ex4.Step 2: In the new document answer the following questions:Part 1: Consistency, sharding, MapReduce, FileSystem, APIsQ1: ShardingThink back to the data model you created in ex2 for the subletting site. Assume that there areonly 3 models:User: A registered userApartmentListing: An open or closed listing. A closed listing always has abookerfield pointing to auser.The time has come to scale-out your site and shard your data model.Q1a) Describe how you would shard your data: What field would you shard by? If there is anyhierarchical structure - describe it. (Example in Scalica, Posts are nested under and co-locatedwith the user who posted them).Your answer goes here:Q1b) Which of the following three views will be costly to produce under your sharding scheme?why?1)Display all bookings for a user (past, current and future)2)Display the apartments a user owns3)Display all listings for an apartment (open and closed)Your answer goes here:Q1 bonus for 5 points) How do you propose to solve the problem in Q1b)?
Q2: Fan in/Fan outScalica has 3 models. All are sharded by user id:ScalicaUser(information about a user),Post(stores the posts the user authored), andFollowing(the user ids of users this user is following).In scalica, a user - Alice’s timeline is constructed as follows: The list of users Alice follows isread. These users are “bucketed” into the different shards they are in. For each shard a singleread gets all the posts for the people Alice follows in that shard. The results from all shards arecombined and sorted by time of posting, and then displayed. This is done every time Alice loadsher home page. This technique is calledfan-in(You fan-in the data on demand).To improve the home page’s load time, you decide to switch tofan-out. This means that youwill construct beforehand and store the timeline for each user, and then simply read it when theuser loads their home page(single read) and display it. Give a general description of how youwould implement fan-out. In your description includeAny changes to the data model (if any are needed)Storage systems you propose to useWhat the system does when a user posts a new postWhat the system does to read the timelineKeep your answer shorter than half a page.Your answer goes here:Q3: MapReduceYour service has a social graph. That is, you store friendship relationships. Friendship issymmetric, so if Alice is Bob’s friend, then Bob is Alice’s friend. In your database, you store foreach user a list of her friends. So if Alice and Bob are friends, Bob’s user id should be stored inAlice’s friends list and vice-versa.
Your user base is very large, so your database is sharded by user ids. As a result, when yoursystem marks two user (Alice and Bob) as friends, it adds each to the other’s friends list. Butbecause the two users’ data can be in different shards, the updates are NOT done inside atransaction (because distributed transactions are expensive and complicated).Due to machine failures and other issues, sometimes the process only updates the friends list ofthe first user. This introduces inconsistency. People can see Alice as Bob’s friend, but Bob isnot Alice’s friend. To mitigate this problem, you decide to run a daily MapReduce job that findsthese inconsistencies, and advises how to correct them.A few things to note:For simplicity, assume two users can become friends only once and ‘un-friend’ onlyonce. Assume befriending and un-friending are mutual decisions.When two users are un-friended, this event is marked as a ‘tombstone’ in the friendshiplist of both users.A friendship list for user id x is a list of pairs <Integer,Boolean> where for each pair theInteger is a user id (e.g. y) and the boolean is True if x and y are friends and False if xand y were friends but unfriended (tombstone).The input to each Map function is a user id and her friend listThe output of the Reduce function should be statements of the form “Add|Delete use idZ To|From the friends list of user id W”. Examples: “Add user id i To the friends list ofuser id j” or “Delete user id m from the friends list of user id n”. Assume that thesedirectives are fed to another system that knows how to execute them. Also note thatDelete x from y” can mean changing <x, True> to <x, False> in y’s friend list or adding<x, False> to y’s friends list (if <x, True> does not exist there). Verify that youunderstand why the latter case can occur. In any event, you don’t need to worry abouthow these two cases are executed. You just need to produce the “Delete ...” directive.Write the Map and Reduce function for this job, in pseudo-code. You can use any syntax, butmake it easy to understand what you are doing.
desklib-logo
You’re reading a preview
Preview Documents

To View Complete Document

Click the button to download
Subscribe to our plans

Download This Document