FOI 2017 TUE13: Movie Recommender System Analysis and Implementation

Added on 2019/10/09

Aims
In this phase you will:
• carry out basic data manipulation using spreadsheet formulas
• implement a simple recommender system in a spreadsheet
Introduction
In 2014, Hollywood released more than 600 movies: that is, one movie every
day, plus four more each weekend for you to watch. In the last decade, the way
consumers access such a huge collection of movies has changed. With the
proliferation of high-speed Internet, online movie rental companies (whether via
DVD or streaming) are killing off physical video rental shops.
Launched in 1999, Netflix, then the largest US online movie rental company, announced
its billionth DVD delivery in February 2007. It claimed to spend about $300
million a year on postage alone. Around 2008, Australia's BigPond
Movies and QuickFlix had thousands of DVDs on offer only a mouse click away.
In 2009, Netflix made its two billionth delivery, a Blu-ray disc. More recently, in
March 2015, Netflix officially launched its streaming service in Australia, competing
with existing providers such as Stan and Presto.
Parallel to this trend, consumers, whether professional critics or moviegoers, have been
sharing and exchanging information about movies online. IMDb, the largest
online movie database, stores information about movies, actors, directors, and any
other information you can think of. Many other sites provide a
comprehensive collection of reviews and critiques: Metacritic, Rotten Tomatoes, Yahoo!
Movies, and Movie Review Query Engine. All of these sites allow people to
collaboratively discuss and rate their favourite movies online.
One important function of these sites is to help people select movies they will like by
looking at lists of movie reviews from around the world. This is a case of information
filtering. Online recommendation systems are becoming important information
filtering tools as we are overwhelmed by digital content. Pandora's music
recommendations and Amazon's book recommendations are two examples.
These systems are very useful, not only for the audience to find their way through
millions of options, but also for businesses to up-sell their products (Do you want to
upsize your Big Mac meal?). Recommendation is so important that Netflix offered
US$1,000,000 to anyone who could improve its movie recommendation engine.
Recommender system
A 'recommender system' presents a list of items (books, movies, music) that are
likely to be of interest to a user, based on what it knows about that user and the
items. It makes use of intrinsic properties of the large collection of items (the content-
based approach), the user's social environment (the collaborative filtering approach),
or a combination of both. There are many ways to predict what a person would like,
but there is no single correct way, as the billions of dollars spent on marketing
attest.

In a 'movie recommender' system, for example, a content-based approach may
employ information such as actors, directors and/or movie genre. A collaborative
filtering approach can combine what the audience thinks about the movies with the
audience profiles (two users with the same profile are more likely
to enjoy the same movies). As people's borrowing and viewing habits are recorded,
the amount of data that can be used in the system only increases.
In this assignment, you will build a simple version of such a system, which uses
information about movies to find similar movies and produce recommendations.
Tasks
Data Sets
The given data set contains information about 291 popular feature films produced
from 1969 to 2008. The data set captures data such as the movie name, censorship
rating, genre, director, actors, score from various critics, and worldwide gross.
(Attached)
Part 1 - Basic Task
Using spreadsheet formulas, complete the following tasks and answer all the
relevant questions.
1. Compare the performance of movie genres based on the worldwide gross of the
movies in each genre. Ignore genres that have fewer than 5 movies. Visualise
the comparison using an appropriate chart type. Which three genres are the worst
performers? Compare the performance of movie ratings (PG, G, etc.) based on the
same measure. Again, ignore ratings that have fewer than 5 movies. Visualise this
comparison as well. Do PG-rated movies generally earn more than R-rated movies?
2. Which three of the given reviewers in the movie data (Washington Post, Chicago
Sun-Times, The New York Times, LA Weekly, Los Angeles Times, Rolling Stone,
Wall Street Journal, Entertainment Weekly, Empire, Variety, Salon.com, The Onion
(A.V. Club), TV Guide, Slate) are the most consistent with the 'metascore'? You can
determine this by calculating the average gap between the metascore value and the score
from a particular reviewer. Visualise the average gaps of all reviewers to see how
close they are to the metascore. Treat a score of 0 as an empty score. State your
assumptions when dealing with missing data.
3. Present a table of actors versus genres to show the number of movies in each genre
that a particular actor is featured in. Show only actors who have appeared in at
least 6 movies. Colour the cells that contain these counts so that higher counts can
be distinguished from lower counts. Include as the last column the total number of
movies the actor is featured in. Correspondingly, include as the last row the total
number of movies within each genre. Present the actor names in descending order
of movie count.
            ...    genre    ...
...
actor       ...     ...     ...
...
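For question 3, the counting logic behind the actor-versus-genre table can be sketched outside the spreadsheet. The Python below is only an illustration: the row layout (one genre and a list of actors per movie), the sample names, and the function name are hypothetical and are not taken from the supplied data set. It also assumes that the last row counts all movies in each genre, not only those of the displayed actors; state your own assumption if you choose differently.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (genre, [actors]). The real data set's layout may differ.
movies = [
    ("Action", ["Actor A", "Actor B"]),
    ("Action", ["Actor A"]),
    ("Comedy", ["Actor A", "Actor C"]),
    ("Comedy", ["Actor B"]),
    ("Drama", ["Actor A"]),
]

def actor_genre_table(movies, min_movies):
    """Count movies per (actor, genre), keeping actors with >= min_movies."""
    counts = defaultdict(Counter)            # actor -> genre -> movie count
    for genre, actors in movies:
        for actor in set(actors):            # count an actor once per movie
            counts[actor][genre] += 1
    totals = {a: sum(c.values()) for a, c in counts.items()}
    kept = [a for a in totals if totals[a] >= min_movies]
    kept.sort(key=lambda a: (-totals[a], a))  # descending by movie count
    genres = sorted({g for g, _ in movies})
    # One row per actor: name, per-genre counts, total (last column).
    rows = [(a, [counts[a][g] for g in genres], totals[a]) for a in kept]
    # Last row (assumption): number of movies in each genre over the whole data set.
    genre_totals = Counter(g for g, _ in movies)
    footer = [genre_totals[g] for g in genres]
    return genres, rows, footer

# The task uses a threshold of 6; 2 is used here only to suit the tiny sample.
genres, rows, footer = actor_genre_table(movies, min_movies=2)
```

In a spreadsheet the same steps would be spread over helper columns or sheets; the sketch is only meant to make the order of operations explicit (count, filter, sort, then add totals).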
You are not required to provide a single-cell solution, where you try to cram all the
required formulas into one cell. Find a balance between succinctness and legibility.
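For questions 1 and 2, the two calculations (a per-group figure that skips small groups, and an average gap that skips missing scores) can be sketched as follows. This is a Python illustration under stated assumptions, not the required spreadsheet solution: it assumes the mean worldwide gross is the performance measure (a total or median would be equally defensible), that the gap is the absolute difference, and that a 0 in either score means the pair is skipped. All names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(rows, min_count):
    """Average value per group, ignoring groups with fewer than min_count rows."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {g: mean(v) for g, v in groups.items() if len(v) >= min_count}

def mean_gap(metascores, reviewer_scores):
    """Mean absolute gap to the metascore; a score of 0 is treated as missing."""
    gaps = [
        abs(m - r)
        for m, r in zip(metascores, reviewer_scores)
        if m != 0 and r != 0          # assumption: skip pairs with a missing score
    ]
    return mean(gaps) if gaps else None

# Hypothetical (genre, worldwide gross) pairs; the task's threshold is 5.
gross = [("Action", 300), ("Action", 500), ("Comedy", 100)]
by_genre = mean_by_group(gross, min_count=2)

# Hypothetical metascores and one reviewer's scores (0 = no score given).
gap = mean_gap([80, 60, 70], [90, 0, 60])
```

Here "Comedy" is dropped because it has only one movie, and the second movie is left out of the gap because the reviewer gave no score. Whichever measure you choose, state it as an assumption in your spreadsheet.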
Part 2 - Simple Recommender
Student groups should implement a simple movie recommendation system in Google
Spreadsheet, based on the given data set of movie attributes and reviews. The aim of
the recommender is to provide a user with a list of movies that they might like. This is
best done by comparing the user's query with the existing data set. You need to
design a function that measures the similarity of two movies. If a user likes a
particular movie, then they will most likely enjoy similar movies.
The following similarity score assumes that the audience generally likes movies featuring
their favourite actors, as well as movies from the same director and genre.
The FavActor similarity score between two movies is defined as follows:
• Start with a base similarity score of 0.
• For each actor that the two movies share, add 3 to the similarity score.
• If the two movies have the same director, add 2 to the similarity score.
• If the two movies are of the same genre, add 1 to the similarity score.
• Return the final similarity score.
• If there is a tie, favour the newer movie.
• If there is still a tie, use the metascore to break it.
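The scoring rules above can be written as a small function. The sketch below is in Python only to make the arithmetic explicit; the dictionary keys ("actors", "director", "genre") and the sample movies are hypothetical, and it assumes each movie has a single director and a single genre.

```python
def fav_actor_similarity(a, b):
    """FavActor similarity between two movies, following the listed rules."""
    score = 0                                              # base score
    shared_actors = set(a["actors"]) & set(b["actors"])
    score += 3 * len(shared_actors)                        # +3 per shared actor
    if a["director"] == b["director"]:
        score += 2                                         # same director
    if a["genre"] == b["genre"]:
        score += 1                                         # same genre
    return score

# Hypothetical movies for illustration only.
m1 = {"actors": ["Actor A", "Actor B"], "director": "Director X", "genre": "Action"}
m2 = {"actors": ["Actor A", "Actor B", "Actor C"], "director": "Director X", "genre": "Drama"}

score = fav_actor_similarity(m1, m2)   # two shared actors and same director: 3*2 + 2 = 8
```

The tie-breaking rules do not change the score itself; they only matter when movies are ranked.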
Use this FavActor similarity score for your movie recommendation. The user will enter the
name of a movie in a cell, and your formula should automatically display a list of
movies ranked by the similarity score defined above. Use exact matching to find the
movie name; you are not required to support ambiguous or partial movie name
queries. You can include the movie entered by the user in the results.
Hint: consider using the MATCH, INDEX, and ARRAYFORMULA functions.
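The overall flow (exact lookup of the entered name, scoring every movie against it, then ranking with the tie-breakers) can be sketched as below. This is a Python illustration, not a spreadsheet formula: the field names and sample movies are hypothetical, and it assumes that a higher metascore wins the final tie-break, which the rules do not state explicitly.

```python
def similarity(a, b):
    """FavActor score: 3 per shared actor, 2 for same director, 1 for same genre."""
    shared = len(set(a["actors"]) & set(b["actors"]))
    return 3 * shared + 2 * (a["director"] == b["director"]) + (a["genre"] == b["genre"])

def recommend(title, movies):
    """Rank all movies against the one whose name matches title exactly."""
    query = next((m for m in movies if m["name"] == title), None)  # exact match only
    if query is None:
        return []                      # unknown title: nothing to recommend
    ranked = sorted(
        movies,
        # Sort by score, then newer year, then metascore (assumed: higher first).
        key=lambda m: (similarity(query, m), m["year"], m["metascore"]),
        reverse=True,
    )
    return [m["name"] for m in ranked]

# Hypothetical data for illustration only.
movies = [
    {"name": "Movie 1", "year": 2000, "metascore": 70, "actors": ["A", "B"], "director": "X", "genre": "Action"},
    {"name": "Movie 2", "year": 2005, "metascore": 60, "actors": ["A"], "director": "Y", "genre": "Drama"},
    {"name": "Movie 3", "year": 2008, "metascore": 50, "actors": ["B"], "director": "Z", "genre": "Comedy"},
    {"name": "Movie 4", "year": 2008, "metascore": 80, "actors": ["A"], "director": "W", "genre": "Comedy"},
    {"name": "Movie 5", "year": 1990, "metascore": 90, "actors": ["C"], "director": "X", "genre": "Action"},
]

result = recommend("Movie 1", movies)
```

In this sample, Movies 2 to 5 all score 3 against Movie 1, so the order among them is decided by year and then by metascore. The queried movie itself appears first, which the task permits.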
Marking Scheme
Marks are based mainly on the correctness and quality of your data processing and
visualisation. The criteria include:
• Assumptions and problem understanding: necessary assumptions are stated where
applicable; the implementation is consistent with the tasks and assumptions
• Structure: the structure of the spreadsheet is reasonable and logical, with appropriate
use of named ranges
• Data processing: the correct data set has been used (no missing data); data processing
is efficient (no redundant computation)
• Formulas: the use of formulas is clear and of reasonable length; no manual data
processing (formulas should be used wherever appropriate); robustness (the
spreadsheet can be adapted to a change in the data with minimal adjustment)
• Output: produces the intended results; appropriate choice of visualisation to support
the intended goal
The marking sheet is attached.
Submission
Solutions to Parts 1 and 2 should be provided in a single spreadsheet document. Use
one sheet for each question (you may use multiple sheets for a single question if it
helps you to provide an answer systematically, but name your sheets systematically,
e.g. q1_step1, q1_step2, ...). While there is no limit on the number of sheets that you
can create, please use a reasonable number.
Make sure you put your student login id in the solution for each individual question,
e.g. q1_step1 (ivow).
Name your Google Spreadsheet document
FOI_2017_TUE13_8
Share your Google Spreadsheet with us. Use the Share > Sharing Settings > Add
People command, and enter the following emails: informatix.one@gmail.com and
the email of your tutor. Make sure you give us edit access, not only view access.
Do not change the spreadsheet after the deadline, because we will
overwrite any changes with the last version saved just before the due date.
Resources
How to recommend movies to users is part of a highly active field of research
collectively known as 'Recommender Systems'.
Information about movies and recommender systems
• MPAA Statistic
• Netflix
• Collaborative Filtering
• A Guide to Recommender Systems
Online movie databases
• IMDb
• Metacritic
• Yahoo! Movies
• Movie Review Query Engine